URL
This is the manual for the url Emacs Lisp library.
Copyright © 1993–1999, 2002, 2004–2024 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover Texts being “A GNU Manual,” and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled “GNU Free Documentation License”.
(a) The FSF’s Back-Cover Text is: “You have the freedom to copy and modify this GNU manual.”
1 Introduction
A Uniform Resource Identifier (URI) is a specially-formatted name, such as an Internet address, that identifies some name or resource. The format of URIs is described in RFC 3986, which updates and replaces the earlier RFCs 2732, 2396, 1808, and 1738. A Uniform Resource Locator (URL) is an older but still-common term, which basically refers to a URI corresponding to a resource that can be accessed (usually over a network) in a specific way.
Here are some examples of URIs (taken from RFC 3986):
ftp://ftp.is.co.za/rfc/rfc1808.txt
https://www.ietf.org/rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB?objectClass?one
mailto:John.Doe@example.com
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
telnet://192.0.2.16:80/
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
This manual describes the url library, an Emacs Lisp library for parsing URIs and retrieving the resources to which they refer. (The library is so-named for historical reasons; nowadays, the “URI” terminology is regarded as the more general one, and “URL” is technically obsolete despite its widespread vernacular usage.)
2 URI Parsing
A URI consists of several components, each having a different meaning. For example, the URI
https://www.gnu.org/software/emacs/
specifies the scheme component ‘https’, the hostname component ‘www.gnu.org’, and the path component ‘/software/emacs/’.
The format of URIs is specified by RFC 3986. The url library provides the Lisp function url-generic-parse-url, a (mostly) standard-compliant URI parser, as well as the function url-recreate-url, which converts a parsed URI back into a URI string.
- Function: url-generic-parse-url uri-string ¶
This function returns a parsed version of the string uri-string.
- Function: url-recreate-url uri-obj ¶
Given a parsed URI, this function returns the corresponding URI string.
The return value of url-generic-parse-url, and the argument expected by url-recreate-url, is a parsed URI: a CL structure whose slots hold the various components of the URI. See the CL Manual in GNU Emacs Common Lisp Emulation, for details about CL structures. Most of the other functions in the url library act on parsed URIs.
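For example, here is a minimal sketch of a parse/recreate round trip (the URL is just an illustration):

(url-recreate-url
 (url-generic-parse-url "https://www.gnu.org/software/emacs/"))
     ⇒ "https://www.gnu.org/software/emacs/"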
2.1 Parsed URI structures
Each parsed URI structure contains the following slots:
type
The URI scheme (a string, e.g., http). See Supported URL Types, for a list of schemes that the url library knows how to process. This slot can also be nil, if the URI is not fully specified.
user
The user name (a string), or nil.
password
The user password (a string), or nil. The use of this URI component is strongly discouraged; nowadays, passwords are transmitted by other means, not as part of a URI.
host
The host name (a string), or nil. If present, this is typically a domain name or IP address.
port
The port number (an integer), or nil. Omitting this component usually means to use the “standard” port associated with the URI scheme.
filename
The combination of the “path” and “query” components of the URI (a string), or nil. If the query component is present, it is the substring following the first ‘?’ character, and the path component is the substring before the ‘?’. The meaning of these components is scheme-dependent; they do not necessarily refer to a file on a disk.
target
The fragment component (a string), or nil. The fragment component specifies a “secondary resource”, such as a section of a webpage.
fullness
This is t if the URI is fully specified, i.e., the hierarchical components of the URI (the hostname and/or username and/or password) are preceded by ‘//’.
These slots have accessors named url-part, where part is the slot name. For example, the accessor for the host slot is the function url-host. The url-port accessor returns the default port for the URI scheme if the parsed URI’s port slot is nil.
The slots can be set using setf. For example:
(setf (url-port url) 80)
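Putting the accessors and setf together, here is a short illustrative sketch; the return values shown in the comments are what one would expect for this example URL:

(let ((url (url-generic-parse-url "https://www.gnu.org/software/emacs/")))
  (url-host url)                       ; ⇒ "www.gnu.org"
  (url-port url)                       ; ⇒ 443, the default port for https
  (setf (url-filename url) "/software/")
  (url-recreate-url url))              ; ⇒ "https://www.gnu.org/software/"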
2.2 URI Encoding
The url-generic-parse-url parser does not obey RFC 3986 in one respect: it allows non-ASCII characters in URI strings.
Strictly speaking, RFC 3986 compatible URIs may only consist of ASCII characters; non-ASCII characters are represented by converting them to UTF-8 byte sequences, and performing percent encoding on the bytes. For example, the o-umlaut character is converted to the UTF-8 byte sequence ‘\xC3\xB6’, then percent encoded to ‘%C3%B6’. (Certain “reserved” ASCII characters must also be percent encoded when they appear in URI components.)
The function url-encode-url can be used to convert a URI string containing arbitrary characters to one that is properly percent-encoded in accordance with RFC 3986.
- Function: url-encode-url url-string ¶
This function returns a properly URI-encoded version of url-string. It also performs URI normalization, e.g., converting the scheme component to lowercase if it was previously uppercase.
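For instance, a hedged sketch (the exact output shown is illustrative):

(url-encode-url "HTTPS://www.gnu.org/software/emacs/with space.html")
     ⇒ "https://www.gnu.org/software/emacs/with%20space.html"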
To convert between a string containing arbitrary characters and a percent-encoded all-ASCII string, use the functions url-hexify-string and url-unhex-string:
- Function: url-hexify-string string &optional allowed-chars ¶
This function performs percent-encoding on string, and returns the result.
If string is multibyte, it is first converted to a UTF-8 byte string. Each byte corresponding to an allowed character is left as-is, while all other bytes are converted to a three-character sequence: ‘%’ followed by two upper-case hex digits.
The allowed characters are specified by allowed-chars. If this argument is nil, the allowed characters are those specified as unreserved characters by RFC 3986 (see the variable url-unreserved-chars). Otherwise, allowed-chars should be either a list of allowed chars, or a vector whose Nth element is non-nil if character N is allowed.
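For example (the second call shows the UTF-8 conversion of a non-ASCII character):

(url-hexify-string "foo bar")
     ⇒ "foo%20bar"
(url-hexify-string "fö")
     ⇒ "f%C3%B6"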
- Function: url-unhex-string string &optional allow-newlines ¶
This function replaces percent-encoding sequences in string with their character equivalents, and returns the resulting string.
If allow-newlines is non-nil, it allows the decoding of carriage returns and line feeds, which are normally forbidden in URIs.
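For example:

(url-unhex-string "foo%20bar")
     ⇒ "foo bar"

Note that for percent-encoded non-ASCII sequences the result is a string of raw bytes, which still needs to be decoded (e.g., with decode-coding-string) to become readable text.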
3 Retrieving URLs
The url library defines the following three functions for retrieving the data specified by a URL. The actual retrieval protocol depends on the URL’s URI scheme, and is performed by lower-level scheme-specific functions. (Those lower-level functions are not documented here, and generally should not be called directly.)
In each of these functions, the url argument can be either a string or a parsed URL structure. If it is a string, that string is passed through url-encode-url before using it, to ensure that it is properly URI-encoded (see URI Encoding).
- Function: url-retrieve-synchronously url &optional silent no-cookies timeout ¶
This function synchronously retrieves the data specified by url, and returns a buffer containing the data. The return value is nil if there is no data associated with the URL (as is the case for dired, info, and mailto URLs).

If the optional argument silent is non-nil, progress messages are suppressed. If the optional argument no-cookies is non-nil, cookies are not stored or sent. If the optional argument timeout is non-nil, it should be a number that says (in seconds) how long to wait for a response before giving up.
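For example, here is a minimal sketch of a synchronous fetch that returns just the body text; the blank line separating the headers from the body is located explicitly, and error handling is omitted:

(let ((buffer (url-retrieve-synchronously "https://www.gnu.org/" t nil 10)))
  (when buffer
    (with-current-buffer buffer
      ;; Skip past the headers to the start of the body.
      (goto-char (point-min))
      (search-forward "\n\n" nil t)
      (prog1 (buffer-substring (point) (point-max))
        (kill-buffer)))))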
- Function: url-retrieve url callback &optional cbargs silent no-cookies ¶
This function retrieves url asynchronously, calling the function callback when the object has been completely retrieved. The return value is the buffer into which the data will be inserted, or nil if the process has already completed.

The callback function is called this way:
(apply callback status cbargs)
where status is a plist representing what happened during the retrieval, with most recent events first, or an empty list if no events have occurred. Each pair in the plist is one of:
(:redirect redirected-to)
This means that the request was redirected to the URL redirected-to.
(:error (error-symbol . data))
This means that an error occurred. If so desired, the error can be signaled with (signal error-symbol data).
When the callback function is called, the current buffer is the one containing the retrieved data (if any). The buffer also contains any MIME headers associated with the data retrieval.
If the optional argument silent is non-nil, progress messages are suppressed. If the optional argument no-cookies is non-nil, cookies are not stored or sent.
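For example, a brief sketch of an asynchronous fetch (the URL is a placeholder):

(url-retrieve "https://www.gnu.org/"
              (lambda (status &rest _cbargs)
                ;; The current buffer holds the headers and the body.
                (if (plist-get status :error)
                    (message "Retrieval failed: %S" (plist-get status :error))
                  (message "Retrieved %d characters" (buffer-size)))
                (kill-buffer)))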
- Function: url-queue-retrieve url callback &optional cbargs silent no-cookies ¶
This function acts like url-retrieve, but with limits on the number of concurrently-running network processes. The option url-queue-parallel-processes controls the number of concurrent processes, and the option url-queue-timeout sets a timeout in seconds.

To use this function, you must (require 'url-queue).
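For example, a minimal sketch of queued retrievals (the URLs are placeholders):

(require 'url-queue)
(dolist (url '("https://www.gnu.org/" "https://www.ietf.org/"))
  (url-queue-retrieve url
                      (lambda (_status)
                        (message "Finished: %d characters" (buffer-size))
                        (kill-buffer))
                      nil t))   ; no extra callback arguments; be silent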
- User Option: url-queue-parallel-processes ¶
The value of this option is an integer specifying the maximum number of concurrent url-queue-retrieve network processes. If the number of url-queue-retrieve calls is larger than this number, later ones are queued until earlier ones are finished.
- User Option: url-queue-timeout ¶
The value of this option is a number specifying the maximum lifetime of a url-queue-retrieve network process, once it is started. If a process is not finished by then, it is killed and removed from the queue.
4 Supported URL Types
This chapter describes functions and variables affecting URL retrieval for specific schemes.
- http and https
- file and ftp
- info
- mailto
- news, nntp and snews
- telnet and tn3270
- irc
- data
- nfs
- ldap
- man
- URL Types Supported via Tramp
4.1 http and https
The http scheme refers to the Hypertext Transfer Protocol. The url library supports HTTP version 1.1, specified in RFC 2616. Its default port is 80.
The https scheme is a secure version of http, with transmission via SSL. It is defined in RFC 2818, and its default port is 443. When using https, the url library performs SSL encryption via the ssl library, by forcing the ssl gateway method to be used. See Gateways in General.
- User Option: url-honor-refresh-requests ¶
If this option is non-nil (the default), the url library honors the HTTP ‘Refresh’ header, which is used by servers to direct clients to reload documents from the same URL or a different one. If the value is nil, the ‘Refresh’ header is ignored; any other value means to ask the user on each request.
4.1.1 Cookies
- Command: url-cookie-list ¶
This command creates a *url cookies* buffer listing the current cookies, if there are any. You can remove a cookie using the C-k (url-cookie-delete) command.
- Function: url-cookie-delete-cookies &optional regexp ¶
This function takes a regular expression as its parameter and deletes all cookies from that domain. If regexp is nil, delete all cookies.
- User Option: url-cookie-file ¶
The file in which cookies are stored, defaulting to cookies in the directory specified by url-configuration-directory.
- User Option: url-cookie-confirmation ¶
Specifies whether confirmation is required to accept cookies.
- User Option: url-cookie-multiple-line ¶
Specifies whether to put all cookies for the server on one line in the HTTP request to satisfy broken servers.
- User Option: url-cookie-trusted-urls ¶
A list of regular expressions matching URLs from which to accept cookies always.
- User Option: url-cookie-untrusted-urls ¶
A list of regular expressions matching URLs from which to reject cookies always.
- User Option: url-cookie-save-interval ¶
The number of seconds between automatic saves of cookies to disk. Default is one hour.
4.1.2 Language and Encoding Preferences
HTTP allows clients to express preferences for the language and encoding of documents which servers may honor. For each of these variables, the value is a string; it can specify a single choice, or it can be a comma-separated list.
Normally, this list is ordered by descending preference. However, each element can be followed by ‘;q=priority’ to specify its preference level, a decimal number from 0 to 1; e.g., for url-mime-language-string, "de, en-gb;q=0.8, en;q=0.7". An element that has no ‘;q’ specification has preference level 1.
- User Option: url-mime-charset-string ¶
This variable specifies a preference for character sets when documents can be served in more than one encoding.
HTTP allows specifying a series of MIME charsets which indicate your preferred character set encodings, e.g., Latin-9 or Big5, and these can be weighted. The default series is generated automatically from the associated MIME types of all defined coding systems, sorted by the coding system priority specified in Emacs. See Recognizing Coding Systems in The GNU Emacs Manual.
- User Option: url-mime-language-string ¶
A string specifying the preferred language when servers can serve files in several languages. Use RFC 1766 abbreviations, e.g., ‘en’ for English, ‘de’ for German.

The string can be "*" to get the first available language (as opposed to the default).
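For example, the preference string shown above could be set in an init file like this:

(setq url-mime-language-string "de, en-gb;q=0.8, en;q=0.7")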
4.1.3 HTTP URL Options
HTTP supports an ‘OPTIONS’ method describing things supported by the URL.
- Function: url-http-options url ¶
Returns a property list describing options available for URL. The property list members are:
methods
A list of symbols specifying what HTTP methods the resource supports.
dav
A list of numbers specifying what DAV protocol/schema versions are supported.
dasl
A list of the supported DASL search types (string form).
ranges
A list of the units available for use in partial document fetches.
p3p
The Platform For Privacy Protection description for the resource. Currently this is just the raw header contents.
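For example, a hedged sketch of querying a server’s advertised methods (the URL and the result shown are illustrative):

(plist-get (url-http-options "https://www.example.com/") 'methods)
     ⇒ (GET HEAD OPTIONS)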
4.1.4 Dealing with HTTP documents
HTTP URLs are retrieved into a buffer containing the HTTP headers followed by the body. Since the headers are quasi-MIME, they may be processed using the MIME library. See Emacs MIME in The Emacs MIME Manual.
4.2 file and ftp
The ftp and file schemes are defined in RFC 1808. The url library treats ‘ftp:’ and ‘file:’ as synonymous.
Such URLs have the form
ftp://user:password@host:port/file
file://user:password@host:port/file
If the URL specifies a local file, it is retrieved by reading the file contents in the usual way. If it specifies a remote file, it is retrieved using either the Tramp or the Ange-FTP package. See Remote Files in The GNU Emacs Manual.
When retrieving a compressed file, it is automatically uncompressed if it has the file suffix .z, .gz, .Z, .bz2, or .xz. (The list of supported suffixes is hard-coded, and cannot be altered by customizing jka-compr-compression-info-list.)
4.3 info
The info scheme is non-standard. Such URLs have the form
info:file#node
and are retrieved by invoking Info-goto-node with argument ‘(file)node’. If ‘#node’ is omitted, the ‘Top’ node is opened.
4.4 mailto
A mailto URL specifies an email message to be sent to a given email address. For example, ‘mailto:foo@bar.com’ specifies sending a message to ‘foo@bar.com’. The “retrieval method” for such URLs is to open a mail composition buffer in which the appropriate content (e.g., the recipient address) has been filled in.
As defined in RFC 6068, a mailto URL can have the form
‘mailto:mailbox[?header=contents[&header=contents]]’
where an arbitrary number of headers can be added. If the header is ‘body’, then contents is put in the message body; otherwise, a header header field is created with contents as its contents. Note that the url library does not perform any checking of header or contents, so you should check them before sending the message.
- User Option: url-mail-command ¶
The value of this variable is the function called whenever url needs to send mail. This should normally be left at its default, which is the standard mail-composition command compose-mail. See Sending Mail in The GNU Emacs Manual.
If the document containing the mailto URL itself possessed a known URL, Emacs automatically inserts an ‘X-Url-From’ header field into the mail buffer, specifying that URL.
4.5 news, nntp and snews
The news, nntp, and snews schemes, defined in RFC 1738, are used for reading Usenet newsgroups. For compatibility with non-standard-compliant news clients, the url library allows host and port fields to be included in news URLs, even though they are properly only allowed for nntp and snews.
news and nntp URLs have the following form:
- ‘news:newsgroup’
Retrieves a list of messages in newsgroup;
- ‘news:message-id’
Retrieves the message with the given message-id;
- ‘news:*’
Retrieves a list of all available newsgroups;
- ‘nntp://host:port/newsgroup’
- ‘nntp://host:port/message-id’
- ‘nntp://host:port/*’
Similar to the ‘news’ versions.
The default port for nntp (and news) is 119. The difference between an nntp URL and a news URL is that an nntp URL may specify an article by its number. The ‘snews’ scheme is the same as ‘nntp’, except that it is tunneled through SSL and has default port 563.
These URLs are retrieved via the Gnus package.
- User Option: url-news-server ¶
This variable specifies the default news server from which to fetch news, if no server was specified in the URL. The default value, nil, means to use the server specified by the standard environment variable ‘NNTPSERVER’, or ‘news’ if that environment variable is unset.
4.6 telnet and tn3270
These URL schemes are defined in RFC 1738, and are used for logging in via a terminal emulator. They have the form
telnet://user:password@host:port
but the password component is ignored. By default, the telnet scheme is handled via Tramp (see URL Types Supported via Tramp).

To handle telnet and tn3270 URLs, a telnet or tn3270 session (the program names and arguments are hardcoded) is run in a terminal-emulator buffer. Well-known ports are used if the URL does not specify a port.
4.7 irc
The irc scheme is defined in the Internet Draft at https://www.w3.org/Addressing/draft-mirashi-url-irc-01.txt (which was never approved as an RFC). Such URLs have the form
irc://host:port/target,needpass
and are retrieved by opening an IRC session using the function specified by url-irc-function.
- User Option: url-irc-function ¶
The value of this option is a function, which is called to open an IRC connection for irc URLs. This function must take five arguments, host, port, channel, user and password. The channel argument specifies the channel to join immediately, and may be nil.

The default is url-irc-rcirc, which uses the Rcirc package. Other options are url-irc-erc (which uses ERC) and url-irc-zenirc (which uses ZenIRC).
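As an illustrative sketch, a custom handler can be plugged in; my-url-irc below is a hypothetical function, but its argument list must match the five arguments described above:

(defun my-url-irc (host port channel _user _password)
  "Hypothetical handler that only reports what it would do."
  (message "Would connect to %s:%s%s" host port
           (if channel (format " and join %s" channel) "")))
(setq url-irc-function #'my-url-irc)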
4.8 data
The data scheme, defined in RFC 2397, contains MIME data in the URL itself. Such URLs have the form
data:[media-type][;base64],data
media-type is a MIME ‘Content-Type’ string, possibly including parameters. It defaults to ‘text/plain;charset=US-ASCII’. The ‘text/plain’ can be omitted while the charset parameter is still supplied. If ‘;base64’ is present, the data are base64-encoded.
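For instance, the following URL carries the base64-encoded text ‘Hello, world!’, and can be retrieved like any other URL (e.g., with url-retrieve-synchronously):

data:text/plain;base64,SGVsbG8sIHdvcmxkIQ==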
4.9 nfs
The nfs scheme, defined in RFC 2224, is similar to ftp except that it points to a file on a remote host that is handled by an NFS automounter on the local host. Such URLs have the form
nfs://user:password@host:port/file
- Variable: url-nfs-automounter-directory-spec ¶
A string saying how to invoke the NFS automounter. Certain ‘%’ sequences are recognized:
- ‘%h’
The hostname of the NFS server;
- ‘%n’
The port number of the NFS server;
- ‘%u’
The username to use to authenticate;
- ‘%p’
The password to use to authenticate;
- ‘%f’
The filename on the remote server;
- ‘%%’
A literal ‘%’.
Each can be used any number of times.
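As a purely hypothetical illustration, assuming a local automounter that exposes remote hosts under /net, the value might look like this:

(setq url-nfs-automounter-directory-spec "/net/%h%f")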
4.10 ldap
The LDAP scheme is defined in RFC 2255.
4.11 man
The man scheme is a non-standard one. Such URLs have the form
‘man:page-spec’
and are retrieved by passing page-spec to the Lisp function man.
4.12 URL Types Supported via Tramp
Some additional URL types are supported by passing them to Tramp (see The Tramp Manual). These protocols are listed in the url-tramp-protocols variable, which you can customize. The default value includes the following protocols:
ftp
The file transfer protocol. See file and ftp.
ssh
The secure shell protocol. See Inline methods in The Tramp Manual.
scp
The secure file copy protocol. See External methods in The Tramp Manual.
rsync
The remote sync protocol.
telnet
The telnet protocol.
5 General Facilities
5.1 Disk Caching
The disk cache stores retrieved documents locally, whence they can be retrieved more quickly. When requesting a URL that is in the cache, the library checks to see if the page has changed since it was last retrieved from the remote machine. If not, the local copy is used, saving the transmission over the network. Currently the cache isn’t cleared automatically.
- User Option: url-automatic-caching ¶
Setting this variable non-nil causes documents to be cached automatically.
- User Option: url-cache-directory ¶
This variable specifies the directory in which to store the cache files. It defaults to the sub-directory cache of url-configuration-directory.
- User Option: url-cache-creation-function ¶
The cache relies on a scheme for mapping URLs to files in the cache. This variable names a function which sets the type of cache to use. It takes a URL as argument and returns the absolute file name of the corresponding cache file. The two supplied possibilities are url-cache-create-filename-using-md5 and url-cache-create-filename-human-readable.
- Function: url-cache-create-filename-using-md5 url ¶
Creates a cache file name from url using MD5 hashing. This creates entries with very few cache collisions and is fast.
(url-cache-create-filename-using-md5 "http://www.example.com/foo/bar")
     ⇒ "/home/fx/.url/cache/fx/http/com/example/www/b8a35774ad20db71c7c3409a5410e74f"
- Function: url-cache-create-filename-human-readable url ¶
Creates a cache file name from url that is more obviously connected to url than the one produced by url-cache-create-filename-using-md5, but more likely to conflict with other files.
(url-cache-create-filename-human-readable "http://www.example.com/foo/bar")
     ⇒ "/home/fx/.url/cache/fx/http/com/example/www/foo/bar"
- Function: url-cache-expired url &optional expire-time ¶
This function returns non-nil if a cache entry has expired (or is absent). The arguments are a URL and an optional expiration delay in seconds (the default is url-cache-expire-time).
- User Option: url-cache-expire-time ¶
This variable is the default number of seconds to use for the expire-time argument of the function url-cache-expired.
- Function: url-fetch-from-cache url ¶
This function takes a URL as its argument and returns a buffer containing the data cached for that URL.
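Combining the two, here is a hedged sketch that consults the cache before going to the network (caching itself only happens if url-automatic-caching is enabled):

(let ((url "https://www.gnu.org/software/emacs/"))
  (if (url-cache-expired url)
      ;; Not cached, or the cached copy is too old: fetch it again.
      (url-retrieve-synchronously url)
    (url-fetch-from-cache url)))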
5.2 Proxies and Gatewaying
Proxy servers are commonly used to provide gateways through firewalls or as caches serving some more-or-less local network. Each protocol (HTTP, FTP, etc.) can have a different gateway server. Proxying is conventionally configured, in a way shared amongst different programs, through environment variables of the form protocol_proxy, where protocol is one of the supported network protocols (http, ftp, etc.). The library recognizes such variables in either upper or lower case. Their values are of one of the forms: