Guile provides a standard data type for Universal Resource Identifiers (URIs), as defined in RFC 3986.
The generic URI syntax is as follows:
URI-reference := [scheme ":"] ["//" [userinfo "@"] host [":" port]] path \ [ "?" query ] [ "#" fragment ]
For example, in the URI ‘http://www.gnu.org/help/’, the scheme is http, the host is www.gnu.org, the path is /help/, and there is no userinfo, port, query, or fragment.
Userinfo is something of an abstraction, as some legacy URI schemes allowed userinfo of the form username:passwd. But since passwords do not belong in URIs, the RFC does not want to condone this practice, so it calls anything before the @ sign userinfo.
(use-modules (web uri))
The following procedures can be found in the (web uri) module. Load it into your Guile, using a form like the above, to have access to them.
The most common way to build a URI from Scheme is with the build-uri function.
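Scheme Procedure: build-uri scheme [#:userinfo=#f] [#:host=#f] [#:port=#f] [#:path=""] [#:query=#f] [#:fragment=#f] [#:validate?=#t]
Construct a URI. scheme should be a symbol, port either a positive, exact integer or #f, and the rest of the fields are either strings or #f. If validate? is true, also run some consistency checks to make sure that the constructed URI is valid.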
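As a minimal sketch of the constructor described above, the following builds an HTTPS URI from its parts and serializes it. The host, path, and query values are placeholders chosen for illustration.

```scheme
(use-modules (web uri))

;; Build a URI from components. Keyword fields that are not given
;; default to #f (or "" for the path).
(define guile-home
  (build-uri 'https
             #:host "www.gnu.org"
             #:path "/software/guile/"
             #:query "lang=en"))

;; Serialize the record back to a string and print it.
(display (uri->string guile-home))
(newline)
```

Leaving validate? at its default is usually what you want, since malformed components are then caught at construction time.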
Scheme Procedure: uri? obj
Return #t if obj is a URI.
In Guile, URIs are represented as URI records, with a number of associated accessors.
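Scheme Procedure: uri-scheme uri
Scheme Procedure: uri-userinfo uri
Scheme Procedure: uri-host uri
Scheme Procedure: uri-port uri
Scheme Procedure: uri-path uri
Scheme Procedure: uri-query uri
Scheme Procedure: uri-fragment uri
Field accessors for the URI record type. The URI scheme will be a symbol, or #f if the object is a relative-ref (see below). The port will be either a positive, exact integer or #f, and the rest of the fields will be either strings or #f if not present.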
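To illustrate the accessors, here is a short sketch that builds a URI and reads each field back. The user name, host, and port are invented for the example.

```scheme
(use-modules (web uri))

(define u
  (build-uri 'http
             #:userinfo "alice"
             #:host "example.org"
             #:port 8080
             #:path "/help/"
             #:fragment "top"))

(uri-scheme u)    ; the symbol http
(uri-userinfo u)  ; "alice"
(uri-host u)      ; "example.org"
(uri-port u)      ; 8080
(uri-path u)      ; "/help/"
(uri-query u)     ; #f, because no query was given
(uri-fragment u)  ; "top"
```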
Scheme Procedure: string->uri string
Parse string into a URI object. Return #f if the string could not be parsed.
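A brief sketch of parsing, using the example URI from the start of this section. Because the result may be #f, callers typically check it before using the accessors.

```scheme
(use-modules (web uri))

(define parsed (string->uri "http://www.gnu.org/help/"))

(uri-host parsed)  ; "www.gnu.org"
(uri-path parsed)  ; "/help/"

;; string->uri requires a scheme, so a bare path does not parse.
(define failed (string->uri "/help/"))  ; #f
```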
Scheme Procedure: uri->string uri [#:include-fragment?=#t]
Serialize uri to a string. If the URI has a port that is the default port for its scheme, the port is not included in the serialization. If include-fragment? is given as false, the resulting string will omit the fragment (if any).
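The following sketch shows both serialization behaviours, assuming that 80 is the default port registered for the http scheme. The host and path are placeholders.

```scheme
(use-modules (web uri))

(define u
  (build-uri 'http
             #:host "example.org"
             #:port 80
             #:path "/doc"
             #:fragment "intro"))

;; Port 80 is the default for http, so it is left out.
(uri->string u)                          ; "http://example.org/doc#intro"

;; Drop the fragment as well.
(uri->string u #:include-fragment? #f)   ; "http://example.org/doc"

;; A non-default port is kept in the output.
(define alt (build-uri 'http #:host "example.org" #:port 8080 #:path "/doc"))
(uri->string alt)                        ; "http://example.org:8080/doc"
```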
Scheme Procedure: declare-default-port! scheme port
Declare a default port for the given URI scheme.
Scheme Procedure: uri-decode str [#:encoding="utf-8"] [#:decode-plus-to-space? #t]
Percent-decode the given str, according to encoding, which should be the name of a character encoding.
Note that this function should not generally be applied to a full URI string. For paths, use split-and-decode-uri-path instead. For query strings, split the query on & and = boundaries, and decode the components separately.
Note also that percent-encoded strings encode bytes, not characters. There is no guarantee that a given byte sequence is a valid string encoding. Therefore this routine may signal an error if the decoded bytes are not valid for the given encoding. Pass #f for encoding if you want decoded bytes as a bytevector directly. See set-port-encoding!, for more information on character encodings.
If decode-plus-to-space? is true, which is the default, also replace instances of the plus character ‘+’ with a space character. This is needed when parsing application/x-www-form-urlencoded data.
Returns a string of the decoded characters, or a bytevector if encoding was #f.
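The sketch below exercises the options described above and shows one way to decode a query string component by component. The helper decode-query is hypothetical, not part of (web uri), and the query string is invented for the example.

```scheme
(use-modules (web uri)
             (rnrs bytevectors))

(uri-decode "scrambled%20eggs")                 ; "scrambled eggs"
(uri-decode "a+b")                              ; "a b"
(uri-decode "a+b" #:decode-plus-to-space? #f)   ; "a+b"

;; With #:encoding #f the raw bytes come back as a bytevector.
(define raw (uri-decode "%41%42" #:encoding #f))
(bytevector->u8-list raw)                       ; (65 66)

;; Hypothetical helper: split the query on & and =, then decode
;; each component separately, as recommended above.  A value that
;; itself contains a literal = would yield more than two elements.
(define (decode-query query)
  (map (lambda (pair)
         (map uri-decode (string-split pair #\=)))
       (string-split query #\&)))

(decode-query "q=guile+uri&lang=en%2DUS")
;; (("q" "guile uri") ("lang" "en-US"))
```

Decoding after splitting matters: a percent-encoded & or = inside a value would otherwise be mistaken for a delimiter.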
Scheme Procedure: uri-encode str [#:encoding="utf-8"] [#:unescaped-chars]
Percent-encode any character not in the character set, unescaped-chars.
The default character set includes alphanumerics from ASCII, as well as the special characters ‘-’, ‘.’, ‘_’, and ‘~’. Any other character will be percent-encoded, by writing out the character to a bytevector within the given encoding, then encoding each byte as %HH, where HH is the hexadecimal representation of the byte.
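A small sketch of percent-encoding with the default character set and with a custom one. The set path-safe is a hypothetical name introduced here; it extends the default set with ‘/’ so that path separators survive.

```scheme
(use-modules (web uri))

(uri-encode "scrambled eggs")   ; "scrambled%20eggs"
(uri-encode "biscuits&gravy")   ; "biscuits%26gravy"

;; Hypothetical character set: ASCII alphanumerics, the default
;; special characters, and the slash.
(define path-safe
  (char-set-union
   (char-set-intersection char-set:ascii char-set:letter+digit)
   (string->char-set "-._~/")))

(uri-encode "a/b c" #:unescaped-chars path-safe)   ; "a/b%20c"

;; Encoding followed by decoding gives back the original string.
(uri-decode (uri-encode "a+b c/d"))                ; "a+b c/d"
```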
Scheme Procedure: split-and-decode-uri-path path
Split path into its components, and decode each component, removing empty components.
For example, "/foo/bar%20baz/" decodes to the two-element list, ("foo" "bar baz").
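The example above can be tried directly. The second call is an extra illustration of how empty components, such as those produced by doubled slashes, are dropped.

```scheme
(use-modules (web uri))

(split-and-decode-uri-path "/foo/bar%20baz/")   ; ("foo" "bar baz")

;; Empty components between or around slashes are removed.
(split-and-decode-uri-path "//a//b/")           ; ("a" "b")
```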
Scheme Procedure: encode-and-join-uri-path parts
URI-encode each element of parts, which should be a list of strings, and join the parts together with / as a delimiter.
For example, the list ("scrambled eggs" "biscuits&gravy") encodes as "scrambled%20eggs/biscuits%26gravy".
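A short sketch of joining path components. Note that the result carries no leading slash, so one is prepended here to form an absolute path; the variable names are illustrative.

```scheme
(use-modules (web uri))

(define parts '("scrambled eggs" "biscuits&gravy"))

(define joined (encode-and-join-uri-path parts))
;; joined is "scrambled%20eggs/biscuits%26gravy"

;; Prepend a slash to get an absolute path for use with build-uri.
(define absolute-path (string-append "/" joined))

;; Splitting and decoding recovers the original components.
(split-and-decode-uri-path absolute-path)   ; ("scrambled eggs" "biscuits&gravy")
```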
As we noted above, not all URI objects have a scheme. You might have
noted in the “generic URI syntax” example that the left-hand side of
that grammar definition was URI-reference, not URI. A
URI-reference is a generalization of a URI where the scheme is
optional. If no scheme is specified, it is taken to be relative to some
other related URI. A common use of URI references is when you want to
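be vague regarding the choice of HTTP or HTTPS: serving a web page referring to /foo.css will use HTTPS if loaded over HTTPS, or HTTP otherwise.
Scheme Procedure: build-uri-reference [#:scheme=#f] [#:userinfo=#f] [#:host=#f] [#:port=#f] [#:path=""] [#:query=#f] [#:fragment=#f] [#:validate?=#t]
Like build-uri, but with an optional scheme.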
Scheme Procedure: uri-reference? obj
Return #t if obj is a URI-reference. This is the most general URI predicate, as it includes not only full URIs that have schemes (those that match uri?) but also URIs without schemes.
It’s also possible to build a relative-ref: a URI-reference that explicitly lacks a scheme.
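Scheme Procedure: build-relative-ref [#:userinfo=#f] [#:host=#f] [#:port=#f] [#:path=""] [#:query=#f] [#:fragment=#f] [#:validate?=#t]
Like build-uri, but with no scheme.
Scheme Procedure: relative-ref? obj
Return #t if obj is a “relative-ref”: a URI-reference that has no scheme. Every URI-reference will either match uri? or relative-ref? (but not both).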
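As a sketch of the two constructors, assuming a Guile version that provides build-uri-reference and build-relative-ref, the following builds the /foo.css reference mentioned earlier both ways. The host and query values are placeholders.

```scheme
(use-modules (web uri))

;; No #:scheme given, so the result is a relative-ref.
(define stylesheet (build-uri-reference #:path "/foo.css"))
(uri-scheme stylesheet)       ; #f
(uri-reference? stylesheet)   ; #t
(relative-ref? stylesheet)    ; #t
(uri->string stylesheet)      ; "/foo.css"

;; With a scheme, the same constructor yields a full URI.
(define full
  (build-uri-reference #:scheme 'https
                       #:host "example.org"
                       #:path "/foo.css"))
(relative-ref? full)          ; #f

;; build-relative-ref never takes a scheme.
(define rel (build-relative-ref #:path "/foo.css" #:query "v=2"))
(uri->string rel)             ; "/foo.css?v=2"
```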
In case it’s not clear from the above, the most general of these URI types is the URI-reference, with build-uri-reference as the most general constructor. build-uri and build-relative-ref enforce specific restrictions on the URI-reference. The most generic URI parser is then string->uri-reference, and there is also a parser for when you know that you want a relative-ref.
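Scheme Procedure: string->uri-reference string
Parse string into a URI object, while not requiring a scheme. Return #f if the string could not be parsed.
Scheme Procedure: string->relative-ref string
Parse string into a URI object, while asserting that no scheme is present. Return #f if the string could not be parsed.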
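The following sketch contrasts the parsers on a scheme-less reference and on a full URI. The input strings are invented for illustration.

```scheme
(use-modules (web uri))

;; A reference without a scheme parses with string->uri-reference.
(define ref (string->uri-reference "/foo.css?v=2"))
(uri-scheme ref)   ; #f
(uri-path ref)     ; "/foo.css"
(uri-query ref)    ; "v=2"

;; The same parser also accepts a full URI.
(define absolute (string->uri-reference "https://example.org/foo.css"))
(uri-scheme absolute)   ; the symbol https

;; When a relative-ref is expected, use the dedicated parser.
(define rel (string->relative-ref "/foo.css"))
(relative-ref? rel)     ; #t
```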
For compatibility reasons, note that uri? will return #t for all URI objects, even relative-refs. In contrast, build-uri and string->uri require that the resulting URI not be a relative-ref. As a predicate to distinguish relative-refs from proper URIs (in the language of RFC 3986), use something like (and (uri-reference? x) (not (relative-ref? x))).
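The suggested predicate can be wrapped in a small helper. The name proper-uri? is hypothetical and chosen only for this sketch.

```scheme
(use-modules (web uri))

;; Hypothetical helper: true only for URI objects that have a scheme.
(define (proper-uri? x)
  (and (uri-reference? x)
       (not (relative-ref? x))))

(proper-uri? (string->uri-reference "https://example.org/"))  ; #t
(proper-uri? (string->uri-reference "/foo.css"))              ; #f
(proper-uri? "https://example.org/")   ; #f, a string is not a URI object
```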