7.3.2 Uniform Resource Identifiers

Guile provides a standard data type for Uniform Resource Identifiers (URIs), as defined in RFC 3986.

The generic URI syntax is as follows:

URI-reference := [scheme ":"] ["//" [userinfo "@"] host [":" port]] path \
                 [ "?" query ] [ "#" fragment ]

For example, in the URI ‘http://www.gnu.org/help/’, the scheme is http, the host is www.gnu.org, the path is /help/, and there is no userinfo, port, query, or fragment.

Userinfo is something of an abstraction, as some legacy URI schemes allowed userinfo of the form username:passwd. But since passwords do not belong in URIs, the RFC does not want to condone this practice, so it calls anything before the @ sign userinfo.

(use-modules (web uri))

The following procedures can be found in the (web uri) module. Load it into your Guile, using a form like the above, to have access to them.

The most common way to build a URI from Scheme is with the build-uri function.

Scheme Procedure: build-uri scheme [#:userinfo=#f] [#:host=#f] [#:port=#f] [#:path=""] [#:query=#f] [#:fragment=#f] [#:validate?=#t]

Construct a URI. scheme should be a symbol, port either a positive, exact integer or #f, and the rest of the fields are either strings or #f. If validate? is true, also run some consistency checks to make sure that the constructed URI is valid.
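As a minimal sketch of build-uri, the following constructs a URI from its parts and serializes it. The host, path, and query values are illustrative assumptions, not values mandated by the module.

```scheme
(use-modules (web uri))

;; Build a URI from components.  The scheme is a symbol; the other
;; fields are strings (the port, if given, would be an exact integer).
(define home
  (build-uri 'https
             #:host "www.gnu.org"
             #:path "/software/guile/"
             #:query "lang=en"))

(display (uri->string home))
(newline)
;; prints https://www.gnu.org/software/guile/?lang=en
```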

Scheme Procedure: uri? obj

Return #t if obj is a URI.

In Guile, URIs are represented as URI records, with a number of associated accessors.

Scheme Procedure: uri-scheme uri
Scheme Procedure: uri-userinfo uri
Scheme Procedure: uri-host uri
Scheme Procedure: uri-port uri
Scheme Procedure: uri-path uri
Scheme Procedure: uri-query uri
Scheme Procedure: uri-fragment uri

Field accessors for the URI record type. The URI scheme will be a symbol, or #f if the object is a relative-ref (see below). The port will be either a positive, exact integer or #f, and the rest of the fields will be either strings or #f if not present.

Scheme Procedure: string->uri string

Parse string into a URI object. Return #f if the string could not be parsed.
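To illustrate parsing together with the field accessors, here is a short sketch that parses the example URI from the start of this section. The name `u` is only a local variable chosen for the example.

```scheme
(use-modules (web uri))

(define u (string->uri "http://www.gnu.org/help/"))

(uri-scheme u)      ; the symbol http
(uri-host u)        ; "www.gnu.org"
(uri-path u)        ; "/help/"
(uri-port u)        ; #f, since no port was given
(uri-userinfo u)    ; #f
(uri-query u)       ; #f
(uri-fragment u)    ; #f

;; A string without a scheme is not a full URI, so string->uri is
;; expected to return #f for it.
(string->uri "/help/")
```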

Scheme Procedure: uri->string uri [#:include-fragment?=#t]

Serialize uri to a string. If the URI has a port that is the default port for its scheme, the port is not included in the serialization. If include-fragment? is given as false, the resulting string will omit the fragment (if any).
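The default-port and fragment behavior can be sketched as follows. The host example.org and the fragment name are assumptions made for the example; port 80 is the default that (web uri) declares for http.

```scheme
(use-modules (web uri))

(define u
  (build-uri 'http
             #:host "example.org"
             #:port 80                 ; the default port for http
             #:path "/index.html"
             #:fragment "top"))

;; The default port is left out of the serialization.
(uri->string u)                          ; "http://example.org/index.html#top"

;; The fragment can be dropped on request.
(uri->string u #:include-fragment? #f)   ; "http://example.org/index.html"

;; A non-default port is kept.
(define alt (build-uri 'http #:host "example.org" #:port 8080 #:path "/"))
(uri->string alt)                        ; "http://example.org:8080/"
```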

Scheme Procedure: declare-default-port! scheme port

Declare a default port for the given URI scheme.
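The following sketch shows the effect of declaring a default port on later calls to uri->string. The scheme name `myproto` and port 4242 are made up for illustration. Note that the declaration is global to the running Guile.

```scheme
(use-modules (web uri))

;; `myproto' and 4242 are hypothetical values chosen for this example.
(define u (build-uri 'myproto #:host "example.org" #:port 4242 #:path "/"))

;; No default is known yet, so the port is serialized.
(define before (uri->string u))   ; "myproto://example.org:4242/"

(declare-default-port! 'myproto 4242)

;; The port now matches the declared default and is omitted.
(define after (uri->string u))    ; "myproto://example.org/"
```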

Scheme Procedure: uri-decode str [#:encoding="utf-8"] [#:decode-plus-to-space?=#t]

Percent-decode the given str, according to encoding, which should be the name of a character encoding.

Note that this function should not generally be applied to a full URI string. For paths, use split-and-decode-uri-path instead. For query strings, split the query on & and = boundaries, and decode the components separately.

Note also that percent-encoded strings encode bytes, not characters. There is no guarantee that a given byte sequence is a valid string encoding. Therefore this routine may signal an error if the decoded bytes are not valid for the given encoding. Pass #f for encoding if you want decoded bytes as a bytevector directly. See set-port-encoding! for more information on character encodings.

If decode-plus-to-space? is true, which is the default, also replace instances of the plus character ‘+’ with a space character. This is needed when parsing application/x-www-form-urlencoded data.

Returns a string of the decoded characters, or a bytevector if encoding was #f.
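The advice above about query strings can be sketched as a small helper that splits on & and = first and decodes each piece afterwards. `parse-query-string` is a hypothetical helper written for this example; it is not part of (web uri), and it makes the simplifying assumption that a key without an = sign has an empty value.

```scheme
(use-modules (web uri))

;; Hypothetical helper, not provided by (web uri): turn a query string
;; into an association list of decoded (key . value) pairs.
(define (parse-query-string query)
  (define (decode-pair pair)
    (let ((idx (string-index pair #\=)))
      (if idx
          (cons (uri-decode (substring pair 0 idx))
                (uri-decode (substring pair (+ idx 1))))
          ;; Assumption: a key with no "=" gets an empty value.
          (cons (uri-decode pair) ""))))
  (let loop ((parts (string-split query #\&)) (acc '()))
    (cond ((null? parts) (reverse acc))
          ;; Skip empty pieces such as those produced by "a=1&&b=2".
          ((string-null? (car parts)) (loop (cdr parts) acc))
          (else (loop (cdr parts)
                      (cons (decode-pair (car parts)) acc))))))

(parse-query-string "q=guile+scheme&tag=a%26b")
;; (("q" . "guile scheme") ("tag" . "a&b"))

;; Keep literal plus signs by turning off the form-encoding rule.
(uri-decode "a+b" #:decode-plus-to-space? #f)   ; "a+b"

;; Ask for raw bytes instead of a string.
(uri-decode "%41%42" #:encoding #f)             ; a bytevector holding 65 66
```

Decoding after splitting matters because an encoded & or = inside a value (as in the `tag` value above) must not be treated as a delimiter.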

Scheme Procedure: uri-encode str [#:encoding="utf-8"] [#:unescaped-chars]

Percent-encode any character not in the character set, unescaped-chars.

The default character set includes alphanumerics from ASCII, as well as the special characters ‘-’, ‘.’, ‘_’, and ‘~’. Any other character will be percent-encoded, by writing out the character to a bytevector within the given encoding, then encoding each byte as %HH, where HH is the hexadecimal representation of the byte.
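A brief sketch of uri-encode with the default character set and with a custom one follows. `path-safe-chars` is a hypothetical name for a character set built here by hand: the default unreserved characters described above plus the slash.

```scheme
(use-modules (web uri))

;; With the default character set, the space is percent-encoded.
(uri-encode "scrambled eggs")            ; "scrambled%20eggs"

;; Hypothetical custom set: the default unreserved characters plus "/",
;; so that path separators survive encoding.
(define path-safe-chars
  (string->char-set
   "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~/"))

(uri-encode "read me/notes & todo" #:unescaped-chars path-safe-chars)
;; "read%20me/notes%20%26%20todo"
```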

Scheme Procedure: split-and-decode-uri-path path

Split path into its components, and decode each component, removing empty components.

For example, "/foo/bar%20baz/" decodes to the two-element list, ("foo" "bar baz").

Scheme Procedure: encode-and-join-uri-path parts

URI-encode each element of parts, which should be a list of strings, and join the parts together with / as a delimiter.

For example, the list ("scrambled eggs" "biscuits&gravy") encodes as "scrambled%20eggs/biscuits%26gravy".
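The two path procedures can be combined, as in this sketch. It also shows that a round trip is not the identity on strings: empty components are dropped while splitting, so the leading and trailing slashes do not come back.

```scheme
(use-modules (web uri))

(define parts (split-and-decode-uri-path "/foo/bar%20baz/"))
;; parts is ("foo" "bar baz")

(define joined (encode-and-join-uri-path parts))
;; joined is "foo/bar%20baz", without the surrounding slashes

(encode-and-join-uri-path '("scrambled eggs" "biscuits&gravy"))
;; "scrambled%20eggs/biscuits%26gravy"
```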

Subtypes of URI

As we noted above, not all URI objects have a scheme. You might have noted in the “generic URI syntax” example that the left-hand side of that grammar definition was URI-reference, not URI. A URI-reference is a generalization of a URI where the scheme is optional. If no scheme is specified, it is taken to be relative to some other related URI. A common use of URI references is when you want to be vague regarding the choice of HTTP or HTTPS – serving a web page referring to /foo.css will use HTTPS if loaded over HTTPS, or HTTP otherwise.

Scheme Procedure: build-uri-reference [#:scheme=#f] [#:userinfo=#f] [#:host=#f] [#:port=#f] [#:path=""] [#:query=#f] [#:fragment=#f] [#:validate?=#t]

Like build-uri, but with an optional scheme.

Scheme Procedure: uri-reference? obj

Return #t if obj is a URI-reference. This is the most general URI predicate, as it includes not only full URIs that have schemes (those that match uri?) but also URIs without schemes.

It’s also possible to build a relative-ref: a URI-reference that explicitly lacks a scheme.

Scheme Procedure: build-relative-ref [#:userinfo=#f] [#:host=#f] [#:port=#f] [#:path=""] [#:query=#f] [#:fragment=#f] [#:validate?=#t]

Like build-uri, but with no scheme.

Scheme Procedure: relative-ref? obj

Return #t if obj is a “relative-ref”: a URI-reference that has no scheme. Every URI-reference will either match uri? or relative-ref? (but not both).
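The relationship between the three predicates can be sketched with one object of each kind. The names `absolute` and `relative` are local to this example.

```scheme
(use-modules (web uri))

(define absolute
  (build-uri-reference #:scheme 'https #:host "www.gnu.org" #:path "/"))
(define relative
  (build-relative-ref #:path "/foo.css"))

;; Both objects are URI-references.
(uri-reference? absolute)   ; #t
(uri-reference? relative)   ; #t

;; Exactly one of uri? and relative-ref? holds for each.
(uri? absolute)             ; #t
(relative-ref? absolute)    ; #f
(uri? relative)             ; #f
(relative-ref? relative)    ; #t

(uri->string relative)      ; "/foo.css"
```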

In case it’s not clear from the above, the most general of these URI types is the URI-reference, with build-uri-reference as the most general constructor. build-uri and build-relative-ref enforce specific restrictions on the URI-reference. The most generic URI parser is then string->uri-reference, and there is also a parser, string->relative-ref, for when you know that you want a relative-ref.

Note that uri? will only return #t for URI objects that have schemes; that is, it rejects relative-refs.

Scheme Procedure: string->uri-reference string

Parse string into a URI object, while not requiring a scheme. Return #f if the string could not be parsed.

Scheme Procedure: string->relative-ref string

Parse string into a URI object, while asserting that no scheme is present. Return #f if the string could not be parsed.
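A short sketch of the two parsers, reusing the /foo.css reference mentioned earlier in this section; the variable names are chosen only for the example.

```scheme
(use-modules (web uri))

;; A scheme-less reference parses with the generic parser.
(define css (string->uri-reference "/foo.css"))
(relative-ref? css)    ; #t
(uri-scheme css)       ; #f
(uri-path css)         ; "/foo.css"

;; The generic parser also accepts full URIs.
(define full (string->uri-reference "http://www.gnu.org/help/"))
(uri? full)            ; #t

;; string->relative-ref rejects input that carries a scheme.
(string->relative-ref "http://www.gnu.org/help/")   ; #f
(relative-ref? (string->relative-ref "/foo.css"))   ; #t
```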