% This LaTeX document was generated using the LaTeX backend of PlDoc,
% The SWI-Prolog documentation system


\section{library(url): Analysing and constructing URL}

\label{sec:url}

\begin{tags}
\mtag{author}- Jan Wielemaker \\- Lukas Faulstich
    \tag{deprecated}
New code should use \file{library(uri)}, provided by the \const{clib}
package.
\end{tags}

This library deals with the analysis and construction of URLs (Uniform
Resource Locators). A URL is the basis for communicating the locations
of resources (data) on the web. A URL consists of a protocol identifier
(e.g., HTTP or FTP) and a protocol-specific syntax that further defines
the location. URLs are standardized in RFC-1738.

The implementation in this library covers only a small portion of the
defined protocols. Although the initial implementation followed RFC-1738
strictly, the current one is more relaxed, to cope with violations of
the standard that are frequently encountered in practice.\vspace{0.7cm}

\begin{description}
    \predicate[det]{global_url}{3}{+URL, +Base, -Global}
Translate a possibly relative \arg{URL} into an absolute one, resolving
it against \arg{Base}.

\begin{tags}
    \tag{Errors}
\verb$syntax_error(illegal_url)$ if \arg{URL} is not legal.
\end{tags}
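As an illustration, the sketch below resolves every link found on a page
against the URL of that page. The helper \verb$resolve_links/3$ is
hypothetical and not part of this library; it merely maps
\predref{global_url}{3} over a list of links.

\begin{code}
:- use_module(library(url)).
:- use_module(library(apply)).

%   resolve_links(+Base, +Links, -Absolute) is det.
%
%   Hypothetical helper (not part of library(url)): translate each
%   possibly relative link in Links into an absolute URL, using
%   global_url/3 with Base as the base URL.  global_url/3 raises
%   syntax_error(illegal_url) if a link is not legal.
%
%   Example query:
%   ?- resolve_links('http://www.example.org/docs/index.html',
%                    ['intro.html'], L).

resolve_links(Base, Links, Absolute) :-
    maplist(resolve_link(Base), Links, Absolute).

resolve_link(Base, Link, Global) :-
    global_url(Link, Base, Global).
\end{code}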

    \predicate{is_absolute_url}{1}{+URL}
True if \arg{URL} is an absolute URL, that is, a URL that starts with
a protocol identifier.
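A minimal sketch, assuming a list of links of which only some are
absolute: the hypothetical helper \verb$split_links/3$ (not part of this
library) uses \predref{partition}{4} from \file{library(apply)} to
separate the absolute links from the relative ones.

\begin{code}
:- use_module(library(url)).
:- use_module(library(apply)).

%   split_links(+Links, -Absolute, -Relative) is det.
%
%   Hypothetical helper (not part of library(url)): Absolute holds the
%   links for which is_absolute_url/1 succeeds (they start with a
%   protocol identifier); Relative holds all other links.

split_links(Links, Absolute, Relative) :-
    partition(is_absolute_url, Links, Absolute, Relative).
\end{code}

The relative links could then be passed to \predref{global_url}{3}
together with the URL of the page they were found on.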

    \predicate{http_location}{2}{?Parts, ?Location}
Construct or analyze an HTTP location. This is similar to
\predref{parse_url}{2}, but only deals with the location part of an HTTP
URL, that is, the path, search and fragment specifiers. In the
HTTP protocol, the first line of a request message is

\begin{code}
<Action> <Location> HTTP/<version>
\end{code}

\begin{arguments}
\arg{Location} & Atom or list of character codes. \\
\end{arguments}
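The sketch below builds such a first line for a GET request. The helper
\verb$get_request_line/3$ and the choice of \verb$HTTP/1.1$ are
illustrative assumptions, not part of this library. Because
\arg{Location} may be an atom or a list of character codes, the sketch
converts it with \predref{text_to_string}{2} before formatting.

\begin{code}
:- use_module(library(url)).

%   get_request_line(+Path, +Search, -Line) is det.
%
%   Hypothetical helper (not part of library(url)): build the first
%   line of an HTTP/1.1 GET request.  http_location/2 turns the path
%   and the list of Name=Value search fields into the <Location> part.
%
%   Example query:
%   ?- get_request_line('/search', [q=prolog], Line).

get_request_line(Path, Search, Line) :-
    http_location([path(Path), search(Search)], Location),
    % Location is an atom or a code list; text_to_string/2 accepts both.
    text_to_string(Location, LocationString),
    format(atom(Line), 'GET ~s HTTP/1.1', [LocationString]).
\end{code}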

    \predicate[det]{parse_url}{2}{?URL, ?Attributes}
Construct or analyse a \arg{URL}. \arg{URL} is either an atom holding a
URL or a variable. \arg{Attributes} is a list of components. Each
component is of the format Name(Value). Defined components are:

\begin{description}
    \termitem{protocol}{Protocol}
The protocol used. This is the identifier, after the optional
\verb$url:$ prefix, that is separated from the remainder of the
\arg{URL} by a colon (\verb$:$). \predref{parse_url}{2} assumes the
\const{http} protocol if no protocol is specified and the \arg{URL} can
be parsed as a valid HTTP URL. In addition to the protocols specified
by RFC-1738, the \const{file} protocol is supported as well.
    \termitem{host}{Host}
Host name or IP address of the machine on which the resource is
located. This is supported by all network-based protocols.
    \termitem{port}{Port}
Integer port number used to access \arg{Host}. This only
appears if the port is explicitly specified in the \arg{URL}.
Implicit default ports (e.g., 80 for HTTP) do \textit{not} appear
in the part-list.
    \termitem{path}{Path}
(File-) path addressed by the \arg{URL}. This is supported for the
\const{ftp}, \const{http} and \const{file} protocols. If no path appears, the
library generates the path \verb$/$.
    \termitem{search}{ListOfNameValue}
Search specification of an HTTP \arg{URL}. This is the part after the
\verb$?$, normally used to transfer data from HTML forms that
use the HTTP GET method. In the \arg{URL} it consists of a
www-form-encoded list of Name=Value pairs separated by \verb$&$. This
is mapped to a list of Prolog Name=Value terms with decoded names and
values.
    \termitem{fragment}{Fragment}
Fragment specification of an HTTP \arg{URL}. This is the part after
the \verb$#$ character.
\end{description}

The example below illustrates all of this for an HTTP \arg{URL}.

\begin{code}
?- parse_url('http://www.xyz.org/hello?msg=Hello+World%21#x',
       P).

P = [ protocol(http),
      host('www.xyz.org'),
      fragment(x),
      search([ msg = 'Hello World!'
             ]),
      path('/hello')
    ]
\end{code}

Conversely, when called with an unbound \arg{URL} and an instantiated
\arg{Attributes} list, this predicate creates a URL.
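A minimal sketch of this construction mode, assuming a hypothetical
helper \verb$search_url/3$ (not part of this library) that builds an
HTTP URL with the fixed path \verb$/search$ and a single search field
\verb$q$:

\begin{code}
:- use_module(library(url)).

%   search_url(+Host, +Query, -URL) is det.
%
%   Hypothetical helper (not part of library(url)): construct an HTTP
%   URL by calling parse_url/2 with an unbound URL and a fully
%   instantiated list of components.
%
%   Example query:
%   ?- search_url('www.example.org', prolog, URL).

search_url(Host, Query, URL) :-
    parse_url(URL, [ protocol(http),
                     host(Host),
                     path('/search'),
                     search([q=Query])
                   ]).
\end{code}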

    \predicate[det]{parse_url}{3}{+URL, +BaseURL, -Attributes}
Similar to \predref{parse_url}{2}, but also handles relative URLs: if
\arg{URL} is relative, it is resolved against the absolute URL
\arg{BaseURL}.
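For example, the hypothetical helper \verb$link_path/3$ below (not part
of this library) resolves a link against a base URL and keeps only the
path component of the result. It is a sketch; the base URL in the
example query is arbitrary.

\begin{code}
:- use_module(library(url)).

%   link_path(+Link, +BaseURL, -Path) is semidet.
%
%   Hypothetical helper (not part of library(url)): resolve a possibly
%   relative Link against the absolute BaseURL and return only the
%   path(Path) component.
%
%   Example query:
%   ?- link_path('b.html', 'http://www.example.org/a/index.html', P).

link_path(Link, BaseURL, Path) :-
    parse_url(Link, BaseURL, Attributes),
    memberchk(path(Path), Attributes).
\end{code}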

    \predicate[det]{www_form_encode}{2}{+Value, -XWWWFormEncoded}
\nodescription
    \predicate[det]{www_form_encode}{2}{-Value, +XWWWFormEncoded}
Encode to, or decode from, \verb$application/x-www-form-urlencoded$.
Encoding applies percent encoding to all characters except the RFC 3986
\textit{unreserved} characters, i.e., ASCII \const{alnum} characters
(see \predref{code_type}{2}) and the characters \verb$-._~$. Newline is
mapped to \verb$%0D%0A$. When decoding, newlines appear as a single
newline character (code 10).

Note that a space is encoded as \verb$%20$ instead of \verb$+$.
Decoding maps both to a space.

\begin{tags}
    \tag{deprecated}
Use \predref{uri_encoded}{3} for new code.
\end{tags}
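A small sketch of both modes, using a hypothetical helper
\verb$form_round_trip/3$ that is not part of this library: the value is
encoded, and the encoded atom is decoded again.

\begin{code}
:- use_module(library(url)).

%   form_round_trip(+Value, -Encoded, -Decoded) is det.
%
%   Hypothetical helper (not part of library(url)): encode Value with
%   www_form_encode/2, then decode the result using the other mode.
%
%   Example query:
%   ?- form_round_trip('Hello World!', E, D).

form_round_trip(Value, Encoded, Decoded) :-
    www_form_encode(Value, Encoded),     % +Value, -XWWWFormEncoded
    www_form_encode(Decoded, Encoded).   % -Value, +XWWWFormEncoded
\end{code}

As the predicate is deprecated, such code is mainly relevant when
maintaining existing programs.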

    \predicate[semidet]{set_url_encoding}{2}{?Old, +New}
Query and set the encoding for URLs. The default is \const{utf8}.
The only other defined value is \verb$iso_latin_1$.

\begin{tags}
    \tag{To be done}
A global flag is highly inconvenient; it exists as a work-around for
old sites that use ISO Latin 1 encoding.
\end{tags}
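Because the encoding is a global setting, code that needs ISO Latin 1
only temporarily should restore the previous value afterwards. The
sketch below does this with \predref{setup_call_cleanup}{3}; the helper
\verb$with_url_encoding/2$ is hypothetical and not part of this library.

\begin{code}
:- use_module(library(url)).

%   with_url_encoding(+Encoding, :Goal)
%
%   Hypothetical helper (not part of library(url)): run Goal with the
%   URL encoding set to Encoding (utf8 or iso_latin_1).  The previous
%   encoding is restored when Goal completes, fails or raises an
%   exception.

:- meta_predicate with_url_encoding(+, 0).

with_url_encoding(Encoding, Goal) :-
    setup_call_cleanup(
        set_url_encoding(Old, Encoding),   % remember old, set new
        Goal,
        set_url_encoding(_, Old)).         % restore the old value
\end{code}

If \arg{Goal} leaves a choice point, the cleanup only runs once that
choice point is cut or exhausted, so callers may want to wrap the call
in \predref{once}{1}.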

    \predicate[det]{url_iri}{2}{+Encoded, -Decoded}
\nodescription
    \predicate[det]{url_iri}{2}{-Encoded, +Decoded}
Convert between a URL, encoded in US-ASCII, and an IRI. An IRI
is a fully expanded Unicode string. Unicode strings are first
encoded into UTF-8, after which \%-encoding takes place.
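A minimal round-trip sketch, using a hypothetical helper
\verb$iri_round_trip/3$ (not part of this library): the IRI is first
converted into a URL that contains only ASCII characters and is then
decoded again.

\begin{code}
:- use_module(library(url)).

%   iri_round_trip(+IRI, -URL, -Back) is det.
%
%   Hypothetical helper (not part of library(url)): URL is the
%   US-ASCII, %-encoded form of IRI; Back is URL decoded again.
%
%   Example query (\u00e9 is the non-ASCII character e-acute):
%   ?- atom_codes(IRI, "http://www.example.org/caf\u00e9"),
%      iri_round_trip(IRI, URL, Back).

iri_round_trip(IRI, URL, Back) :-
    url_iri(URL, IRI),       % -Encoded, +Decoded
    url_iri(URL, Back).      % +Encoded, -Decoded
\end{code}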

    \predicate[det]{parse_url_search}{2}{?Spec, ?Fields:list(Name=Value)}
Construct or analyze an HTTP search specification. This deals
with form data using the MIME-type
\verb$application/x-www-form-urlencoded$ as used in HTTP GET
requests.
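As a sketch, the hypothetical helper \verb$form_field/3$ below (not part
of this library) looks up a single field in a search specification. It
converts the atom into a list of character codes before calling
\predref{parse_url_search}{2}; whether an atom is also accepted directly
is not stated above, so the conversion is a cautious assumption.

\begin{code}
:- use_module(library(url)).

%   form_field(+Spec, +Name, -Value) is semidet.
%
%   Hypothetical helper (not part of library(url)): Spec is an atom
%   holding a search specification, such as the text after the ? in
%   'http://www.example.org/find?q=prolog&lang=en'.
%
%   Example query:
%   ?- form_field('q=logic+programming&lang=en', q, Q).

form_field(Spec, Name, Value) :-
    atom_codes(Spec, Codes),             % pass the spec as a code list
    parse_url_search(Codes, Fields),     % Fields: list of Name=Value
    memberchk(Name=Value, Fields).
\end{code}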

    \predicate[det]{file_name_to_url}{2}{+File, -URL}
\nodescription
    \predicate[semidet]{file_name_to_url}{2}{-File, +URL}
Translate between a filename and a file:\Sidiv{} \arg{URL}.

\begin{tags}
    \tag{To be done}
The current implementation does not handle paths that
need special encoding.
\end{tags}
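A minimal round-trip sketch, assuming a POSIX-style absolute file name;
the helper \verb$file_url_round_trip/3$ is hypothetical and not part of
this library. As noted above, file names that need special encoding may
not survive such a round trip.

\begin{code}
:- use_module(library(url)).

%   file_url_round_trip(+File, -URL, -Back) is semidet.
%
%   Hypothetical helper (not part of library(url)): translate File
%   into a file:// URL and translate that URL back into a file name.
%
%   Example query:
%   ?- file_url_round_trip('/tmp/example.pl', URL, Back).

file_url_round_trip(File, URL, Back) :-
    file_name_to_url(File, URL),     % +File, -URL (det)
    file_name_to_url(Back, URL).     % -File, +URL (semidet)
\end{code}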
\end{description}