% This LaTeX document was generated using the LaTeX backend of PlDoc,
% The SWI-Prolog documentation system

\section{library(url): Analysing and constructing URL}

\label{sec:url}

\begin{tags}
\mtag{author}- Jan Wielemaker \\- Lukas Faulstich
\tag{deprecated}
New code should use \file{library(uri)}, provided by the \const{clib} package.
\end{tags}

This library deals with the analysis and construction of a URL (Uniform Resource Locator). URLs are the basis for communicating the locations of resources (data) on the web. A URL consists of a protocol identifier (e.g., HTTP or FTP) and a protocol-specific syntax further defining the location. URLs are standardized in RFC-1738.

The implementation in this library covers only a small portion of the defined protocols. Though the initial implementation followed RFC-1738 strictly, the current implementation is more relaxed to deal with frequent violations of the standard encountered in practical use.\vspace{0.7cm}

\begin{description}
\predicate[det]{global_url}{3}{+URL, +Base, -Global}
Translate a possibly relative \arg{URL} into an absolute one by resolving it against \arg{Base}.

\begin{tags}
\tag{Errors}
\verb$syntax_error(illegal_url)$ if \arg{URL} is not legal.
\end{tags}

\predicate{is_absolute_url}{1}{+URL}
True if \arg{URL} is an absolute URL, that is, a URL that starts with a protocol identifier.

\predicate{http_location}{2}{?Parts, ?Location}
Construct or analyze an HTTP location. This is similar to \predref{parse_url}{2}, but only deals with the location part of an HTTP URL, that is, the path, search and fragment specifiers. In the HTTP protocol, the first line of a request message is

\begin{code}
<Action> <Location> HTTP/<version>
\end{code}

\begin{arguments}
\arg{Location} & Atom or list of character codes. \\
\end{arguments}

\predicate[det]{parse_url}{2}{?URL, ?Attributes}
Construct or analyse a \arg{URL}. \arg{URL} is an atom holding a URL or a variable. \arg{Attributes} is a list of components. Each component is of the format Name(Value).
Defined components are:

\begin{description}
\termitem{protocol}{Protocol}
The protocol used. This is, after the optional \verb$url:$, an identifier separated from the remainder of the \arg{URL} by a colon (\verb$:$). \predref{parse_url}{2} assumes the \const{http} protocol if no protocol is specified and the \arg{URL} can be parsed as a valid HTTP URL. In addition to the protocols specified in RFC-1738, the \const{file} protocol is supported as well.

\termitem{host}{Host}
\arg{Host}-name or IP-address on which the resource is located. Supported by all network-based protocols.

\termitem{port}{Port}
Integer port-number to access on the \arg{Host}. This only appears if the port is explicitly specified in the \arg{URL}. Implicit default ports (e.g., 80 for HTTP) do \textit{not} appear in the part-list.

\termitem{path}{Path}
(File-) path addressed by the \arg{URL}. This is supported for the \const{ftp}, \const{http} and \const{file} protocols. If no path appears, the library generates the path \verb$/$.

\termitem{search}{ListOfNameValue}
Search-specification of an HTTP \arg{URL}. This is the part after the \verb$?$, normally used to transfer data from HTML forms that use the HTTP GET method. In the \arg{URL} it consists of a www-form-encoded list of Name=Value pairs. This is mapped to a list of Prolog Name=Value terms with decoded names and values.

\termitem{fragment}{Fragment}
\arg{Fragment} specification of an HTTP \arg{URL}. This is the part after the \verb$#$ character.
\end{description}

The example below illustrates all of this for an HTTP \arg{URL}.

\begin{code}
?- parse_url('http://www.xyz.org/hello?msg=Hello+World%21#x', P).

P = [ protocol(http),
      host('www.xyz.org'),
      fragment(x),
      search([ msg = 'Hello World!'
             ]),
      path('/hello')
    ]
\end{code}

By instantiating the parts-list, this predicate can be used to create a \arg{URL}.

\predicate[det]{parse_url}{3}{+URL, +BaseURL, -Attributes}
Similar to \predref{parse_url}{2} for relative URLs.
If \arg{URL} is relative, it is resolved against the absolute URL \arg{BaseURL}.

\predicate[det]{www_form_encode}{2}{+Value, -XWWWFormEncoded}
\nodescription
\predicate[det]{www_form_encode}{2}{-Value, +XWWWFormEncoded}
En/decode to/from \verb$application/x-www-form-urlencoded$. Encoding percent-encodes all characters except the RFC 3986 \textit{unreserved} characters: ASCII \const{alnum} (see \predref{code_type}{2}) and one of "-._\Stilde{}". Newline is mapped to \verb$%0D%0A$. When decoding, newlines appear as a single newline (10) character.

Note that a space is encoded as \verb$%20$ instead of \verb$+$. Decoding maps both to a space.

\begin{tags}
\tag{deprecated}
Use \predref{uri_encoded}{3} for new code.
\end{tags}

\predicate[semidet]{set_url_encoding}{2}{?Old, +New}
Query and set the encoding for URLs. The default is \const{utf8}. The only other defined value is \verb$iso_latin_1$.

\begin{tags}
\tag{To be done}
Having a global flag is highly inconvenient, but it is a work-around for old sites using ISO Latin 1 encoding.
\end{tags}

\predicate[det]{url_iri}{2}{+Encoded, -Decoded}
\nodescription
\predicate[det]{url_iri}{2}{-Encoded, +Decoded}
Convert between a URL, encoded in US-ASCII, and an IRI. An IRI is a fully expanded Unicode string. Unicode strings are first encoded into UTF-8, after which \%-encoding takes place.

\predicate[det]{parse_url_search}{2}{?Spec, ?Fields:list(Name=Value)}
Construct or analyze an HTTP search specification. This deals with form data using the MIME-type \verb$application/x-www-form-urlencoded$ as used in HTTP GET requests.

\predicate[det]{file_name_to_url}{2}{+File, -URL}
\nodescription
\predicate[semidet]{file_name_to_url}{2}{-File, +URL}
Translate between a filename and a file:\Sidiv{} URL.

\begin{tags}
\tag{To be done}
Current implementation does not deal with paths that need special encoding.
\end{tags}

\end{description}
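
Because \predref{parse_url}{2} works in both directions, a URL can be taken apart, modified and rebuilt. The following is a minimal sketch and not part of the library: \verb$set_search/3$ and \verb$is_search/1$ are hypothetical helper names, and the sketch assumes that \predref{parse_url}{2} accepts the components in any order when constructing a URL.

\begin{code}
:- use_module(library(url)).
:- use_module(library(apply)).      % exclude/3

% set_search(+URL0, +Fields, -URL)
%
% Hypothetical helper: replace the search part of URL0 by Fields,
% a list of Name=Value terms, keeping all other components.
set_search(URL0, Fields, URL) :-
    parse_url(URL0, Parts0),                   % analyse into components
    exclude(is_search, Parts0, Parts1),        % drop an existing search(_)
    parse_url(URL, [search(Fields)|Parts1]).   % construct the new URL

is_search(search(_)).
\end{code}

Under these assumptions, the helper could be called as shown below. The values in \arg{Fields} are plain (decoded) atoms; encoding is left to the library.

\begin{code}
?- set_search('http://www.xyz.org/hello?msg=Hi#x',
              [msg = 'Hello World!'],
              URL).
\end{code}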