% This LaTeX document was generated using the LaTeX backend of PlDoc, % The SWI-Prolog documentation system The central module of the RDF infrastructure is \file{library(semweb/rdf_db)}. It provides storage and indexed querying of RDF triples. RDF data is stored as quintuples. The first three elements denote the RDF triple. The extra \textit{Graph} and \textit{Line} elements provide information about the origin of the triple. The actual storage is provided by the \textit{foreign language (C)} module. Using a dedicated C-based implementation we can reduce memory usage and improve indexing capabilities, for example by providing a dedicated index to support entailment over \verb$rdfs:subPropertyOf$. Currently the following indexes are provided (S=subject, P=predicate, O=object, G=graph): \begin{itemize} \item S, P, O, SP, PO, SPO, G, SG, PG \item Predicates connected by \textbf{rdfs:subPropertyOf} are combined in a \textit{predicate cloud}. The system causes multiple predicates in the cloud to share the same hash. The cloud maintains a 2-dimensional array that expresses the closure of all \verb$rdfs:subPropertyOf$ relations. This index supports \predref{rdf_has}{3} to query a property and all its children efficiently. \item Additional indexes for predicates, resources and graphs allow enumerating these objects without duplicates. For example, using \predref{rdf_resource}{1} we enumerate all resources in the database only once, while enumeration using e.g., \verb$(rdf(R,_,_);rdf(_,_,R))$ normally produces many duplicate answers. \item Literal \textit{Objects} are combined in a \textit{skip list} after case normalization. This provides for efficient case-insensitive search, prefix and range search. The plugin library \verb$library(semweb/litindex)$ provides indexed search on tokens inside literals. \end{itemize} \subsubsection{Query the RDF database} \label{sec:semweb-query} \begin{description} \predicate[nondet]{rdf}{3}{?Subject, ?Predicate, ?Object} Elementary query for triples. \arg{Subject} and \arg{Predicate} are atoms representing the fully qualified URL of the resource. \arg{Object} is either an atom representing a resource or \verb$literal(Value)$ if the object is a literal value. If a value of the form NameSpaceID:LocalName is provided it is expanded to a ground atom using \predref{expand_goal}{2}. This implies you can use this construct in compiled code without paying a performance penalty. Literal values take one of the following forms: \begin{description} \termitem{\arg{Atom}}{} If the value is a simple atom it is the textual representation of a string literal without explicit type or language qualifier. \termitem{lang}{LangID, Atom} \arg{Atom} represents the text of a string literal qualified with the given language. \termitem{type}{TypeID, Value} Used for attributes qualified using the \verb$rdf:datatype$ \arg{TypeID}. The \arg{Value} is either the textual representation or a natural Prolog representation. See the option convert_typed_literal(:Convertor) of the parser. The storage layer provides efficient handling of atoms, integers (64-bit) and floats (native C-doubles). All other data is represented as a Prolog record. \end{description} For literal querying purposes, \arg{Object} can be of the form \verb$literal(+Query, -Value)$, where Query is one of the terms below. If the Query takes a literal argument and the value has a numeric type numerical comparison is performed. \begin{description} \termitem{plain}{+Text} Perform exact match and demand the language or type qualifiers to match. This query is fully indexed. \termitem{icase}{+Text} Perform a full but case-insensitive match. This query is fully indexed. \termitem{exact}{+Text} Same as \verb$icase(Text)$. Backward compatibility. \termitem{substring}{+Text} Match any literal that contains \arg{Text} as a case-insensitive substring. The query is not indexed on \arg{Object}. \termitem{word}{+Text} Match any literal that contains \arg{Text} delimited by a non alpha-numeric character, the start or end of the string. The query is not indexed on \arg{Object}. \termitem{prefix}{+Text} Match any literal that starts with \arg{Text}. This call is intended for completion. The query is indexed using the skip list of literals. \termitem{ge}{+Literal} Match any literal that is equal or larger than \arg{Literal} in the ordered set of literals. \termitem{gt}{+Literal} Match any literal that is larger than \arg{Literal} in the ordered set of literals. \termitem{eq}{+Literal} Match any literal that is equal to \arg{Literal} in the ordered set of literals. \termitem{le}{+Literal} Match any literal that is equal or smaller than \arg{Literal} in the ordered set of literals. \termitem{lt}{+Literal} Match any literal that is smaller than \arg{Literal} in the ordered set of literals. \termitem{between}{+Literal1, +Literal2} Match any literal that is between \arg{Literal1} and \arg{Literal2} in the ordered set of literals. This may include both \arg{Literal1} and \arg{Literal2}. \termitem{like}{+Pattern} Match any literal that matches \arg{Pattern} case insensitively, where the `*' character in \arg{Pattern} matches zero or more characters. \end{description} Backtracking never returns duplicate triples. Duplicates can be retrieved using \predref{rdf}{4}. The predicate \predref{rdf}{3} raises a type-error if called with improper arguments. If \predref{rdf}{3} is called with a term \verb$literal(_)$ as \arg{Subject} or \arg{Predicate} object it fails silently. This allows for graph matching goals like \verb$rdf(S,P,O)$,\verb$rdf(O,P2,O2)$ to proceed without errors. \predicate[nondet]{rdf}{4}{?Subject, ?Predicate, ?Object, ?Source} As \predref{rdf}{3} but in addition query the graph to which the triple belongs. Unlike \predref{rdf}{3}, this predicate does not remove duplicates from the result set. \begin{arguments} \arg{Source} & is a term Graph:Line. If \arg{Source} is instatiated, passing an atom is the same as passing Atom:_. \\ \end{arguments} \predicate[nondet]{rdf_has}{3}{?Subject, +Predicate, ?Object} Succeeds if the triple \verb$rdf(Subject, Predicate, Object)$ is true exploiting the rdfs:subPropertyOf predicate as well as inverse predicates declared using \predref{rdf_set_predicate}{2} with the \verb$inverse_of$ property. \predicate[nondet]{rdf_has}{4}{?Subject, +Predicate, ?Object, -RealPredicate} Same as \predref{rdf_has}{3}, but \arg{RealPredicate} is unified to the actual predicate that makes this relation true. \arg{RealPredicate} must be \arg{Predicate} or an rdfs:subPropertyOf \arg{Predicate}. If an inverse match is found, \arg{RealPredicate} is the term \verb$inverse_of(Pred)$. \predicate[nondet]{rdf_reachable}{3}{?Subject, +Predicate, ?Object} Is true if \arg{Object} can be reached from \arg{Subject} following the transitive predicate \arg{Predicate} or a sub-property thereof, while repecting the \verb$symetric(true)$ or \verb$inverse_of(P2)$ properties. If used with either \arg{Subject} or \arg{Object} unbound, it first returns the origin, followed by the reachable nodes in breadth-first search-order. The implementation internally looks one solution ahead and succeeds deterministically on the last solution. This predicate never generates the same node twice and is robust against cycles in the transitive relation. With all arguments instantiated, it succeeds deterministically if a path can be found from \arg{Subject} to \arg{Object}. Searching starts at \arg{Subject}, assuming the branching factor is normally lower. A call with both \arg{Subject} and \arg{Object} unbound raises an instantiation error. The following example generates all subclasses of rdfs:Resource: \begin{code} ?- rdf_reachable(X, rdfs:subClassOf, rdfs:'Resource'). X = 'http://www.w3.org/2000/01/rdf-schema#Resource' ; X = 'http://www.w3.org/2000/01/rdf-schema#Class' ; X = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Property' ; ... \end{code} \predicate[nondet]{rdf_reachable}{5}{?Subject, +Predicate, ?Object, +MaxD, -D} Same as \predref{rdf_reachable}{3}, but in addition, \arg{MaxD} limits the number of edges expanded and \arg{D} is unified with the `distance' between \arg{Subject} and \arg{Object}. Distance 0 means \arg{Subject} and \arg{Object} are the same resource. \arg{MaxD} can be the constant \const{infinite} to impose no distance-limit. \end{description} \subsubsection{Enumerating objects} \label{sec:semweb-enumerate} The predicates below enumerate the basic objects of the RDF store. Most of these predicates also enumerate objects that are not associated to any currently visible triple. Objects are retained as long as they are visible in active queries or \textit{snapshots}. After that, some are reclaimed by the RDF garbage collector, while others are never reclaimed. \begin{description} \predicate[nondet]{rdf_subject}{1}{?Resource} True if \arg{Resource} appears as a subject. This query respects the visibility rules implied by the logical update view. \begin{tags} \tag{See also} \predref{rdf_resource}{1}. \end{tags} \predicate[nondet]{rdf_resource}{1}{?Resource} True when \arg{Resource} is a resource used as a subject or object in a triple. This predicate is primarily intended as a way to process all resources without processing resources twice. The user must be aware that some of the returned resources may not appear in any \textit{visible} triple. \predicate[nondet]{rdf_current_predicate}{1}{?Predicate} True when \arg{Predicate} is a currently known predicate. Predicates are created if a triples is created that uses this predicate or a property of the predicate is set using \predref{rdf_set_predicate}{2}. The predicate may (no longer) have triples associated with it. Note that resources that have \verb$rdf:type$ \verb$rdf:Property$ are not automatically included in the result-set of this predicate, while \textit{all} resources that appear as the second argument of a triple \textit{are} included. \begin{tags} \tag{See also} \predref{rdf_predicate_property}{2}. \end{tags} \predicate[nondet]{rdf_current_literal}{1}{-Literal} True when \arg{Literal} is a currently known literal. Enumerates each unique literal exactly once. Note that it is possible that the literal only appears in already deleted triples. Deleted triples may be locked due to active queries, transactions or snapshots or may not yet be reclaimed by the garbage collector. \predicate[nondet]{rdf_graph}{1}{?Graph} True when \arg{Graph} is an existing graph. \predicate[nondet]{rdf_current_ns}{2}{:Prefix, ?URI} \begin{tags} \tag{deprecated} Use \predref{rdf_current_prefix}{2}. \end{tags} \end{description} \subsubsection{Modifying the RDF database} \label{sec:semweb-modify} The predicates below modify the RDF store directly. In addition, data may be loaded using \predref{rdf_load}{2} or by restoring a persistent database using \predref{rdf_attach_db}{2}. Modifications follow the Prolog \textit{logical update view} semantics, which implies that modifications remain invisible to already running queries. Further isolation can be achieved using \predref{rdf_transaction}{3}. \begin{description} \predicate[det]{rdf_assert}{3}{+Subject, +Predicate, +Object} Assert a new triple into the database. This is equivalent to \predref{rdf_assert}{4} using Graph \const{user}. \arg{Subject} and \arg{Predicate} are resources. \arg{Object} is either a resource or a term \verb$literal(Value)$. See \predref{rdf}{3} for an explanation of Value for typed and language qualified literals. All arguments are subject to name-space expansion. Complete duplicates (including the same graph and `line' and with a compatible `lifespan') are not added to the database. \predicate[det]{rdf_assert}{4}{+Subject, +Predicate, +Object, +Graph} As \predref{rdf_assert}{3}, adding the predicate to the indicated named graph. \begin{arguments} \arg{Graph} & is either the name of a graph (an atom) or a term \arg{Graph}:Line, where Line is an integer that denotes a line number. \\ \end{arguments} \predicate[det]{rdf_retractall}{3}{?Subject, ?Predicate, ?Object} Remove all matching triples from the database. As \predref{rdf_retractall}{4} using an unbound graph. \predicate[det]{rdf_retractall}{4}{?Subject, ?Predicate, ?Object, ?Graph} As \predref{rdf_retractall}{3}, also matching \arg{Graph}. This is particulary useful to remove all triples coming from a loaded file. See also \predref{rdf_unload}{1}. \predicate[det]{rdf_update}{4}{+Subject, +Predicate, +Object, ++Action} \nodescription \predicate[det]{rdf_update}{5}{+Subject, +Predicate, +Object, +Graph, ++Action} Replaces one of the three (four) fields on the matching triples depending on \arg{Action}: \begin{description} \termitem{subject}{Resource} Changes the first field of the triple. \termitem{predicate}{Resource} Changes the second field of the triple. \termitem{object}{Object} Changes the last field of the triple to the given resource or \verb$literal(Value)$. \termitem{graph}{Graph} Moves the triple from its current named graph to \arg{Graph}. This only works with \predref{rdf_update}{5} and throws an error when used with \predref{rdf_update}{4}. \end{description} \end{description} \subsubsection{Update view, transactions and snapshots} \label{sec:semweb-update-view} The update semantics of the RDF database follows the conventional Prolog \textit{logical update view}. In addition, the RDF database supports \textit{transactions} and \textit{snapshots}. \begin{description} \predicate[semidet]{rdf_transaction}{1}{:Goal} Same as \verb$rdf_transaction(Goal, user, [])$. See \predref{rdf_transaction}{3}. \predicate[semidet]{rdf_transaction}{2}{:Goal, +Id} Same as \verb$rdf_transaction(Goal, Id, [])$. See \predref{rdf_transaction}{3}. \predicate[semidet]{rdf_transaction}{3}{:Goal, +Id, +Options} Run \arg{Goal} in an RDF transaction. Compared to the ACID model, RDF transactions have the following properties: \begin{enumerate} \item Modifications inside the transactions become all atomically visible to the outside world if \arg{Goal} succeeds or remain invisible if \arg{Goal} fails or throws an exception. I.e., the \textit{atomicy} property is fully supported. \item \textit{Consistency} is not guaranteed. Later versions may implement consistency constraints that will be checked serialized just before the actual commit of a transaction. \item Concurrently executing transactions do not infuence each other. I.e., the \textit{isolation} property is fully supported. \item \textit{Durability} can be activated by loading \file{library(semweb/rdf_persistency)}. \end{enumerate} Processed options are: \begin{description} \termitem{snapshot}{+Snapshot} Execute \arg{Goal} using the state of the RDF store as stored in \arg{Snapshot}. See \predref{rdf_snapshot}{1}. \arg{Snapshot} can also be the atom \const{true}, which implies that an anonymous snapshot is created at the current state of the store. Modifications due to executing \arg{Goal} are only visible to \arg{Goal}. \end{description} \predicate[det]{rdf_snapshot}{1}{-Snapshot} Take a snapshot of the current state of the RDF store. Later, goals may be executed in the context of the database at this moment using \predref{rdf_transaction}{3} with the \const{snapshot} option. A snapshot created outside a transaction exists until it is deleted. Snapshots taken inside a transaction can only be used inside this transaction. \predicate[det]{rdf_delete_snapshot}{1}{+Snapshot} Delete a snapshot as obtained from \predref{rdf_snapshot}{1}. After this call, resources used for maintaining the snapshot become subject to garbage collection. \predicate[nondet]{rdf_active_transaction}{1}{?Id} True if \arg{Id} is the identifier of a transaction in the context of which this call is executed. If \arg{Id} is not instantiated, backtracking yields transaction identifiers starting with the innermost nested transaction. Transaction identifier terms are not copied, need not be ground and can be instantiated during the transaction. \predicate[nondet]{rdf_current_snapshot}{1}{?Term} True when \arg{Term} is a currently known snapshot. \begin{tags} \tag{bug} Enumeration of snapshots is slow. \end{tags} \end{description} \subsubsection{Type checking predicates} \label{sec:semweb-typecheck} \begin{description} \predicate[semidet]{rdf_is_resource}{1}{@Term} True if \arg{Term} is an RDF resource. Note that this is merely a type-test; it does not mean this resource is involved in any triple. Blank nodes are also considered resources. \begin{tags} \tag{See also} \predref{rdf_is_bnode}{1} \end{tags} \predicate{rdf_is_bnode}{1}{+Id} Tests if a resource is a blank node (i.e. is an anonymous resource). A blank node is represented as an atom that starts with \verb$_:$. For backward compatibility reason, \verb$__$ is also considered to be a blank node. \begin{tags} \tag{See also} \predref{rdf_bnode}{1}. \end{tags} \predicate[semidet]{rdf_is_literal}{1}{@Term} True if \arg{Term} is an RDF literal object. Currently only checks for groundness and the literal functor. \end{description} \subsubsection{Loading and saving to file} \label{sec:semweb-load-save} The RDF library can read and write triples in RDF/XML and a proprietary binary format. There is a plugin interface defined to support additional formats. The \file{library(semweb/turtle)} uses this plugin API to support loading Turtle files using \predref{rdf_load}{2}. \begin{description} \predicate[det]{rdf_load}{1}{+FileOrList} Same as \verb$rdf_load(FileOrList, [])$. See \predref{rdf_load}{2}. \predicate[det]{rdf_load}{2}{+FileOrList, :Options} Load RDF data. \arg{Options} provides additional processing options. Defined options are: \begin{description} \termitem{blank_nodes}{+ShareMode} How to handle equivalent blank nodes. If \const{share} (default), equivalent blank nodes are shared in the same resource. \termitem{base_uri}{+URI} \arg{URI} that is used for rdf:about="" and other RDF constructs that are relative to the base uri. Default is the source URL. \termitem{concurrent}{+Jobs} If \arg{FileOrList} is a list of files, process the input files using \arg{Jobs} threads concurrently. Default is the mininum of the number of cores and the number of inputs. Higher values can be useful when loading inputs from (slow) network connections. Using 1 (one) does not use separate worker threads. \termitem{format}{+Format} Specify the source format explicitly. Normally this is deduced from the filename extension or the mime-type. The core library understands the formats xml (RDF/XML) and triples (internal quick load and cache format). Plugins, such as \file{library(semweb/turtle)} extend the set of recognised extensions. \termitem{graph}{?Graph} Named graph in which to load the data. It is \textbf{not} allowed to load two sources into the same named graph. If \arg{Graph} is unbound, it is unified to the graph into which the data is loaded. The default graph is a \verb$file://$ URL when loading a file or, if the specification is a URL, its normalized version without the optional \textit{\#fragment}. \termitem{if}{Condition} When to load the file. One of \const{true}, \const{changed} (default) or \verb$not_loaded$. \termitem{modified}{-Modified} Unify \arg{Modified} with one of \verb$not_modified$, \verb$cached(File)$, \verb$last_modified(Stamp)$ or \const{unknown}. \termitem{cache}{Bool} If \const{false}, do not use or create a cache file. \termitem{register_namespaces}{Bool} If \const{true} (default \const{false}), register \const{xmlns} namespace declarations or Turtle \verb$@prefix$ prefixes using \predref{rdf_register_prefix}{3} if there is no conflict. \termitem{silent}{+Bool} If \const{true}, the message reporting completion is printed using level \const{silent}. Otherwise the level is \const{informational}. See also \predref{print_message}{2}. \termitem{prefixes}{-Prefixes} Returns the prefixes defined in the source data file as a list of pairs. \prefixtermitem{multifile}{\prefixterm{\Splus}{\arg{Boolean}}} Indicate that the addressed graph may be populated with triples from multiple sources. This disables caching and avoids that an \predref{rdf_load}{2} call affecting the specified graph cleans the graph. \end{description} Other options are forwarded to \predref{process_rdf}{3}. By default, \predref{rdf_load}{2} only loads RDF/XML from files. It can be extended to load data from other formats and locations using plugins. The full set of plugins relevant to support different formats and locations is below: \begin{code} :- use_module(library(semweb/turtle)). % Turtle and TriG :- use_module(library(semweb/rdf_ntriples)). :- use_module(library(semweb/rdf_zlib_plugin)). :- use_module(library(semweb/rdf_http_plugin)). :- use_module(library(http/http_ssl_plugin)). \end{code} \begin{tags} \tag{See also} \qpredref{rdf_db}{rdf_open_hook}{3}, \file{library(semweb/rdf_persistency)} and \file{library(semweb/rdf_cache)} \end{tags} \predicate[det]{rdf_unload}{1}{+Source} Identify the graph loaded from \arg{Source} and use \predref{rdf_unload_graph}{1} to erase this graph. \begin{tags} \tag{deprecated} For compatibility, this predicate also accepts a graph name instead of a source specification. Please update your code to use \predref{rdf_unload_graph}{1}. \end{tags} \predicate[det]{rdf_save}{1}{+Out} Same as \verb$rdf_save(Out, [])$. See \predref{rdf_save}{2} for details. \predicate[det]{rdf_save}{2}{+Out, :Options} Write RDF data as RDF/XML. \arg{Options} is a list of one or more of the following options: \begin{description} \termitem{graph}{+Graph} Save only triples associated to the given named \arg{Graph}. \termitem{anon}{Bool} If \const{false} (default \const{true}) do not save blank nodes that do not appear (indirectly) as object of a named resource. \termitem{base_uri}{URI} BaseURI used. If present, all URIs that can be represented relative to this base are written using their shorthand. See also \verb$write_xml_base$ option. \termitem{convert_typed_literal}{:Convertor} Call \arg{Convertor}(-Type, -Content, +RDFObject), providing the opposite for the convert_typed_literal option of the RDF parser. \termitem{document_language}{+Lang} Initial \verb$xml:lang$ saved with rdf:RDF element. \termitem{encoding}{Encoding} \arg{Encoding} for the output. Either utf8 or iso_latin_1. \termitem{inline}{+Bool} If \const{true} (default \const{false}), inline resources when encountered for the first time. Normally, only bnodes are handled this way. \termitem{namespaces}{+List} Explicitly specify saved namespace declarations. See \predref{rdf_save_header}{2} option namespaces for details. \termitem{sorted}{+Boolean} If \const{true} (default \const{false}), emit subjects sorted on the full URI. Useful to make file comparison easier. \termitem{write_xml_base}{Bool} If \const{false}, do \textit{not} include the \verb$xml:base$ declaration that is written normally when using the \verb$base_uri$ option. \termitem{xml_attributes}{+Bool} If \const{false} (default \const{true}), never use xml attributes to save plain literal attributes, i.e., always used an XML element as in \verb$Joe$. \end{description} \begin{arguments} \arg{Out} & Location to save the data. This can also be a file-url (\verb$file://path$) or a stream wrapped in a term \verb$stream(Out)$. \\ \end{arguments} \begin{tags} \tag{See also} \predref{rdf_save_db}{1} \end{tags} \predicate{rdf_make}{0}{} Reload all loaded files that have been modified since the last time they were loaded. \end{description} \paragraph{Partial save}\label{sec:semweb-partial-save} Sometimes it is necessary to make more arbitrary selections of material to be saved or exchange RDF descriptions over an open network link. The predicates in this section provide for this. Character encoding issues are derived from the encoding of the \textit{Stream}, providing support for \const{utf8}, \verb$iso_latin_1$ and \const{ascii}. \begin{description} \predicate{rdf_save_header}{2}{+Fd, +Options} Save XML document header, doctype and open the RDF environment. This predicate also sets up the namespace notation. Save an RDF header, with the XML header, DOCTYPE, ENTITY and opening the rdf:RDF element with appropriate namespace declarations. It uses the primitives from section 3.5 to generate the required namespaces and desired short-name. \arg{Options} is one of: \begin{description} \termitem{graph}{+URI} Only search for namespaces used in triples that belong to the given named graph. \termitem{namespaces}{+List} Where \arg{List} is a list of namespace abbreviations. With this option, the expensive search for all namespaces that may be used by your data is omitted. The namespaces \const{rdf} and \const{rdfs} are added to the provided \arg{List}. If a namespace is not declared, the resource is emitted in non-abreviated form. \end{description} \predicate[det]{rdf_save_footer}{1}{Out:stream} Finish XML generation and write the document footer. \begin{tags} \tag{See also} \predref{rdf_save_header}{2}, \predref{rdf_save_subject}{3}. \end{tags} \predicate[det]{rdf_save_subject}{3}{+Out, +Subject:resource, +Options} Save the triples associated to \arg{Subject} to \arg{Out}. \arg{Options}: \begin{description} \termitem{graph}{+Graph} Only save properties from \arg{Graph}. \termitem{base_uri}{+URI} \termitem{convert_typed_literal}{:Goal} \termitem{document_language}{+XMLLang} \end{description} \begin{tags} \tag{See also} \predref{rdf_save}{2} for a description of these options. \end{tags} \end{description} \paragraph{Fast loading and saving}\label{sec:semweb-fast-load-save} Loading and saving RDF format is relatively slow. For this reason we designed a binary format that is more compact, avoids the complications of the RDF parser and avoids repetitive lookup of (URL) identifiers. Especially the speed improvement of about 25 times is worth-while when loading large databases. These predicates are used for caching by \predref{rdf_load}{2} under certain conditions as well as for maintaining persistent snapshots of the database using \file{library(semweb/rdf_persistency)}. \begin{description} \predicate[det]{rdf_save_db}{1}{+File} \nodescription \predicate[det]{rdf_save_db}{2}{+File, +Graph} Save triples into \arg{File} in a quick-to-load binary format. If \arg{Graph} is supplied only triples flagged to originate from that database are added. Files created this way can be loaded using \predref{rdf_load_db}{1}. \predicate[det]{rdf_load_db}{1}{+File} Load triples from a file created using \predref{rdf_save_db}{2}. \end{description} \subsubsection{Graph manipulation} \label{sec:semweb-graphs} Many RDF stores turned triples into quadruples. This store is no exception, initially using the 4th argument to store the filename from which the triple was loaded. Currently, the 4th argument is the RDF \textit{named graph}. A named graph maintains some properties, notably to track origin, changes and modified state. \begin{description} \predicate[det]{rdf_create_graph}{1}{+Graph} Create an RDF graph without triples. Succeeds silently if the graph already exists. \predicate[det]{rdf_unload_graph}{1}{+Graph} Remove \arg{Graph} from the RDF store. Succeeds silently if the named graph does not exist. \predicate[nondet]{rdf_graph_property}{2}{?Graph, ?Property} True when \arg{Property} is a property of \arg{Graph}. Defined properties are: \begin{description} \termitem{hash}{Hash} \arg{Hash} is the (MD5-)hash for the content of \arg{Graph}. \termitem{modified}{Boolean} True if the graph is modified since it was loaded or \predref{rdf_set_graph}{2} was called with \verb$modified(false)$. \termitem{source}{Source} The graph is loaded from the \arg{Source} (a URL) \termitem{source_last_modified}{?Time} \arg{Time} is the last-modified timestamp of Source at the moment the graph was loaded from Source. \termitem{triples}{Count} True when \arg{Count} is the number of triples in \arg{Graph}. \end{description} Additional graph properties can be added by defining rules for the multifile predicate \predref{property_of_graph}{2}. Currently, the following extensions are defined: \begin{itemize} \item \file{library(semweb/rdf_persistency)} \begin{description} \termitem{persistent}{Boolean} \arg{Boolean} is \const{true} if the graph is persistent. \end{description} \end{itemize} \predicate[det]{rdf_set_graph}{2}{+Graph, +Property} Set properties of \arg{Graph}. Defined properties are: \begin{description} \termitem{modified}{false} Set the modified state of \arg{Graph} to false. \end{description} \end{description} \subsubsection{Literal matching and indexing} \label{sec:semweb-literals} Literal values are ordered and indexed using a \textit{skip list}. The aim of this index is threefold. \begin{itemize} \item Unlike hash-tables, binary trees allow for efficient \textit{prefix} and \textit{range} matching. Prefix matching is useful in interactive applications to provide feedback while typing such as auto-completion. \item Having a table of unique literals we generate creation and destruction events (see \predref{rdf_monitor}{2}). These events can be used to maintain additional indexing on literals, such as `by word'. See \verb$library(semweb/litindex)$. \end{itemize} As string literal matching is most frequently used for searching purposes, the match is executed case-insensitive and after removal of diacritics. Case matching and diacritics removal is based on Unicode character properties and independent from the current locale. Case conversion is based on the `simple uppercase mapping' defined by Unicode and diacritic removal on the `decomposition type'. The approach is lightweight, but somewhat simpleminded for some languages. The tables are generated for Unicode characters upto 0x7fff. For more information, please check the source-code of the mapping-table generator \verb$unicode_map.pl$ available in the sources of this package. Currently the total order of literals is first based on the type of literal using the ordering \textit{numeric $<$ string $<$ term} Numeric values (integer and float) are ordered by value, integers preceed floats if they represent the same value. Strings are sorted alphabetically after case-mapping and diacritic removal as described above. If they match equal, uppercase preceeds lowercase and diacritics are ordered on their unicode value. If they still compare equal literals without any qualifier preceeds literals with a type qualifier which preceeds literals with a language qualifier. Same qualifiers (both type or both language) are sorted alphabetically. The ordered tree is used for indexed execution of \verb$literal(prefix(Prefix), Literal)$ as well as \verb$literal(like(Like), Literal)$ if \textit{Like} does not start with a `*'. Note that results of queries that use the tree index are returned in alphabetical order. \subsubsection{Predicate properties} \label{sec:semweb-predicates} The predicates below form an experimental interface to provide more reasoning inside the kernel of the rdb_db engine. Note that \const{symetric}, \verb$inverse_of$ and \const{transitive} are not yet supported by the rest of the engine. Also note that there is no relation to defined RDF properties. Properties that have no triples are not reported by this predicate, while predicates that are involved in triples do not need to be defined as an instance of rdf:Property. \begin{description} \predicate[det]{rdf_set_predicate}{2}{+Predicate, +Property} Define a property of the predicate. This predicate currently supports the following properties: \begin{description} \termitem{symmetric}{+Boolean} Set/unset the predicate as being symmetric. Using \verb$symmetric(true)$ is the same as \verb$inverse_of(Predicate)$, i.e., creating a predicate that is the inverse of itself. \termitem{transitive}{+Boolean} Sets the transitive property. \termitem{inverse_of}{+Predicate2} Define \arg{Predicate} as the inverse of \arg{Predicate2}. An inverse relation is deleted using \verb$inverse_of([])$. \end{description} The \const{transitive} property is currently not used. The \const{symmetric} and \verb$inverse_of$ properties are considered by \predref{rdf_has}{3},4 and \predref{rdf_reachable}{3}. \begin{tags} \tag{To be done} Maintain these properties based on OWL triples. \end{tags} \predicate{rdf_predicate_property}{2}{?Predicate, ?Property} Query properties of a defined predicate. Currently defined properties are given below. \begin{description} \termitem{symmetric}{Bool} True if the predicate is defined to be symetric. I.e., \{A\} P \{B\} implies \{B\} P \{A\}. Setting symmetric is equivalent to \verb$inverse_of(Self)$. \termitem{inverse_of}{Inverse} True if this predicate is the inverse of \arg{Inverse}. This property is used by \predref{rdf_has}{3}, \predref{rdf_has}{4}, \predref{rdf_reachable}{3} and \predref{rdf_reachable}{5}. \termitem{transitive}{Bool} True if this predicate is transitive. This predicate is currently not used. It might be used to make \predref{rdf_has}{3} imply \predref{rdf_reachable}{3} for transitive predicates. \termitem{triples}{Triples} Unify \arg{Triples} with the number of existing triples using this predicate as second argument. Reporting the number of triples is intended to support query optimization. \termitem{rdf_subject_branch_factor}{-Float} Unify \arg{Float} with the average number of triples associated with each unique value for the subject-side of this relation. If there are no triples the value 0.0 is returned. This value is cached with the predicate and recomputed only after substantial changes to the triple set associated to this relation. This property is intended for path optimalisation when solving conjunctions of \predref{rdf}{3} goals. \termitem{rdf_object_branch_factor}{-Float} Unify \arg{Float} with the average number of triples associated with each unique value for the object-side of this relation. In addition to the comments with the \verb$rdf_subject_branch_factor$ property, uniqueness of the object value is computed from the hash key rather than the actual values. \termitem{rdfs_subject_branch_factor}{-Float} Same as \verb$rdf_subject_branch_factor$, but also considering triples of `subPropertyOf' this relation. See also \predref{rdf_has}{3}. \termitem{rdfs_object_branch_factor}{-Float} Same as \verb$rdf_object_branch_factor$, but also considering triples of `subPropertyOf' this relation. See also \predref{rdf_has}{3}. \end{description} \begin{tags} \tag{See also} \predref{rdf_set_predicate}{2}. \end{tags} \end{description} \subsubsection{Prefix Handling} \label{sec:semweb-prefixes} Prolog code often contains references to constant resources with a known \textit{prefix} (also known as XML \textit{namespaces}). For example, \verb$http://www.w3.org/2000/01/rdf-schema#Class$ refers to the most general notion of an RDFS class. Readability and maintability concerns require for abstraction here. The RDF database maintains a table of known \textit{prefixes}. This table can be queried using \predref{rdf_current_ns}{2} and can be extended using \predref{rdf_register_ns}{3}. The prefix database is used to expand \verb$prefix:local$ terms that appear as arguments to calls which are known to accept a \textit{resource}. This expansion is achieved by Prolog preprocessor using \predref{expand_goal}{2}. \begin{description} \predicate[nondet]{rdf_current_prefix}{2}{:Alias, ?URI} Query predefined prefixes and prefixes defined with \predref{rdf_register_prefix}{2} and local prefixes defined with \predref{rdf_prefix}{2}. If \arg{Alias} is unbound and one \arg{URI} is the prefix of another, the longest is returned first. This allows turning a resource into a prefix/local couple using the simple enumeration below. See \predref{rdf_global_id}{2}. \begin{code} rdf_current_prefix(Prefix, Expansion), atom_concat(Expansion, Local, URI), \end{code} \predicate[det]{rdf_register_prefix}{2}{+Prefix, +URI} \nodescription \predicate[det]{rdf_register_prefix}{3}{+Prefix, +URI, +Options} Register \arg{Prefix} as an abbreviation for \arg{URI}. \arg{Options}: \begin{description} \termitem{force}{Boolean} If \const{true}, replace existing namespace alias. Please note that replacing a namespace is dangerous as namespaces affect preprocessing. Make sure all code that depends on a namespace is compiled after changing the registration. \termitem{keep}{Boolean} If \const{true} and Alias is already defined, keep the original binding for \arg{Prefix} and succeed silently. \end{description} Without options, an attempt to redefine an alias raises a permission error. Predefined prefixes are: \begin{quote} \begin{tabulary}{0.9\textwidth}{|l|l|} \hline \textbf{Alias} & \textbf{IRI prefix} \\ dc & \url{http://purl.org/dc/elements/1.1/} \\ dcterms & \url{http://purl.org/dc/terms/} \\ eor & \url{http://dublincore.org/2000/03/13/eor\#} \\ foaf & \url{http://xmlns.com/foaf/0.1/} \\ owl & \url{http://www.w3.org/2002/07/owl\#} \\ rdf & \url{http://www.w3.org/1999/02/22-rdf-syntax-ns\#} \\ rdfs & \url{http://www.w3.org/2000/01/rdf-schema\#} \\ serql & \url{http://www.openrdf.org/schema/serql\#} \\ skos & \url{http://www.w3.org/2004/02/skos/core\#} \\ void & \url{http://rdfs.org/ns/void\#} \\ xsd & \url{http://www.w3.org/2001/XMLSchema\#} \\ \hline \end{tabulary} \end{quote} \end{description} \textit{Explicit} expansion is achieved using the predicates below. The predicate \predref{rdf_equal}{2} performs this expansion at compile time, while the other predicates do it at runtime. \begin{description} \predicate{rdf_equal}{2}{?Resource1, ?Resource2} Simple equality test to exploit goal-expansion. \predicate[semidet]{rdf_global_id}{2}{?IRISpec, :IRI} Convert between Prefix:Local and full \arg{IRI} (an atom). If \arg{IRISpec} is an atom, it is simply unified with \arg{IRI}. This predicate fails silently if \arg{IRI} is an RDF literal. Note that this predicate is a meta-predicate on its output argument. This is necessary to get the module context while the first argument may be of the form (:)/2. The above mode description is correct, but should be interpreted as (?,?). \begin{tags} \tag{Errors} \verb$existence_error(rdf_prefix, Prefix)$\mtag{See also}- \predref{rdf_equal}{2} provides a compile time alternative \\- The \predref{rdf_meta}{1} directive asks for compile time expansion of arguments. \tag{bug} Error handling is incomplete. In its current implementation the same code is used for compile-time expansion and to facilitate runtime conversion and checking. These use cases have different requirements. \end{tags} \predicate[semidet]{rdf_global_object}{2}{+Object, :GlobalObject} \nodescription \predicate[semidet]{rdf_global_object}{2}{-Object, :GlobalObject} Same as \predref{rdf_global_id}{2}, but intended for dealing with the object part of a triple, in particular the type for typed literals. Note that the predicate is a meta-predicate on the output argument. This is necessary to get the module context while the first argument may be of the form (:)/2. \begin{tags} \tag{Errors} \verb$existence_error(rdf_prefix, Prefix)$ \end{tags} \predicate[det]{rdf_global_term}{2}{+TermIn, :GlobalTerm} Performs \predref{rdf_global_id}{2} on predixed IRIs and \predref{rdf_global_object}{2} on RDF literals, by recursively analysing the term. Note that the predicate is a meta-predicate on the output argument. This is necessary to get the module context while the first argument may be of the form (:)/2. Terms of the form \verb$Prefix:Local$ that appear in \arg{TermIn} for which \arg{Prefix} is not defined are not replaced. Unlike \predref{rdf_global_id}{2} and \predref{rdf_global_object}{2}, no error is raised. \end{description} \paragraph{Namespace handling for custom predicates}\label{sec:semweb-meta} If we implement a new predicate based on one of the predicates of the semweb libraries that expands namespaces, namespace expansion is not automatically available to it. Consider the following code computing the number of distinct objects for a certain property on a certain object. \begin{code} cardinality(S, P, C) :- ( setof(O, rdf_has(S, P, O), Os) -> length(Os, C) ; C = 0 ). \end{code} Now assume we want to write \predref{labels}{2} that returns the number of distict labels of a resource: \begin{code} labels(S, C) :- cardinality(S, rdfs:label, C). \end{code} This code will \textit{not} work because \verb$rdfs:label$ is not expanded at compile time. To make this work, we need to add an \predref{rdf_meta}{1} declaration. \begin{code} :- rdf_meta cardinality(r,r,-). \end{code} \begin{shortlist} \item [[\predref{rdf_meta}{1}]] \end{shortlist} The example below defines the rule \predref{concept}{1}. \begin{code} :- use_module(library(semweb/rdf_db)). % for rdf_meta :- use_module(library(semweb/rdfs)). % for rdfs_individual_of :- rdf_meta concept(r). %% concept(?C) is nondet. % % True if C is a concept. concept(C) :- rdfs_individual_of(C, skos:'Concept'). \end{code} In addition to expanding \textit{calls}, \predref{rdf_meta}{1} also causes expansion of \textit{clause heads} for predicates that match a declaration. This is typically used write Prolog statements about resources. The following example produces three clauses with expanded (single-atom) arguments: \begin{code} :- use_module(library(semweb/rdf_db)). :- rdf_meta label_predicate(r). label_predicate(rdfs:label). label_predicate(skos:prefLabel). label_predicate(skos:altLabel). \end{code} \subsubsection{Miscellaneous predicates} \label{sec:semweb-misc} This section describes the remaining predicates of the \file{library(semweb/rdf_db)} module. \begin{description} \predicate{rdf_bnode}{1}{-Id} Generate a unique anonymous identifier for a subject. \predicate[nondet]{rdf_source_location}{2}{+Subject, -Location} True when triples for \arg{Subject} are loaded from \arg{Location}. \begin{arguments} \arg{Location} & is a term File:Line. \\ \end{arguments} \predicate[det]{rdf_generation}{1}{-Generation} True when \arg{Generation} is the current generation of the database. Each modification to the database increments the generation. It can be used to check the validity of cached results deduced from the database. Committing a non-empty transaction increments the generation by one. When inside a transaction, \arg{Generation} is unified to a term \textit{TransactionStartGen} + \textit{InsideTransactionGen}. E.g., 4+3 means that the transaction was started at generation 4 of the global database and we have created 3 new generations inside the transaction. Note that this choice of representation allows for comparing generations using Prolog arithmetic. Comparing a generation in one transaction with a generation in another transaction is meaningless. \predicate{rdf_estimate_complexity}{4}{?Subject, ?Predicate, ?Object, -Complexity} Return the number of alternatives as indicated by the database internal hashed indexing. This is a rough measure for the number of alternatives we can expect for an \predref{rdf_has}{3} call using the given three arguments. When called with three variables, the total number of triples is returned. This estimate is used in query optimisation. See also \predref{rdf_predicate_property}{2} and \predref{rdf_statistics}{1} for additional information to help optimizers. \predicate[nondet]{rdf_statistics}{1}{?KeyValue} Obtain statistics on the RDF database. Defined statistics are: \begin{description} \termitem{graphs}{-Count} Number of named graphs. \termitem{triples}{-Count} Total number of triples in the database. This is the number of asserted triples minus the number of retracted ones. The number of \textit{visible} triples in a particular context may be different due to visibility rules defined by the logical update view and transaction isolation. \termitem{resources}{-Count} Number of resources that appear as subject or object in a triple. See \predref{rdf_resource}{1}. \termitem{properties}{-Count} Number of current predicates. See \predref{rdf_current_predicate}{1}. \termitem{literals}{-Count} Number of current literals. See \predref{rdf_current_literal}{1}. \termitem{gc}{GCCount, ReclaimedTriples, ReindexedTriples, Time} Information about the garbage collector. \termitem{searched_nodes}{-Count} Number of nodes expanded by \predref{rdf_reachable}{3} and \predref{rdf_reachable}{5}. \termitem{lookup}{rdf(S,P,O,G), Count} Number of queries that have been performed for this particular instantiation pattern. Each of \arg{S},\arg{P},\arg{O},\arg{G} is either + or -. Fails in case the number of performed queries is zero. \termitem{hash_quality}{rdf(S,P,O,G), Buckets, Quality, PendingResize} Statistics on the index for this pattern. Indices are created lazily on the first relevant query. \termitem{triples_by_graph}{Graph, Count} This statistics is produced for each named graph. See \const{triples} for the interpretation of this value. \end{description} \predicate[semidet]{rdf_match_label}{3}{+How, +Pattern, +Label} True if \arg{Label} matches \arg{Pattern} according to \arg{How}. \arg{How} is one of \const{icase}, \const{substring}, \const{word}, \const{prefix} or \const{like}. For backward compatibility, \const{exact} is a synonym for \const{icase}. \predicate[semidet]{lang_matches}{2}{+Lang, +Pattern} True if \arg{Lang} matches \arg{Pattern}. This implements XML language matching conform RFC 4647. Both \arg{Lang} and \arg{Pattern} are dash-separated strings of identifiers or (for \arg{Pattern}) the wildcard *. Identifiers are matched case-insensitive and a * matches any number of identifiers. A short pattern is the same as *. \predicate[semidet]{lang_equal}{2}{+Lang1, +Lang2} True if two RFC language specifiers denote the same language \begin{tags} \tag{See also} \predref{lang_matches}{2}. \end{tags} \predicate{rdf_reset_db}{0}{} Remove all triples from the RDF database and reset all its statistics. \begin{tags} \tag{bug} This predicate checks for active queries, but this check is not properly synchronized and therefore the use of this predicate is unsafe in multi-threaded contexts. It is mainly used to run functionality tests that need to start with an empty database. \end{tags} \predicate[det]{rdf_version}{1}{-Version} True when \arg{Version} is the numerical version-id of this library. The version is computed as \begin{code} Major*10000 + Minor*100 + Patch. \end{code} \end{description} \subsubsection{Memory management considerations} \label{sec:semweb-memory-management} Storing RDF triples in main memory provides much better performance than using external databases. Unfortunately, although memory is fairly cheap these days, main memory is severely limited when compared to disks. Memory usage breaks down to the following categories. Rough estimates of the memory usage is given \textbf{for 64-bit systems}. 32-bit system use slightly more than half these amounts. \begin{itemize} \item Actually storing the triples. A triple is stored in a C struct of 144 bytes. This struct both holds the quintuple, some bookkeeping information and the 10 next-pointers for the (max) to hash tables. \item The bucket array for the hashes. Each bucket maintains a \textit{head}, and \textit{tail} pointer, as well as a count for the number of entries. The bucket array is allocated if a particular index is created, which implies the first query that requires the index. Each bucket requires 24 bytes. Bucket arrays are resized if necessary. Old triples remain at their original location. This implies that a query may need to scan multiple buckets. The garbage collector may relocate old indexed triples. It does so by copying the old triple. The old triple is later reclaimed by GC. Reindexed triples will be reused, but many reindexed triples may result in a significant memory fragmentation. \item Resources are maintained in a separate table to support \predref{rdf_resource}{1}. A resources requires approximately 32 bytes. \item Identical literals are shared (see \predref{rdf_current_literal}{1}) and stored in a \textit{skip list}. A literal requires approximately 40 bytes, excluding the atom used for the lexical representation. \item Resources are stored in the Prolog atom-table. Atoms with the average length of a resource require approximately 88 bytes. \end{itemize} The hash parameters can be controlled with \predref{rdf_set}{1}. Applications that are tight on memory and for which the query characteristics are more or less known can optimize performance and memory by fixing the hash-tables. By fixing the hash-tables we can tailor them to the frequent query patterns, we avoid the need for to check multiple hash buckets (see above) and we avoid memory fragmentation due to optimizing triples for resized hashes. \begin{code} set_hash_parameters :- rdf_set(hash(s, size, 1048576)), rdf_set(hash(p, size, 1024)), rdf_set(hash(sp, size, 2097152)), rdf_set(hash(o, size, 1048576)), rdf_set(hash(po, size, 2097152)), rdf_set(hash(spo, size, 2097152)), rdf_set(hash(g, size, 1024)), rdf_set(hash(sg, size, 1048576)), rdf_set(hash(pg, size, 2048)). \end{code} \begin{description} \predicate[det]{rdf_set}{1}{+Term} Set properties of the RDF store. Currently defines: \begin{description} \termitem{hash}{+Hash, +Parameter, +Value} Set properties for a triple index. \arg{Hash} is one of \const{s}, \const{p}, \const{sp}, \const{o}, \const{po}, \const{spo}, \const{g}, \const{sg} or \const{pg}. \arg{Parameter} is one of: \begin{description} \termitem{size}{} \arg{Value} defines the number of entries in the hash-table. \arg{Value} is rounded \textit{down} to a power of 2. After setting the size explicitly, auto-sizing for this table is disabled. Setting the size smaller than the current size results in a \verb$permission_error$ exception. \termitem{average_chain_len}{} Set maximum average collision number for the hash. \termitem{optimize_threshold}{} Related to resizing hash-tables. If 0, all triples are moved to the new size by the garbage collector. If more then zero, those of the last \arg{Value} resize steps remain at their current location. Leaving cells at their current location reduces memory fragmentation and slows down access. \end{description} \end{description} \end{description} \paragraph{The garbage collector}\label{sec:semweb-gc} The RDF store has a garbage collector that runs in a separate thread named =__rdf_GC=. The garbage collector removes the following objects: \begin{itemize} \item Triples that have died before the the generation of last still active query. \item Entailment matrices for \verb$rdfs:subPropertyOf$ relations that are related to old queries. \end{itemize} In addition, the garbage collector reindexes triples associated to the hash-tables before the table was resized. The most recent resize operation leads to the largest number of triples that require reindexing, while the oldest resize operation causes the largest slowdown. The parameter \verb$optimize_threshold$ controlled by \predref{rdf_set}{1} can be used to determine the number of most recent resize operations for which triples will not be reindexed. The default is 2. Normally, the garbage collector does it job in the background at a low priority. The predicate \predref{rdf_gc}{0} can be used to reclaim all garbage and optimize all indexes.\paragraph{Warming up the database}\label{sec:semweb-warming-up} The RDF store performs many operations lazily or in background threads. For maximum performance, perform the following steps: \begin{itemize} \item Load all the data without doing queries or retracting data in between. This avoids creating the indexes and therefore the need to resize them. \item Perform each of the indexed queries. The following call performs this. Note that it is irrelevant whether or not the query succeeds. \begin{code} warm_indexes :- ignore(rdf(s, _, _)), ignore(rdf(_, p, _)), ignore(rdf(_, _, o)), ignore(rdf(s, p, _)), ignore(rdf(_, p, o)), ignore(rdf(s, p, o)), ignore(rdf(_, _, _, g)), ignore(rdf(s, _, _, g)), ignore(rdf(_, p, _, g)). \end{code} \item Duplicate adminstration is initialized in the background after the first call that returns a significant amount of duplicates. Creating the adminstration can be forced by calling \predref{rdf_update_duplicates}{0}. \end{itemize} Predicates: \begin{description} \predicate[det]{rdf_gc}{0}{} Run the RDF-DB garbage collector until no garbage is left and all tables are fully optimized. Under normal operation a separate thread with identifier \verb$__rdf_GC$ performs garbage collection as long as it is considered `useful'. Using \predref{rdf_gc}{0} should only be needed to ensure a fully clean database for analysis purposes such as leak detection. \predicate[det]{rdf_update_duplicates}{0}{} Update the duplicate administration of the RDF store. This marks every triple that is potentionally a duplicate of another as duplicate. Being potentially a duplicate means that subject, predicate and object are equivalent and the life-times of the two triples overlap. The duplicates marks are used to reduce the administrative load of avoiding duplicate answers. Normally, the duplicates are marked using a background thread that is started on the first query that produces a substantial amount of duplicates. \end{description}