% This LaTeX document was generated using the LaTeX backend of PlDoc,
% The SWI-Prolog documentation system


The central module of the RDF infrastructure is \file{library(semweb/rdf_db)}.
It provides storage and indexed querying of RDF triples. RDF data is
stored as quintuples. The first three elements denote the RDF triple.
The extra \textit{Graph} and \textit{Line} elements provide information about the
origin of the triple.

The actual storage is provided by the \textit{foreign language (C)} module.
Using a dedicated C-based implementation we can reduce memory usage and
improve indexing capabilities, for example by providing a dedicated
index to support entailment over \verb$rdfs:subPropertyOf$. Currently the
following indexes are provided (S=subject, P=predicate, O=object,
G=graph):

\begin{itemize}
    \item S, P, O, SP, PO, SPO, G, SG, PG
    \item Predicates connected by \textbf{rdfs:subPropertyOf} are combined
in a \textit{predicate cloud}. The system causes multiple
predicates in the cloud to share the same hash. The cloud
maintains a 2-dimensional array that expresses the
closure of all \verb$rdfs:subPropertyOf$ relations. This
index supports \predref{rdf_has}{3} to query a property and all its
children efficiently.
    \item Additional indexes for predicates, resources and graphs allow
enumerating these objects without duplicates. For example,
using \predref{rdf_resource}{1} we enumerate all resources in the database
only once, while enumeration using e.g., \verb$(rdf(R,_,_);rdf(_,_,R))$
normally produces many duplicate answers.
    \item Literal \textit{Objects} are combined in a \textit{skip list} after case
normalization. This provides for efficient case-insensitive search,
prefix and range search. The plugin library \verb$library(semweb/litindex)$
provides indexed search on tokens inside literals.
\end{itemize}

\subsubsection{Query the RDF database}

\label{sec:semweb-query}

\begin{description}
    \predicate[nondet]{rdf}{3}{?Subject, ?Predicate, ?Object}
Elementary query for triples. \arg{Subject} and \arg{Predicate} are atoms
representing the fully qualified URL of the resource. \arg{Object} is
either an atom representing a resource or \verb$literal(Value)$ if the
object is a literal value. If a value of the form
NameSpaceID:LocalName is provided it is expanded to a ground
atom using \predref{expand_goal}{2}. This implies you can use this
construct in compiled code without paying a performance penalty.
Literal values take one of the following forms:

\begin{description}
    \termitem{\arg{Atom}}{}
If the value is a simple atom it is the textual representation
of a string literal without explicit type or language
qualifier.
    \termitem{lang}{LangID, Atom}
\arg{Atom} represents the text of a string literal qualified with
the given language.
    \termitem{type}{TypeID, Value}
Used for attributes qualified using the \verb$rdf:datatype$
\arg{TypeID}. The \arg{Value} is either the textual representation or a
natural Prolog representation. See the option
convert_typed_literal(:Convertor) of the parser. The storage
layer provides efficient handling of atoms, integers (64-bit)
and floats (native C-doubles). All other data is represented
as a Prolog record.
\end{description}

For literal querying purposes, \arg{Object} can be of the form
\verb$literal(+Query, -Value)$, where Query is one of the terms below.
If the Query takes a literal argument and the value has a
numeric type numerical comparison is performed.

\begin{description}
    \termitem{plain}{+Text}
Perform exact match and demand the language or type qualifiers
to match. This query is fully indexed.
    \termitem{icase}{+Text}
Perform a full but case-insensitive match. This query is
fully indexed.
    \termitem{exact}{+Text}
Same as \verb$icase(Text)$. Backward compatibility.
    \termitem{substring}{+Text}
Match any literal that contains \arg{Text} as a case-insensitive
substring. The query is not indexed on \arg{Object}.
    \termitem{word}{+Text}
Match any literal that contains \arg{Text} delimited by a non
alpha-numeric character, the start or end of the string. The
query is not indexed on \arg{Object}.
    \termitem{prefix}{+Text}
Match any literal that starts with \arg{Text}. This call is intended
for completion. The query is indexed using the skip list of
literals.
    \termitem{ge}{+Literal}
Match any literal that is equal or larger than \arg{Literal} in the
ordered set of literals.
    \termitem{gt}{+Literal}
Match any literal that is larger than \arg{Literal} in the ordered set
of literals.
    \termitem{eq}{+Literal}
Match any literal that is equal to \arg{Literal} in the ordered set
of literals.
    \termitem{le}{+Literal}
Match any literal that is equal or smaller than \arg{Literal} in the
ordered set of literals.
    \termitem{lt}{+Literal}
Match any literal that is smaller than \arg{Literal} in the ordered set
of literals.
    \termitem{between}{+Literal1, +Literal2}
Match any literal that is between \arg{Literal1} and \arg{Literal2} in the
ordered set of literals. This may include both \arg{Literal1} and
\arg{Literal2}.
    \termitem{like}{+Pattern}
Match any literal that matches \arg{Pattern} case insensitively,
where the `*' character in \arg{Pattern} matches zero or more
characters.
\end{description}

Backtracking never returns duplicate triples. Duplicates can be
retrieved using \predref{rdf}{4}. The predicate \predref{rdf}{3} raises a type-error
if called with improper arguments. If \predref{rdf}{3} is called with a
term \verb$literal(_)$ as \arg{Subject} or \arg{Predicate} object it fails
silently. This allows for graph matching goals like
\verb$rdf(S,P,O)$,\verb$rdf(O,P2,O2)$ to proceed without errors.

    \predicate[nondet]{rdf}{4}{?Subject, ?Predicate, ?Object, ?Source}
As \predref{rdf}{3} but in addition query the graph to which the triple
belongs. Unlike \predref{rdf}{3}, this predicate does not remove duplicates
from the result set.

\begin{arguments}
\arg{Source} & is a term Graph:Line. If \arg{Source} is instatiated,
passing an atom is the same as passing Atom:_. \\
\end{arguments}

    \predicate[nondet]{rdf_has}{3}{?Subject, +Predicate, ?Object}
Succeeds if the triple \verb$rdf(Subject, Predicate, Object)$ is true
exploiting the rdfs:subPropertyOf predicate as well as inverse
predicates declared using \predref{rdf_set_predicate}{2} with the
\verb$inverse_of$ property.

    \predicate[nondet]{rdf_has}{4}{?Subject, +Predicate, ?Object, -RealPredicate}
Same as \predref{rdf_has}{3}, but \arg{RealPredicate} is unified to the actual
predicate that makes this relation true. \arg{RealPredicate} must be
\arg{Predicate} or an rdfs:subPropertyOf \arg{Predicate}. If an inverse
match is found, \arg{RealPredicate} is the term \verb$inverse_of(Pred)$.

    \predicate[nondet]{rdf_reachable}{3}{?Subject, +Predicate, ?Object}
Is true if \arg{Object} can be reached from \arg{Subject} following the
transitive predicate \arg{Predicate} or a sub-property thereof, while
repecting the \verb$symetric(true)$ or \verb$inverse_of(P2)$ properties.

If used with either \arg{Subject} or \arg{Object} unbound, it first returns
the origin, followed by the reachable nodes in breadth-first
search-order. The implementation internally looks one solution
ahead and succeeds deterministically on the last solution. This
predicate never generates the same node twice and is robust
against cycles in the transitive relation.

With all arguments instantiated, it succeeds deterministically
if a path can be found from \arg{Subject} to \arg{Object}. Searching starts
at \arg{Subject}, assuming the branching factor is normally lower. A
call with both \arg{Subject} and \arg{Object} unbound raises an
instantiation error. The following example generates all
subclasses of rdfs:Resource:

\begin{code}
?- rdf_reachable(X, rdfs:subClassOf, rdfs:'Resource').
X = 'http://www.w3.org/2000/01/rdf-schema#Resource' ;
X = 'http://www.w3.org/2000/01/rdf-schema#Class' ;
X = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Property' ;
...
\end{code}

    \predicate[nondet]{rdf_reachable}{5}{?Subject, +Predicate, ?Object, +MaxD, -D}
Same as \predref{rdf_reachable}{3}, but in addition, \arg{MaxD} limits the number
of edges expanded and \arg{D} is unified with the `distance' between
\arg{Subject} and \arg{Object}. Distance 0 means \arg{Subject} and \arg{Object} are the
same resource. \arg{MaxD} can be the constant \const{infinite} to impose no
distance-limit.
\end{description}

\subsubsection{Enumerating objects}

\label{sec:semweb-enumerate}

The predicates below enumerate the basic objects of the RDF store. Most
of these predicates also enumerate objects that are not associated to
any currently visible triple. Objects are retained as long as they are
visible in active queries or \textit{snapshots}. After that, some are reclaimed
by the RDF garbage collector, while others are never reclaimed.

\begin{description}
    \predicate[nondet]{rdf_subject}{1}{?Resource}
True if \arg{Resource} appears as a subject. This query respects the
visibility rules implied by the logical update view.

\begin{tags}
    \tag{See also}
\predref{rdf_resource}{1}.
\end{tags}

    \predicate[nondet]{rdf_resource}{1}{?Resource}
True when \arg{Resource} is a resource used as a subject or object in
a triple.

This predicate is primarily intended as a way to process all
resources without processing resources twice. The user must be
aware that some of the returned resources may not appear in any
\textit{visible} triple.

    \predicate[nondet]{rdf_current_predicate}{1}{?Predicate}
True when \arg{Predicate} is a currently known predicate. Predicates
are created if a triples is created that uses this predicate or
a property of the predicate is set using \predref{rdf_set_predicate}{2}.
The predicate may (no longer) have triples associated with it.

Note that resources that have \verb$rdf:type$ \verb$rdf:Property$ are
not automatically included in the result-set of this predicate,
while \textit{all} resources that appear as the second argument of a
triple \textit{are} included.

\begin{tags}
    \tag{See also}
\predref{rdf_predicate_property}{2}.
\end{tags}

    \predicate[nondet]{rdf_current_literal}{1}{-Literal}
True when \arg{Literal} is a currently known literal. Enumerates each
unique literal exactly once. Note that it is possible that the
literal only appears in already deleted triples. Deleted triples
may be locked due to active queries, transactions or snapshots
or may not yet be reclaimed by the garbage collector.

    \predicate[nondet]{rdf_graph}{1}{?Graph}
True when \arg{Graph} is an existing graph.

    \predicate[nondet]{rdf_current_ns}{2}{:Prefix, ?URI}

\begin{tags}
    \tag{deprecated}
Use \predref{rdf_current_prefix}{2}.
\end{tags}
\end{description}

\subsubsection{Modifying the RDF database}

\label{sec:semweb-modify}

The predicates below modify the RDF store directly. In addition, data
may be loaded using \predref{rdf_load}{2} or by restoring a persistent database
using \predref{rdf_attach_db}{2}. Modifications follow the Prolog \textit{logical update
view} semantics, which implies that modifications remain invisible to
already running queries. Further isolation can be achieved using
\predref{rdf_transaction}{3}.

\begin{description}
    \predicate[det]{rdf_assert}{3}{+Subject, +Predicate, +Object}
Assert a new triple into the database. This is equivalent to
\predref{rdf_assert}{4} using Graph \const{user}. \arg{Subject} and \arg{Predicate} are
resources. \arg{Object} is either a resource or a term \verb$literal(Value)$.
See \predref{rdf}{3} for an explanation of Value for typed and language
qualified literals. All arguments are subject to name-space
expansion. Complete duplicates (including the same graph and
`line' and with a compatible `lifespan') are not added to the
database.

    \predicate[det]{rdf_assert}{4}{+Subject, +Predicate, +Object, +Graph}
As \predref{rdf_assert}{3}, adding the predicate to the indicated named
graph.

\begin{arguments}
\arg{Graph} & is either the name of a graph (an atom) or a term
\arg{Graph}:Line, where Line is an integer that denotes a line number. \\
\end{arguments}

    \predicate[det]{rdf_retractall}{3}{?Subject, ?Predicate, ?Object}
Remove all matching triples from the database. As
\predref{rdf_retractall}{4} using an unbound graph.

    \predicate[det]{rdf_retractall}{4}{?Subject, ?Predicate, ?Object, ?Graph}
As \predref{rdf_retractall}{3}, also matching \arg{Graph}. This is particulary
useful to remove all triples coming from a loaded file. See also
\predref{rdf_unload}{1}.

    \predicate[det]{rdf_update}{4}{+Subject, +Predicate, +Object, ++Action}
\nodescription
    \predicate[det]{rdf_update}{5}{+Subject, +Predicate, +Object, +Graph, ++Action}
Replaces one of the three (four) fields on the matching triples
depending on \arg{Action}:

\begin{description}
    \termitem{subject}{Resource}
Changes the first field of the triple.
    \termitem{predicate}{Resource}
Changes the second field of the triple.
    \termitem{object}{Object}
Changes the last field of the triple to the given resource or
\verb$literal(Value)$.
    \termitem{graph}{Graph}
Moves the triple from its current named graph to \arg{Graph}.
This only works with \predref{rdf_update}{5} and throws an error when
used with \predref{rdf_update}{4}.
\end{description}
\end{description}

\subsubsection{Update view, transactions and snapshots}

\label{sec:semweb-update-view}

The update semantics of the RDF database follows the conventional Prolog
\textit{logical update view}. In addition, the RDF database supports
\textit{transactions} and \textit{snapshots}.

\begin{description}
    \predicate[semidet]{rdf_transaction}{1}{:Goal}
Same as \verb$rdf_transaction(Goal, user, [])$. See \predref{rdf_transaction}{3}.

    \predicate[semidet]{rdf_transaction}{2}{:Goal, +Id}
Same as \verb$rdf_transaction(Goal, Id, [])$. See \predref{rdf_transaction}{3}.

    \predicate[semidet]{rdf_transaction}{3}{:Goal, +Id, +Options}
Run \arg{Goal} in an RDF transaction. Compared to the ACID model,
RDF transactions have the following properties:

\begin{enumerate}
    \item Modifications inside the transactions become all atomically
visible to the outside world if \arg{Goal} succeeds or remain
invisible if \arg{Goal} fails or throws an exception. I.e.,
the \textit{atomicy} property is fully supported.
    \item \textit{Consistency} is not guaranteed. Later versions may
implement consistency constraints that will be checked
serialized just before the actual commit of a transaction.
    \item Concurrently executing transactions do not infuence each
other. I.e., the \textit{isolation} property is fully supported.
    \item \textit{Durability} can be activated by loading
\file{library(semweb/rdf_persistency)}.
\end{enumerate}

Processed options are:

\begin{description}
    \termitem{snapshot}{+Snapshot}
Execute \arg{Goal} using the state of the RDF store as stored in
\arg{Snapshot}. See \predref{rdf_snapshot}{1}. \arg{Snapshot} can also be the
atom \const{true}, which implies that an anonymous snapshot is
created at the current state of the store. Modifications
due to executing \arg{Goal} are only visible to \arg{Goal}.
\end{description}

    \predicate[det]{rdf_snapshot}{1}{-Snapshot}
Take a snapshot of the current state of the RDF store. Later,
goals may be executed in the context of the database at this
moment using \predref{rdf_transaction}{3} with the \const{snapshot} option. A
snapshot created outside a transaction exists until it is
deleted. Snapshots taken inside a transaction can only be used
inside this transaction.

    \predicate[det]{rdf_delete_snapshot}{1}{+Snapshot}
Delete a snapshot as obtained from \predref{rdf_snapshot}{1}. After this
call, resources used for maintaining the snapshot become subject
to garbage collection.

    \predicate[nondet]{rdf_active_transaction}{1}{?Id}
True if \arg{Id} is the identifier of a transaction in the context of
which this call is executed. If \arg{Id} is not instantiated,
backtracking yields transaction identifiers starting with the
innermost nested transaction. Transaction identifier terms are
not copied, need not be ground and can be instantiated during
the transaction.

    \predicate[nondet]{rdf_current_snapshot}{1}{?Term}
True when \arg{Term} is a currently known snapshot.

\begin{tags}
    \tag{bug}
Enumeration of snapshots is slow.
\end{tags}
\end{description}

\subsubsection{Type checking predicates}

\label{sec:semweb-typecheck}

\begin{description}
    \predicate[semidet]{rdf_is_resource}{1}{@Term}
True if \arg{Term} is an RDF resource. Note that this is merely a
type-test; it does not mean this resource is involved in any
triple. Blank nodes are also considered resources.

\begin{tags}
    \tag{See also}
\predref{rdf_is_bnode}{1}
\end{tags}

    \predicate{rdf_is_bnode}{1}{+Id}
Tests if a resource is a blank node (i.e. is an anonymous
resource). A blank node is represented as an atom that starts
with \verb$_:$. For backward compatibility reason, \verb$__$ is also
considered to be a blank node.

\begin{tags}
    \tag{See also}
\predref{rdf_bnode}{1}.
\end{tags}

    \predicate[semidet]{rdf_is_literal}{1}{@Term}
True if \arg{Term} is an RDF literal object. Currently only checks for
groundness and the literal functor.
\end{description}

\subsubsection{Loading and saving to file}

\label{sec:semweb-load-save}

The RDF library can read and write triples in RDF/XML and a proprietary
binary format. There is a plugin interface defined to support additional
formats. The \file{library(semweb/turtle)} uses this plugin API to support
loading Turtle files using \predref{rdf_load}{2}.

\begin{description}
    \predicate[det]{rdf_load}{1}{+FileOrList}
Same as \verb$rdf_load(FileOrList, [])$. See \predref{rdf_load}{2}.

    \predicate[det]{rdf_load}{2}{+FileOrList, :Options}
Load RDF data. \arg{Options} provides additional processing options.
Defined options are:

\begin{description}
    \termitem{blank_nodes}{+ShareMode}
How to handle equivalent blank nodes. If \const{share} (default),
equivalent blank nodes are shared in the same resource.
    \termitem{base_uri}{+URI}
\arg{URI} that is used for rdf:about="" and other RDF constructs
that are relative to the base uri. Default is the source
URL.
    \termitem{concurrent}{+Jobs}
If \arg{FileOrList} is a list of files, process the input files
using \arg{Jobs} threads concurrently. Default is the mininum
of the number of cores and the number of inputs. Higher
values can be useful when loading inputs from (slow)
network connections. Using 1 (one) does not use
separate worker threads.
    \termitem{format}{+Format}
Specify the source format explicitly. Normally this is
deduced from the filename extension or the mime-type. The
core library understands the formats xml (RDF/XML) and
triples (internal quick load and cache format). Plugins,
such as \file{library(semweb/turtle)} extend the set of recognised
extensions.
    \termitem{graph}{?Graph}
Named graph in which to load the data. It is \textbf{not} allowed
to load two sources into the same named graph. If \arg{Graph} is
unbound, it is unified to the graph into which the data is
loaded. The default graph is a \verb$file://$ URL when loading
a file or, if the specification is a URL, its normalized
version without the optional \textit{\#fragment}.
    \termitem{if}{Condition}
When to load the file. One of \const{true}, \const{changed} (default) or
\verb$not_loaded$.
    \termitem{modified}{-Modified}
Unify \arg{Modified} with one of \verb$not_modified$, \verb$cached(File)$,
\verb$last_modified(Stamp)$ or \const{unknown}.
    \termitem{cache}{Bool}
If \const{false}, do not use or create a cache file.
    \termitem{register_namespaces}{Bool}
If \const{true} (default \const{false}), register \const{xmlns} namespace
declarations or Turtle \verb$@prefix$ prefixes using
\predref{rdf_register_prefix}{3} if there is no conflict.
    \termitem{silent}{+Bool}
If \const{true}, the message reporting completion is printed using
level \const{silent}. Otherwise the level is \const{informational}. See
also \predref{print_message}{2}.
    \termitem{prefixes}{-Prefixes}
Returns the prefixes defined in the source data file as a list
of pairs.
    \prefixtermitem{multifile}{\prefixterm{\Splus}{\arg{Boolean}}}
Indicate that the addressed graph may be populated with
triples from multiple sources. This disables caching and
avoids that an \predref{rdf_load}{2} call affecting the specified
graph cleans the graph.
\end{description}

Other options are forwarded to \predref{process_rdf}{3}. By default,
\predref{rdf_load}{2} only loads RDF/XML from files. It can be extended to
load data from other formats and locations using plugins. The
full set of plugins relevant to support different formats and
locations is below:

\begin{code}
:- use_module(library(semweb/turtle)).        % Turtle and TriG
:- use_module(library(semweb/rdf_ntriples)).
:- use_module(library(semweb/rdf_zlib_plugin)).
:- use_module(library(semweb/rdf_http_plugin)).
:- use_module(library(http/http_ssl_plugin)).
\end{code}

\begin{tags}
    \tag{See also}
\qpredref{rdf_db}{rdf_open_hook}{3}, \file{library(semweb/rdf_persistency)} and
\file{library(semweb/rdf_cache)}
\end{tags}

    \predicate[det]{rdf_unload}{1}{+Source}
Identify the graph loaded from \arg{Source} and use \predref{rdf_unload_graph}{1}
to erase this graph.

\begin{tags}
    \tag{deprecated}
For compatibility, this predicate also accepts a
graph name instead of a source specification.
Please update your code to use
\predref{rdf_unload_graph}{1}.
\end{tags}

    \predicate[det]{rdf_save}{1}{+Out}
Same as \verb$rdf_save(Out, [])$. See \predref{rdf_save}{2} for details.

    \predicate[det]{rdf_save}{2}{+Out, :Options}
Write RDF data as RDF/XML. \arg{Options} is a list of one or more of
the following options:

\begin{description}
    \termitem{graph}{+Graph}
Save only triples associated to the given named \arg{Graph}.
    \termitem{anon}{Bool}
If \const{false} (default \const{true}) do not save blank nodes that do
not appear (indirectly) as object of a named resource.
    \termitem{base_uri}{URI}
BaseURI used. If present, all URIs that can be
represented relative to this base are written using
their shorthand. See also \verb$write_xml_base$ option.
    \termitem{convert_typed_literal}{:Convertor}
Call \arg{Convertor}(-Type, -Content, +RDFObject), providing
the opposite for the convert_typed_literal option of
the RDF parser.
    \termitem{document_language}{+Lang}
Initial \verb$xml:lang$ saved with rdf:RDF element.
    \termitem{encoding}{Encoding}
\arg{Encoding} for the output. Either utf8 or iso_latin_1.
    \termitem{inline}{+Bool}
If \const{true} (default \const{false}), inline resources when
encountered for the first time. Normally, only bnodes
are handled this way.
    \termitem{namespaces}{+List}
Explicitly specify saved namespace declarations. See
\predref{rdf_save_header}{2} option namespaces for details.
    \termitem{sorted}{+Boolean}
If \const{true} (default \const{false}), emit subjects sorted on
the full URI. Useful to make file comparison easier.
    \termitem{write_xml_base}{Bool}
If \const{false}, do \textit{not} include the \verb$xml:base$
declaration that is written normally when using the
\verb$base_uri$ option.
    \termitem{xml_attributes}{+Bool}
If \const{false} (default \const{true}), never use xml attributes to
save plain literal attributes, i.e., always used an XML
element as in \verb$<name>Joe</name>$.
\end{description}

\begin{arguments}
\arg{Out} & Location to save the data. This can also be a
file-url (\verb$file://path$) or a stream wrapped
in a term \verb$stream(Out)$. \\
\end{arguments}

\begin{tags}
    \tag{See also}
\predref{rdf_save_db}{1}
\end{tags}

    \predicate{rdf_make}{0}{}
Reload all loaded files that have been modified since the last
time they were loaded.
\end{description}

\paragraph{Partial save}\label{sec:semweb-partial-save}

Sometimes it is necessary to make more arbitrary selections of material
to be saved or exchange RDF descriptions over an open network link. The
predicates in this section provide for this. Character encoding issues
are derived from the encoding of the \textit{Stream}, providing support for
\const{utf8}, \verb$iso_latin_1$ and \const{ascii}.

\begin{description}
    \predicate{rdf_save_header}{2}{+Fd, +Options}
Save XML document header, doctype and open the RDF environment.
This predicate also sets up the namespace notation.

Save an RDF header, with the XML header, DOCTYPE, ENTITY and
opening the rdf:RDF element with appropriate namespace
declarations. It uses the primitives from section 3.5 to
generate the required namespaces and desired short-name. \arg{Options}
is one of:

\begin{description}
    \termitem{graph}{+URI}
Only search for namespaces used in triples that belong to the
given named graph.
    \termitem{namespaces}{+List}
Where \arg{List} is a list of namespace abbreviations. With this
option, the expensive search for all namespaces that may be
used by your data is omitted. The namespaces \const{rdf} and \const{rdfs}
are added to the provided \arg{List}. If a namespace is not
declared, the resource is emitted in non-abreviated form.
\end{description}

    \predicate[det]{rdf_save_footer}{1}{Out:stream}
Finish XML generation and write the document footer.

\begin{tags}
    \tag{See also}
\predref{rdf_save_header}{2}, \predref{rdf_save_subject}{3}.
\end{tags}

    \predicate[det]{rdf_save_subject}{3}{+Out, +Subject:resource, +Options}
Save the triples associated to \arg{Subject} to \arg{Out}. \arg{Options}:

\begin{description}
    \termitem{graph}{+Graph}
Only save properties from \arg{Graph}.
    \termitem{base_uri}{+URI}
    \termitem{convert_typed_literal}{:Goal}
    \termitem{document_language}{+XMLLang}
\end{description}

\begin{tags}
    \tag{See also}
\predref{rdf_save}{2} for a description of these options.
\end{tags}
\end{description}

\paragraph{Fast loading and saving}\label{sec:semweb-fast-load-save}

Loading and saving RDF format is relatively slow. For this reason we
designed a binary format that is more compact, avoids the complications
of the RDF parser and avoids repetitive lookup of (URL) identifiers.
Especially the speed improvement of about 25 times is worth-while when
loading large databases. These predicates are used for caching by
\predref{rdf_load}{2} under certain conditions as well as for maintaining
persistent snapshots of the database using
\file{library(semweb/rdf_persistency)}.

\begin{description}
    \predicate[det]{rdf_save_db}{1}{+File}
\nodescription
    \predicate[det]{rdf_save_db}{2}{+File, +Graph}
Save triples into \arg{File} in a quick-to-load binary format. If \arg{Graph}
is supplied only triples flagged to originate from that database
are added. Files created this way can be loaded using
\predref{rdf_load_db}{1}.

    \predicate[det]{rdf_load_db}{1}{+File}
Load triples from a file created using \predref{rdf_save_db}{2}.
\end{description}

\subsubsection{Graph manipulation}

\label{sec:semweb-graphs}

Many RDF stores turned triples into quadruples. This store is no
exception, initially using the 4th argument to store the filename from
which the triple was loaded. Currently, the 4th argument is the RDF
\textit{named graph}. A named graph maintains some properties, notably to
track origin, changes and modified state.

\begin{description}
    \predicate[det]{rdf_create_graph}{1}{+Graph}
Create an RDF graph without triples. Succeeds silently if the
graph already exists.

    \predicate[det]{rdf_unload_graph}{1}{+Graph}
Remove \arg{Graph} from the RDF store. Succeeds silently if the named
graph does not exist.

    \predicate[nondet]{rdf_graph_property}{2}{?Graph, ?Property}
True when \arg{Property} is a property of \arg{Graph}. Defined properties
are:

\begin{description}
    \termitem{hash}{Hash}
\arg{Hash} is the (MD5-)hash for the content of \arg{Graph}.
    \termitem{modified}{Boolean}
True if the graph is modified since it was loaded or
\predref{rdf_set_graph}{2} was called with \verb$modified(false)$.
    \termitem{source}{Source}
The graph is loaded from the \arg{Source} (a URL)
    \termitem{source_last_modified}{?Time}
\arg{Time} is the last-modified timestamp of Source at the moment
the graph was loaded from Source.
    \termitem{triples}{Count}
True when \arg{Count} is the number of triples in \arg{Graph}.
\end{description}

Additional graph properties can be added by defining rules for
the multifile predicate \predref{property_of_graph}{2}. Currently, the
following extensions are defined:

\begin{itemize}
    \item \file{library(semweb/rdf_persistency)}

\begin{description}
    \termitem{persistent}{Boolean}
\arg{Boolean} is \const{true} if the graph is persistent.
\end{description}
\end{itemize}

    \predicate[det]{rdf_set_graph}{2}{+Graph, +Property}
Set properties of \arg{Graph}. Defined properties are:

\begin{description}
    \termitem{modified}{false}
Set the modified state of \arg{Graph} to false.
\end{description}
\end{description}

\subsubsection{Literal matching and indexing}

\label{sec:semweb-literals}

Literal values are ordered and indexed using a \textit{skip list}. The aim of
this index is threefold.

\begin{itemize}
    \item Unlike hash-tables, binary trees allow for efficient
\textit{prefix} and \textit{range} matching. Prefix matching
is useful in interactive applications to provide feedback
while typing such as auto-completion.
    \item Having a table of unique literals we generate creation and
destruction events (see \predref{rdf_monitor}{2}). These events can
be used to maintain additional indexing on literals, such
as `by word'. See \verb$library(semweb/litindex)$.
\end{itemize}

As string literal matching is most frequently used for searching
purposes, the match is executed case-insensitive and after removal of
diacritics. Case matching and diacritics removal is based on Unicode
character properties and independent from the current locale. Case
conversion is based on the `simple uppercase mapping' defined by Unicode
and diacritic removal on the `decomposition type'. The approach is
lightweight, but somewhat simpleminded for some languages. The tables
are generated for Unicode characters upto 0x7fff. For more information,
please check the source-code of the mapping-table generator
\verb$unicode_map.pl$ available in the sources of this package.

Currently the total order of literals is first based on the type of
literal using the ordering \textit{numeric $<$ string $<$ term} Numeric values
(integer and float) are ordered by value, integers preceed floats if
they represent the same value. Strings are sorted alphabetically after
case-mapping and diacritic removal as described above. If they match
equal, uppercase preceeds lowercase and diacritics are ordered on their
unicode value. If they still compare equal literals without any
qualifier preceeds literals with a type qualifier which preceeds
literals with a language qualifier. Same qualifiers (both type or both
language) are sorted alphabetically.

The ordered tree is used for indexed execution of
\verb$literal(prefix(Prefix), Literal)$ as well as \verb$literal(like(Like), Literal)$
if \textit{Like} does not start with a `*'. Note that results of queries that
use the tree index are returned in alphabetical order.

\subsubsection{Predicate properties}

\label{sec:semweb-predicates}

The predicates below form an experimental interface to provide more
reasoning inside the kernel of the rdb_db engine. Note that \const{symetric},
\verb$inverse_of$ and \const{transitive} are not yet supported by the rest of the
engine. Also note that there is no relation to defined RDF properties.
Properties that have no triples are not reported by this predicate,
while predicates that are involved in triples do not need to be defined
as an instance of rdf:Property.

\begin{description}
    \predicate[det]{rdf_set_predicate}{2}{+Predicate, +Property}
Define a property of the predicate. This predicate currently
supports the following properties:

\begin{description}
    \termitem{symmetric}{+Boolean}
Set/unset the predicate as being symmetric. Using
\verb$symmetric(true)$ is the same as \verb$inverse_of(Predicate)$,
i.e., creating a predicate that is the inverse of
itself.
    \termitem{transitive}{+Boolean}
Sets the transitive property.
    \termitem{inverse_of}{+Predicate2}
Define \arg{Predicate} as the inverse of \arg{Predicate2}. An inverse
relation is deleted using \verb$inverse_of([])$.
\end{description}

The \const{transitive} property is currently not used. The \const{symmetric}
and \verb$inverse_of$ properties are considered by \predref{rdf_has}{3},4 and
\predref{rdf_reachable}{3}.

\begin{tags}
    \tag{To be done}
Maintain these properties based on OWL triples.
\end{tags}

    \predicate{rdf_predicate_property}{2}{?Predicate, ?Property}
Query properties of a defined predicate. Currently defined
properties are given below.

\begin{description}
    \termitem{symmetric}{Bool}
True if the predicate is defined to be symetric. I.e., \{A\} P
\{B\} implies \{B\} P \{A\}. Setting symmetric is equivalent to
\verb$inverse_of(Self)$.
    \termitem{inverse_of}{Inverse}
True if this predicate is the inverse of \arg{Inverse}. This
property is used by \predref{rdf_has}{3}, \predref{rdf_has}{4}, \predref{rdf_reachable}{3} and
\predref{rdf_reachable}{5}.
    \termitem{transitive}{Bool}
True if this predicate is transitive. This predicate is
currently not used. It might be used to make \predref{rdf_has}{3} imply
\predref{rdf_reachable}{3} for transitive predicates.
    \termitem{triples}{Triples}
Unify \arg{Triples} with the number of existing triples using this
predicate as second argument. Reporting the number of triples
is intended to support query optimization.
    \termitem{rdf_subject_branch_factor}{-Float}
Unify \arg{Float} with the average number of triples associated with
each unique value for the subject-side of this relation. If
there are no triples the value 0.0 is returned. This value is
cached with the predicate and recomputed only after
substantial changes to the triple set associated to this
relation. This property is intended for path optimalisation
when solving conjunctions of \predref{rdf}{3} goals.
    \termitem{rdf_object_branch_factor}{-Float}
Unify \arg{Float} with the average number of triples associated with
each unique value for the object-side of this relation. In
addition to the comments with the \verb$rdf_subject_branch_factor$
property, uniqueness of the object value is computed from the
hash key rather than the actual values.
    \termitem{rdfs_subject_branch_factor}{-Float}
Same as \verb$rdf_subject_branch_factor$, but also considering
triples of `subPropertyOf' this relation. See also \predref{rdf_has}{3}.
    \termitem{rdfs_object_branch_factor}{-Float}
Same as \verb$rdf_object_branch_factor$, but also considering
triples of `subPropertyOf' this relation. See also \predref{rdf_has}{3}.
\end{description}

\begin{tags}
    \tag{See also}
\predref{rdf_set_predicate}{2}.
\end{tags}
\end{description}

\subsubsection{Prefix Handling}

\label{sec:semweb-prefixes}

Prolog code often contains references to constant resources with a known
\textit{prefix} (also known as XML \textit{namespaces}). For example,
\verb$http://www.w3.org/2000/01/rdf-schema#Class$ refers to the most
general notion of an RDFS class. Readability and maintability concerns
require for abstraction here. The RDF database maintains a table of
known \textit{prefixes}. This table can be queried using \predref{rdf_current_ns}{2} and
can be extended using \predref{rdf_register_ns}{3}. The prefix database is used to
expand \verb$prefix:local$ terms that appear as arguments to calls which
are known to accept a \textit{resource}. This expansion is achieved by Prolog
preprocessor using \predref{expand_goal}{2}.

\begin{description}
    \predicate[nondet]{rdf_current_prefix}{2}{:Alias, ?URI}
Query predefined prefixes and prefixes defined with
\predref{rdf_register_prefix}{2} and local prefixes defined with
\predref{rdf_prefix}{2}. If \arg{Alias} is unbound and one \arg{URI} is the prefix of
another, the longest is returned first. This allows turning a
resource into a prefix/local couple using the simple enumeration
below. See \predref{rdf_global_id}{2}.

\begin{code}
rdf_current_prefix(Prefix, Expansion),
atom_concat(Expansion, Local, URI),
\end{code}

    \predicate[det]{rdf_register_prefix}{2}{+Prefix, +URI}
\nodescription
    \predicate[det]{rdf_register_prefix}{3}{+Prefix, +URI, +Options}
Register \arg{Prefix} as an abbreviation for \arg{URI}. \arg{Options}:

\begin{description}
    \termitem{force}{Boolean}
If \const{true}, replace existing namespace alias. Please note
that replacing a namespace is dangerous as namespaces
affect preprocessing. Make sure all code that depends on
a namespace is compiled after changing the registration.
    \termitem{keep}{Boolean}
If \const{true} and Alias is already defined, keep the
original binding for \arg{Prefix} and succeed silently.
\end{description}

Without options, an attempt to redefine an alias raises a
permission error.

Predefined prefixes are:

\begin{quote}
\begin{tabulary}{0.9\textwidth}{|l|l|}
\hline
\textbf{Alias} & \textbf{IRI prefix} \\
dc & \url{http://purl.org/dc/elements/1.1/} \\
dcterms & \url{http://purl.org/dc/terms/} \\
eor & \url{http://dublincore.org/2000/03/13/eor\#} \\
foaf & \url{http://xmlns.com/foaf/0.1/} \\
owl & \url{http://www.w3.org/2002/07/owl\#} \\
rdf & \url{http://www.w3.org/1999/02/22-rdf-syntax-ns\#} \\
rdfs & \url{http://www.w3.org/2000/01/rdf-schema\#} \\
serql & \url{http://www.openrdf.org/schema/serql\#} \\
skos & \url{http://www.w3.org/2004/02/skos/core\#} \\
void & \url{http://rdfs.org/ns/void\#} \\
xsd & \url{http://www.w3.org/2001/XMLSchema\#} \\
\hline
\end{tabulary}
\end{quote}
\end{description}

\textit{Explicit} expansion is achieved using the predicates below. The
predicate \predref{rdf_equal}{2} performs this expansion at compile time, while the
other predicates do it at runtime.

\begin{description}
    \predicate{rdf_equal}{2}{?Resource1, ?Resource2}
Simple equality test to exploit goal-expansion.

    \predicate[semidet]{rdf_global_id}{2}{?IRISpec, :IRI}
Convert between Prefix:Local and full \arg{IRI} (an atom). If \arg{IRISpec} is
an atom, it is simply unified with \arg{IRI}. This predicate fails
silently if \arg{IRI} is an RDF literal.

Note that this predicate is a meta-predicate on its output argument.
This is necessary to get the module context while the first argument
may be of the form (:)/2. The above mode description is correct, but
should be interpreted as (?,?).

\begin{tags}
    \tag{Errors}
\verb$existence_error(rdf_prefix, Prefix)$\mtag{See also}- \predref{rdf_equal}{2} provides a compile time alternative \\- The \predref{rdf_meta}{1} directive asks for compile time expansion
of arguments.
    \tag{bug}
Error handling is incomplete. In its current implementation
the same code is used for compile-time expansion and to
facilitate runtime conversion and checking. These use cases
have different requirements.
\end{tags}

    \predicate[semidet]{rdf_global_object}{2}{+Object, :GlobalObject}
\nodescription
    \predicate[semidet]{rdf_global_object}{2}{-Object, :GlobalObject}
Same as \predref{rdf_global_id}{2}, but intended for dealing with the
object part of a triple, in particular the type for typed
literals. Note that the predicate is a meta-predicate on the
output argument. This is necessary to get the module context
while the first argument may be of the form (:)/2.

\begin{tags}
    \tag{Errors}
\verb$existence_error(rdf_prefix, Prefix)$
\end{tags}

    \predicate[det]{rdf_global_term}{2}{+TermIn, :GlobalTerm}
Performs \predref{rdf_global_id}{2} on predixed IRIs and \predref{rdf_global_object}{2} on
RDF literals, by recursively analysing the term. Note that the
predicate is a meta-predicate on the output argument. This is
necessary to get the module context while the first argument may be
of the form (:)/2.

Terms of the form \verb$Prefix:Local$ that appear in \arg{TermIn} for which
\arg{Prefix} is not defined are not replaced. Unlike \predref{rdf_global_id}{2} and
\predref{rdf_global_object}{2}, no error is raised.
\end{description}

\paragraph{Namespace handling for custom predicates}\label{sec:semweb-meta}

If we implement a new predicate based on one of the predicates of the
semweb libraries that expands namespaces, namespace expansion is not
automatically available to it. Consider the following code computing the
number of distinct objects for a certain property on a certain object.

\begin{code}
cardinality(S, P, C) :-
      (   setof(O, rdf_has(S, P, O), Os)
      ->  length(Os, C)
      ;   C = 0
      ).
\end{code}

Now assume we want to write \predref{labels}{2} that returns the number of distict
labels of a resource:

\begin{code}
labels(S, C) :-
      cardinality(S, rdfs:label, C).
\end{code}

This code will \textit{not} work because \verb$rdfs:label$ is not expanded at
compile time. To make this work, we need to add an \predref{rdf_meta}{1}
declaration.

\begin{code}
:- rdf_meta
      cardinality(r,r,-).
\end{code}

\begin{shortlist}
    \item [[\predref{rdf_meta}{1}]]
\end{shortlist}

The example below defines the rule \predref{concept}{1}.

\begin{code}
:- use_module(library(semweb/rdf_db)).  % for rdf_meta
:- use_module(library(semweb/rdfs)).    % for rdfs_individual_of

:- rdf_meta
        concept(r).

%%      concept(?C) is nondet.
%
%       True if C is a concept.

concept(C) :-
        rdfs_individual_of(C, skos:'Concept').
\end{code}

In addition to expanding \textit{calls}, \predref{rdf_meta}{1} also causes expansion of
\textit{clause heads} for predicates that match a declaration. This is
typically used write Prolog statements about resources. The following
example produces three clauses with expanded (single-atom) arguments:

\begin{code}
:- use_module(library(semweb/rdf_db)).

:- rdf_meta
        label_predicate(r).

label_predicate(rdfs:label).
label_predicate(skos:prefLabel).
label_predicate(skos:altLabel).
\end{code}

\subsubsection{Miscellaneous predicates}

\label{sec:semweb-misc}

This section describes the remaining predicates of the
\file{library(semweb/rdf_db)} module.

\begin{description}
    \predicate{rdf_bnode}{1}{-Id}
Generate a unique anonymous identifier for a subject.

    \predicate[nondet]{rdf_source_location}{2}{+Subject, -Location}
True when triples for \arg{Subject} are loaded from \arg{Location}.

\begin{arguments}
\arg{Location} & is a term File:Line. \\
\end{arguments}

    \predicate[det]{rdf_generation}{1}{-Generation}
True when \arg{Generation} is the current generation of the database.
Each modification to the database increments the generation. It
can be used to check the validity of cached results deduced from
the database. Committing a non-empty transaction increments the
generation by one.

When inside a transaction, \arg{Generation} is unified to a term
\textit{TransactionStartGen} + \textit{InsideTransactionGen}. E.g., 4+3 means
that the transaction was started at generation 4 of the global
database and we have created 3 new generations inside the
transaction. Note that this choice of representation allows for
comparing generations using Prolog arithmetic. Comparing a
generation in one transaction with a generation in another
transaction is meaningless.

    \predicate{rdf_estimate_complexity}{4}{?Subject, ?Predicate, ?Object, -Complexity}
Return the number of alternatives as indicated by the database
internal hashed indexing. This is a rough measure for the number
of alternatives we can expect for an \predref{rdf_has}{3} call using the
given three arguments. When called with three variables, the
total number of triples is returned. This estimate is used in
query optimisation. See also \predref{rdf_predicate_property}{2} and
\predref{rdf_statistics}{1} for additional information to help optimizers.

    \predicate[nondet]{rdf_statistics}{1}{?KeyValue}
Obtain statistics on the RDF database. Defined statistics are:

\begin{description}
    \termitem{graphs}{-Count}
Number of named graphs.
    \termitem{triples}{-Count}
Total number of triples in the database. This is the number
of asserted triples minus the number of retracted ones. The
number of \textit{visible} triples in a particular context may be
different due to visibility rules defined by the logical
update view and transaction isolation.
    \termitem{resources}{-Count}
Number of resources that appear as subject or object in a
triple. See \predref{rdf_resource}{1}.
    \termitem{properties}{-Count}
Number of current predicates. See \predref{rdf_current_predicate}{1}.
    \termitem{literals}{-Count}
Number of current literals. See \predref{rdf_current_literal}{1}.
    \termitem{gc}{GCCount, ReclaimedTriples, ReindexedTriples, Time}
Information about the garbage collector.
    \termitem{searched_nodes}{-Count}
Number of nodes expanded by \predref{rdf_reachable}{3} and
\predref{rdf_reachable}{5}.
    \termitem{lookup}{rdf(S,P,O,G), Count}
Number of queries that have been performed for this particular
instantiation pattern. Each of \arg{S},\arg{P},\arg{O},\arg{G} is either + or -.
Fails in case the number of performed queries is zero.
    \termitem{hash_quality}{rdf(S,P,O,G), Buckets, Quality, PendingResize}
Statistics on the index for this pattern. Indices are created
lazily on the first relevant query.
    \termitem{triples_by_graph}{Graph, Count}
This statistics is produced for each named graph. See
\const{triples} for the interpretation of this value.
\end{description}

    \predicate[semidet]{rdf_match_label}{3}{+How, +Pattern, +Label}
True if \arg{Label} matches \arg{Pattern} according to \arg{How}. \arg{How} is one of
\const{icase}, \const{substring}, \const{word}, \const{prefix} or \const{like}. For backward
compatibility, \const{exact} is a synonym for \const{icase}.

    \predicate[semidet]{lang_matches}{2}{+Lang, +Pattern}
True if \arg{Lang} matches \arg{Pattern}. This implements XML language
matching conform RFC 4647. Both \arg{Lang} and \arg{Pattern} are
dash-separated strings of identifiers or (for \arg{Pattern}) the
wildcard *. Identifiers are matched case-insensitive and a *
matches any number of identifiers. A short pattern is the same
as *.

    \predicate[semidet]{lang_equal}{2}{+Lang1, +Lang2}
True if two RFC language specifiers denote the same language

\begin{tags}
    \tag{See also}
\predref{lang_matches}{2}.
\end{tags}

    \predicate{rdf_reset_db}{0}{}
Remove all triples from the RDF database and reset all its
statistics.

\begin{tags}
    \tag{bug}
This predicate checks for active queries, but this check is
not properly synchronized and therefore the use of this
predicate is unsafe in multi-threaded contexts. It is
mainly used to run functionality tests that need to
start with an empty database.
\end{tags}

    \predicate[det]{rdf_version}{1}{-Version}
True when \arg{Version} is the numerical version-id of this library.
The version is computed as

\begin{code}
Major*10000 + Minor*100 + Patch.
\end{code}

\end{description}

\subsubsection{Memory management considerations}

\label{sec:semweb-memory-management}

Storing RDF triples in main memory provides much better performance than
using external databases. Unfortunately, although memory is fairly cheap
these days, main memory is severely limited when compared to disks.
Memory usage breaks down to the following categories. Rough estimates
of the memory usage is given \textbf{for 64-bit systems}. 32-bit system use
slightly more than half these amounts.

\begin{itemize}
    \item Actually storing the triples. A triple is stored in a C struct
of 144 bytes. This struct both holds the quintuple,
some bookkeeping information and the 10 next-pointers for the (max)
to hash tables.
    \item The bucket array for the hashes. Each bucket maintains a
\textit{head}, and \textit{tail} pointer, as well as a count for the number of
entries. The bucket array is allocated if a particular index is
created, which implies the first query that requires the index.
Each bucket requires 24 bytes.

Bucket arrays are resized if necessary. Old triples remain at
their original location. This implies that a query may need to
scan multiple buckets. The garbage collector may relocate old
indexed triples. It does so by copying the old triple. The
old triple is later reclaimed by GC. Reindexed triples will
be reused, but many reindexed triples may result in a significant
memory fragmentation.
    \item Resources are maintained in a separate table to support
\predref{rdf_resource}{1}. A resources requires approximately 32 bytes.
    \item Identical literals are shared (see \predref{rdf_current_literal}{1}) and
stored in a \textit{skip list}. A literal requires approximately
40 bytes, excluding the atom used for the lexical representation.
    \item Resources are stored in the Prolog atom-table. Atoms with the
average length of a resource require approximately 88 bytes.
\end{itemize}

The hash parameters can be controlled with \predref{rdf_set}{1}. Applications that
are tight on memory and for which the query characteristics are more or
less known can optimize performance and memory by fixing the
hash-tables. By fixing the hash-tables we can tailor them to the
frequent query patterns, we avoid the need for to check multiple hash
buckets (see above) and we avoid memory fragmentation due to optimizing
triples for resized hashes.

\begin{code}
set_hash_parameters :-
      rdf_set(hash(s,   size, 1048576)),
      rdf_set(hash(p,   size, 1024)),
      rdf_set(hash(sp,  size, 2097152)),
      rdf_set(hash(o,   size, 1048576)),
      rdf_set(hash(po,  size, 2097152)),
      rdf_set(hash(spo, size, 2097152)),
      rdf_set(hash(g,   size, 1024)),
      rdf_set(hash(sg,  size, 1048576)),
      rdf_set(hash(pg,  size, 2048)).
\end{code}

\begin{description}
    \predicate[det]{rdf_set}{1}{+Term}
Set properties of the RDF store. Currently defines:

\begin{description}
    \termitem{hash}{+Hash, +Parameter, +Value}
Set properties for a triple index. \arg{Hash} is one of \const{s},
\const{p}, \const{sp}, \const{o}, \const{po}, \const{spo}, \const{g}, \const{sg} or \const{pg}. \arg{Parameter}
is one of:

\begin{description}
    \termitem{size}{}
\arg{Value} defines the number of entries in the hash-table.
\arg{Value} is rounded \textit{down} to a power of 2. After setting
the size explicitly, auto-sizing for this table is
disabled. Setting the size smaller than the current
size results in a \verb$permission_error$ exception.
    \termitem{average_chain_len}{}
Set maximum average collision number for the hash.
    \termitem{optimize_threshold}{}
Related to resizing hash-tables. If 0, all triples are
moved to the new size by the garbage collector. If more
then zero, those of the last \arg{Value} resize steps remain at
their current location. Leaving cells at their current
location reduces memory fragmentation and slows down
access.
\end{description}
\end{description}
\end{description}

\paragraph{The garbage collector}\label{sec:semweb-gc}

The RDF store has a garbage collector that runs in a separate thread
named =__rdf_GC=. The garbage collector removes the following objects:

\begin{itemize}
    \item Triples that have died before the the generation of last
still active query.
    \item Entailment matrices for \verb$rdfs:subPropertyOf$ relations that are
related to old queries.
\end{itemize}

In addition, the garbage collector reindexes triples associated to the
hash-tables before the table was resized. The most recent resize
operation leads to the largest number of triples that require
reindexing, while the oldest resize operation causes the largest
slowdown. The parameter \verb$optimize_threshold$ controlled by \predref{rdf_set}{1} can
be used to determine the number of most recent resize operations for
which triples will not be reindexed. The default is 2.

Normally, the garbage collector does it job in the background at a low
priority. The predicate \predref{rdf_gc}{0} can be used to reclaim all garbage and
optimize all indexes.\paragraph{Warming up the database}\label{sec:semweb-warming-up}

The RDF store performs many operations lazily or in background threads.
For maximum performance, perform the following steps:

\begin{itemize}
    \item Load all the data without doing queries or retracting data in
between. This avoids creating the indexes and therefore the need
to resize them.
    \item Perform each of the indexed queries. The following call performs
this. Note that it is irrelevant whether or not the query succeeds.

\begin{code}
warm_indexes :-
    ignore(rdf(s, _, _)),
    ignore(rdf(_, p, _)),
    ignore(rdf(_, _, o)),
    ignore(rdf(s, p, _)),
    ignore(rdf(_, p, o)),
    ignore(rdf(s, p, o)),
    ignore(rdf(_, _, _, g)),
    ignore(rdf(s, _, _, g)),
    ignore(rdf(_, p, _, g)).
\end{code}

    \item Duplicate adminstration is initialized in the background after the
first call that returns a significant amount of duplicates. Creating
the adminstration can be forced by calling \predref{rdf_update_duplicates}{0}.
\end{itemize}

Predicates:

\begin{description}
    \predicate[det]{rdf_gc}{0}{}
Run the RDF-DB garbage collector until no garbage is left and all
tables are fully optimized. Under normal operation a separate thread
with identifier \verb$__rdf_GC$ performs garbage collection as long as
it is considered `useful'.

Using \predref{rdf_gc}{0} should only be needed to ensure a fully clean
database for analysis purposes such as leak detection.

    \predicate[det]{rdf_update_duplicates}{0}{}
Update the duplicate administration of the RDF store. This marks
every triple that is potentionally a duplicate of another as
duplicate. Being potentially a duplicate means that subject,
predicate and object are equivalent and the life-times of the
two triples overlap.

The duplicates marks are used to reduce the administrative load
of avoiding duplicate answers. Normally, the duplicates are
marked using a background thread that is started on the first
query that produces a substantial amount of duplicates.
\end{description}