---+ Whitepaper: The ClioPatria Semantic Web server ---++ What is ClioPatria? ClioPatria is a [[(SWI-)Prolog][http://www.swi-prolog.org]] hosted HTTP application-server with libraries for Semantic Web reasoning and a set of JavaScript libraries for presenting results in a browser. Another way to describe ClioPatria is as ``Tomcat+Sesame (or Jena) with additional reasoning libraries in Prolog, completed by JavaScript presentation components''. ---++ Why is ClioPatria based on Prolog? Prolog is a logic-based language using a simple depth-first resolution strategy (SLD resolution). This gives two readings to the same piece of code: the _declarative_ reading and the _procedural_ reading. The declarative reading facilitates understanding of the code and allows for reasoning about it. The procedural reading allows for specifying algorithms and sequential aspects of the code, something which we often need to describe interaction. In addition, Prolog is _reflexive_: it can reason about Prolog programs and construct them at runtime. Finally, Prolog is, like the RDF _|triple-model|_, _relational_. This match of paradigms avoids the complications involved with using Object Oriented languages for handling RDF (see [[below][<#oo>]]). We illustrate the fit between RDF and Prolog by translating an example query from the official [[SPARQL document][http://www.w3.org/TR/rdf-sparql-query/]]: SPARQL: == PREFIX foaf: SELECT ?name ?mbox WHERE { ?x foaf:name ?name . ?x foaf:mbox ?mbox } == Below we define this query as a Prolog _predicate_. The translation is natural and compact. The query is expressed as a Prolog program rather than a string. This ensures that Prolog can reason about it: it validates the syntax and verifies the dependencies between this code fragment and the remainder of the application. See [[Wikipedia][http://en.wikipedia.org/wiki/Prolog]] if you need some introduction to the language. == :- rdf_register_ns(foaf, 'http://xmlns.com/foaf/0.1/'). name_mbox(Name, MBox) :- rdf(X, foaf:name, literal(Name)), rdf(X, foaf:mbox, MBox). == We can run this query interactively from the terminal as illustrated below. Here, the `=|;|=' is typed by the user asking for another solution and the final `=|.|=' indicates there are no more solutions. == ?- name_mbox(Name, MBox). Name = 'Johnny Lee Outlaw', MBox = 'mailto:jlow@example.com' ; Name = 'Peter Goodguy', MBox = 'mailto:peter@example.org'. == Returning all solutions is all that is provided by the SPARQL query. However, our program is capable of doing more because it describes the _logical_ relation ``_Name_ is the name of an entity that has mailbox _MBox_''. Therefore, we can ask: == ?- name_mbox('Johnny Lee Outlaw', X). MBox = 'mailto:jlow@example.com'. == This is _different_ from a loop over the resuls from the SPARQL query because the query does not iterate over all name-mailbox tuples, but only over those that have a resource with a name-property with the value =|'Johnny Lee Outlaw'|=. Finally, we can use the relation as a boolean test: == ?- name_mbox('Johnny Lee Outlaw', 'mailto:peter@example.org'). false. == Prolog's resolution technique has created a powerful building block for more complex queries from this simple translation of the SPARQL query. We can use this to create more complex queries. E.g., if we want to send a personalised message to all members on a mailinglist, we need the members and their names. The code below combines a simple statement query with the already-defined relation name_mbox/2. == employee_name_email(List, MBox, Name) :- rdf(List, list:member, MBox), name_mbox(Name, MBox). == ---+++ Optimising queries? Above, we used simple Prolog SLD resolution to join RDF statement queries. Proper RDF query language implementation perform optimisation. Here we can exploit Prolog's _reflexive_ capabilities. The code below reorganises the conjunction of the two rdf/3 goals to achieve optimal performance dynamically. This optimisation is based on the database dynamics _and_ which arguments are given (_instantiated_). E.g., if we call this relation with a given MBox it will swap the two RDF statements. == :- use_module(library(semweb/rdf_optimise))). name_mbox(Name, MBox) :- rdf_optimise((rdf(X, foaf:name, Name), rdf(X, foaf:mbox, MBox)), Goal), call(Goal). == ---+++ Our benefits? * The single primitive rdf(Subject, Predicate, Object) (rdf/3) suffices to realise all the basic graph-pattern matching that can be done in SPARQL. * We can give a name to a query (as name_mbox/2 above) and build complex queries from simple ones. This greatly simplifies maintenance of complex queries. * Optimisation and unfolding can be used to achieve optimal performance at small cost. * Instead of the predefined SPARQL and SeRQL functions and conditions, we can apply any Prolog predicate as condition or function anywhere in the query. We can also introduce other relations (e.g., from an RDBMS) into our predicates. * The RDF store is tightly connected to Prolog. This allows for arbitrary reasoning with and exploration of the RDF graph at low cost. ---+++ [oo] How does this compare to using Java? Above, we compared querying the triple-store in Prolog with SPARQL. What if we compare Prolog to using Sesame/Jena as a library? The marriage between Object Oriented systems and relational data in general and RDF in particular is discussed in [[ActiveRDF: embedding Semantic Web data into object-oriented languages][http://linkinghub.elsevier.com/retrieve/pii/S1570826808000401]]. Roughly, static languages such as Java allow for three approaches. Each of these either require setting up an _enumerator_ or dealing with _sets_. 1. GetStatement(): Query statements based on a pattern. This is comparable to what our Prolog based approach does, but in our approach a single call deals with all possible patterns dynamically. In Java we have to find what is given and loop through the bindings for the remaining values explicitely. 2. GetObject(): Query resources. In this schema an initial URI is used to create an object that reflects the URI. Methods on this object values on a given predicate. Note that this predicate must be specified as a string and thus escapes the analysis of the compiler. 3. Create an enumerator from a SPARQL query provided as a string. This approach again uses a string. Building this string is cumbersome and vulnerable to script injection (a security risc). _Joining_ results from any of these three possibilities requires hand-crafted (nested) loops. Statically typed Object Oriented languages cannot easily overcomes these problems. For dynamically typed languages such as [[Ruby][http://www.ruby-lang.org/]], the situation can be improved significantly as demonstrated by the [[ActiveRDF][http://www.activerdf.org/]] project. ActiveRDF abstracts from the RDF store and builds upon a well-established web-application development platform, but its handling of RDF is still cumbersome compared to what a relational language such as Prolog can achieve. ---++ What does ClioPatria provide? This section gives the highlights of the functionality you can find in ClioPatria. ---+++ Core SWI-Prolog libraries $ Web-application development : Developing a dynamic web-page is easy: register a predicate as a handler for an HTTP `file'. The predicate writes a document that conforms to the CGI specification and the server infrastructure takes care of the rest. In the file mbox.pl, we defined three ways to generate a table of names and mailboxes. $ RDF storage : The main-memory store is a natural extension to Prolog. It is [[memory-efficient][memusage.txt]]. The RDF store provides: * Reliable file-based persistency * Load and unload of data-sources * Full persistent history of modifications $ Full text search : Using rdf_find_literals/2, the user can query literals that contain words. The literal-search facility allows for searching tokens and prefixes as well as fuzzy search (case-insensitive, stemming, sounds-like (metaphone)) and numerical search (exact, larger, smaller, range). ---+++ ClioPatria extensions $ The development environment : ClioPatria is an interactive and self-documenting system that provides basic user-management and utilities to examine RDF data. $ SeRQL/SPARQL endpoint : Although generally not used for application development, the compliant RDF query endpoints make ClioPatria a standard component in Semantic Web applications that use such endpoints. $ Linked Open Data serer : Serve RDF repositories or fragments thereof as LOD with a single-line declaration. $ The CPACK package manager : This allows for submitting, and updating submissions (developer) or installing packages with all its dependencies (users) and make it a breeze to add functionality to your ClioPatria installation. For example, the following command installs an interactive search module =isearch= and its dependencies (=owl= and =statistics=). == ?- cpack_install(isearch). == ---++ Running ClioPatria You need two things to run ClioPatria: the ClioPatria sources and a recent version of SWI-Prolog that fits your computer. The ClioPatria resource viewer can display a graph that places a resource in context. This facility requires Graphviz. * http://cliopatria.swi-prolog.org/ * http://www.swi-prolog.org * http://www.graphviz.org/ ClioPatria runs on all major platforms supported by SWI-Prolog: Windows, MacOSX, Linux and, from source, on almost any Unix system. It supports both the 32-bit and 64-bit of these operating systems. Demanding servers (more than 10 million triples, complex queries) quickly need the 64-bit versions. Although 64-bit Linux based servers provide the most scalable, fast and robust platform for ClioPatria servers, ClioPatria can be used comfortably on a 32-bit Windows XP machine with 1GB memory as long as the dataset is limited to a few million triples.