This document is a tutorial introduction to the tolog topic map query language. It explains how to use all the features of the language, as defined in version 1.2.
Ontopia 5.1 2010-10-18
tolog is a language for querying and updating topic maps, inspired by Datalog (a subset of Prolog) and SQL. Using tolog one can query a topic map in much the same way as a relational database can be queried with SQL. It is possible to ask for all topics of a particular types, the names of all topics of a particular type in a particular scope, for all topics used as association role types, for all associations with more than two roles, and so on.
This tutorial will walk the reader through the use of tolog step by step, and by the end of it all of tolog will have been covered.
tolog is a logic-based query language, which means that the basic operation is for the user to ask tolog in which cases a certain assertion holds true, and tolog will then respond with all the sets of values that make the assertion true.
Assertions in tolog consist of predicates, which are relationships between sets of values. A predicate can be thought of as a table of all the sets of values that make it true, and querying is done by matching the query against the table, returning all sets of values that match.
One tolog predicate is the one known as instance-of
, which is used to query the topic instance-topic type relationship. One query using this predicate might be:
instance-of($TOPIC, $TYPE)?
In this query there is only one predicate, and that has variables being passed as both of its parameters. This means that we want all combinations of values for these variables that make the assertion true. That is, we want all combinations of (topic, topic type) that are found in the topic map. If we ran this query against opera.ltm
we might get a result like the following.
TOPIC | TYPE |
---|---|
RAI | organization |
RAI | TV company |
Teatro Nuovo | organization |
Casinò di San Remo | organization |
Imperial Opera | organization |
Teatro La Pariola | organization |
… | … |
This shows all topics that have a type, and for each topic, all of its types (which is why RAI appears twice; it is an instance of two types). The actual table of results is several hundred lines long.
Now, let’s say we want to find all instances of the type theatre
. Recall that the previous query was as shown below.
instance-of($TOPIC, $TYPE)?
This gave us all (topic, type) combinations because we had variables as both parameters. In this case we don’t want all possible types, we just want the theatre
, so we replace the variable by that particular type. This gives us the query below.
instance-of($TOPIC, theatre)?
Now we are asking for all values of TOPIC
that would make the above true; that is, we are asking for all instances of theatre
.
TOPIC |
---|
Teatro Massimo |
Teatro Bellini |
Teatro San Carlo |
Teatro La Pariola |
Académie Royale de Musique |
Teatro Pagliano |
… |
Similarly, should we want to know all types of which a topic is an instance we can replace the TOPIC
variable with a topic reference, and put in a variable as the second parameter, as shown below.
instance-of(teatro-massimo, $TYPE)?
When run, this query gives the following results.
TYPE |
---|
Theatre |
Organization |
We get ‘Theatre’, as expected, but also ‘Organization’, which may be somewhat surprising. In the topic map the only type given for Teatro Massimo is theatre
, so what is organization
doing here? The answer is that organization
is defined as a supertype of theatre
in the topic map. This means that any instance of theatre
is also an instance of organization
. tolog knows this and makes use of the information to make Teatro Massimo an instance of both types.
You should now be familiar with what a predicate is and what a variables. From this point on we’ll look at how to combine predicates in more complex ways, more types of predicates, and also some ways of processing the results of query.
As we said above, a predicate represents a relationship, and because of this tolog allows topic map associations to be treated as predicates. In this case the name of the predicate is the name of the association type, and the values are the topics playing roles in associations of this type. Given this we might expect to be able to write the following query to find all operas composed by Puccini.
composed-by(puccini, $OPERA)?
However, this results in an error, because the tolog engine is not able to work out which association role is played by puccini
and which by the OPERA
. This must be given explicitly in the query as shown below.
composed-by(puccini : composer, $OPERA : opera)?
Here it is made explicit that we want to find out when puccini
plays the role composer
and when the OPERA
plays the role opera
. This syntax is only used for association predicates, not for any other type of predicate. The result is as shown below.
OPERA |
---|
Le Villi |
Madame Butterfly |
Tosca |
Turandot |
Manon Lescaut |
La Bohème |
… |
We can play around with the variables here in the same way as we did for the instance-of
predicate, and get all combinations of opera and composer, and all composers for a specific opera (say tosca
). This is done in exactly the same way, so we won’t explain how here, but leave it as an exercise for the reader. See The tolog plug-in to learn how to run tolog queries in the Omnigator.
If we want to find all cities located in Italy we know that this is easy: we can just query the located-in
association type to find the city topics that have this association with the Italy topic. Similarly, if we want to find everyone born in a particular just query the born-in
association type.
But what if we want to find everyone born in a city located in Italy? What we want in this case is something like what is shown below.
born-in($PERSON : person, $PLACE : place)
AND
located-in($CITY : containee, italy : container)?
This query is quite close to the correct one, but there are two problems with it. First of all, the way to chain predicates together with an AND condition is simply to put a comma between them. (You can say that AND is the default boolean operator in tolog.) The second problem is that we want to make sure that the CITY
located in Italy is the same as the PLACE
where the person is born. If we use the same variable in both places tolog will take this to mean that the same values have to be used both places. So what we want is the query below.
born-in($PERSON : person, $CITY : place),
located-in($CITY : containee, italy : container)?
When this query is run the first predicate will produce a table of all (person, city) combinations where the person is born in the city. This is then passed to the second predicate, which takes out all rows where the city is not located in Italy In practice the tolog query optimizer will start with the second predicate and find all cities in Italy. It will then find all the people born in those cities. It will do this because the other way around will find lots of (person, city) combinations outside Italy, which will be wasted because they have to be thrown away afterwards. The result will of course be exactly the same. It may be useful to be aware, however, that tolog will not necessarily run the query exactly as given, but may transform it to a faster, but equivalent, query before running it..
We can chain together as many predicates as we want in order to express our query; there’s nothing unusual in having 4-5 predicates chained together to form a query.
In the above example we have the problem that the query returns both the people and the city, while we are only interested in the people. This can be solved by using projection, as follows:
select $PERSON from
born-in($PERSON : person, $CITY : place),
located-in($CITY : containee, italy : container)?
This tells the query engine that we’re only interested in the PERSON
variable, and so it will produce a query result that only consists of bindings of the PERSON
variable. In this case, this is not so important, as we could just as well have just ignored the bindings to the CITY
variable. In some cases, however, ignoring one variable means reducing the size of the query result (and avoiding duplicates), and sometimes reducing the query result size quite dramatically. One example is given below.
select $B from
instance-of($A, $B)?
This query returns all the topic types used in the topic map, instead of all class-instance pairs. The difference in the query result size (not to mention performance) can be enormous, which is what makes this useful.
Quite often, one uses queries to produce statistics about the data contained in a topic map. If one wants to find out which Italian opera composer was the most prolific, for example, one will want to count the number of query results. tolog supports this, as the query below shows.
select $A, count($B) from
composed-by($A : composer, $B : opera)?
This query produces the full A, B
table, then replaces all rows that have the same value in the A
column by the number of such rows. The result is to count the number of B
matches per A
match.
While the query above does return the number of operas composed by each composer, it does so in an order that is effectively random. What we really want is to see the composers ordered by decreasing number of composed operas. This can be achieved by writing the query as shown below.
select $A, count($B) from
composed-by($A : composer, $B : opera)
order by $B desc?
If the desc
keyword is removed, the query results will be ordered by ascending opera count instead of descending count. The ‘order by’ clause can contain any number of variables, separated by commas.
The asc
keyword specifies that columns be ordered in ascending order. This is also the default order.
What has been presented so far are only the simplest parts of tolog; what one could call “basic tolog”. tolog is much more powerful than this, however, and that additional power is explored in this section.
So far we have only looked at how to query on associations, but occurrences also contain useful information, and in many cases we want to retrieve or query on this information. This can be done in a manner very similar to how associations are queried: the occurrence type is used as the predicate. However, occurrences have a much simpler structure, so for these we only need two positional parameters: first the topic, then the occurrence value.
Finding the date of birth of every composer can be done as follows.
instance-of($COMPOSER, composer),
date-of-birth($COMPOSER, $DATE)?
This query gives the following result.
COMPOSER | DATE |
---|---|
Verdi, Giuseppe | 1813 (10 Oct) |
Ponchielli, Amilcar | 1834 (31 Aug) |
Cilea, Francisco | 1866 (23 Jul) |
Boïto, Arrigo | 1842 (24 Feb) |
Catalani, Alfredo | 1854 (19 Jun) |
Puccini, Giacomo | 1858 (22 Dec) |
… | … |
As with association predicates it is also possible to find the date of birth of a specific composer, or to find all composers born on a particular date. To do the latter we write:
date-of-birth($PERSON, "1867 (24 Mar)")?
This will find all topics which have this particular value for this particular occurrence type (though of course only people have dates of birth in this topic map), regardless of whether they are composers or not. As it turns out, only Guido Menasci has this value, and he’s a librettist
.
Note that locators are treated the same way as ordinary strings, which means that to find all topics whose home pages are http://www.puccini.it
, you can use the following query.
homepage($TOPIC, "http://www.puccini.it")?
This finds all topics with a homepage
occurrence whose locator is the URI http://www.puccini.it
. This turns out to produce the results shown below.
TOPIC |
---|
Centro studi Giacomo Puccini |
Quite often one may want to include information that matches either one condition OR another condition. If, for example, we want to find all operas which had their premiere in a particular city there are two ways to do that. Usually we know which theatre the opera had its premiere in, and in these cases the premiere is connected to that theatre, and the theatre is connected with the city. In some cases, however, all we know is the city, and in these cases the premiere is directly connected with the city. Therefore, if we want to find all operas which premiered in MilanoThe traditional English perversion of this is ‘Milan’., we have to do as shown below.
select $OPERA from
{ premiere($OPERA : opera, milano : place) |
premiere($OPERA : opera, $THEATRE : place),
located-in($THEATRE : containee, milano : container) }?
Here matches will be accepted so long as they satisfy one of the branches inside the curly braces, where the branches are separated by |
characters. Other predicates can be used in front of and after the curly braces, and the braces can be nested arbitrarily deep. The result is shown below.
OPERA |
---|
Il sindaco babbeo |
Melenis |
La Falce |
I Cavalieri di Ekebù |
Turandot |
Marcella |
… |
This capability is essentially the same as the boolean operator OR
in other languages, as it allows tolog to return results where the criteria are true if { A | B }
.
A closely related function is what is known as non-failing clauses; that is, a clause that will produce a value for a variable if it can, but if it cannot still won’t cause the query to fail. This is useful when selecting optional values which are intended to be used for display purposes.
One example might be if we want to all operas and for each the theatre in which it had its premiere. If we don’t know where it had its premiere we still want to show the opera, but we’ll leave out the theatre. This is easily done as follows, by including the optional part of the query in curly braces, without any alternative branches.
instance-of($OPERA, opera),
{ premiere($OPERA : opera, $THEATRE : place) }?
As you’ll see if you try it, this query also returns matches where THEATRE
is bound to a city, since nothing in the query says that it has to be bound to a theatre (tolog completely ignores the meaning of variable names), and operas also have this association with cities. So we have to add a condition that the THEATRE
actually be an instance of the type theatre
, as shown below.
instance-of($OPERA, opera),
{ premiere($OPERA : opera, $THEATRE : place),
instance-of($THEATRE, theatre) }?
This query gives the desired result, which is shown below.
THEATRE | OPERA |
---|---|
Teatro Argentina | I due foscari |
Teatro Sociale di Lecco | Il parlatore eterno |
Teatro La Fenice | Attila |
Teatro alla Scala | Madame Butterfly |
null | Marina |
null | Il sindaco babbeo |
… | … |
tolog supports negation, even though negation in logical querying is not at all straightforward. To take an example imagine running the query below.
not(born-in(milano : place, $A : person))?
What is this query going to return? All topics that have no born-in association with Milano? All person topics? An infinite number of topics, representing everything that wasn’t born in Milano? What about the persons which have no born-in association yet, but which were actually born in Milano? And so on.
The approach taken in tolog is that the not
operator is used as a filter. You must specify a start set, and the query processor will then eliminate the members of the start set which match the condition specified in the not
clause. The example below shows the people in the topic map that were not born in Italy.
instance-of($PERSON, person),
not(born-in($PLACE : place, $PERSON : person),
located-in($PLACE : containee, italy : container))?
This gives the results shown below.
PERSON |
---|
Guglielmo Ratcliff |
Manzoni, Alessandro |
Comtessa di Coigny |
Sardou, Victorien |
… |
One thing that stands out here is that Alessandro Manzoni actually is Italian. So why was he included? Well, this query is subtly different from asking for the composers born somewhere not in Italy. This query will also find composers which have no born-in
association, or who are born somewhere that has no located-in
association. It turns out that Alessandro Manzoni has no born-in
association, and so he is “not born in Italy”. To eliminate this type of match we would have had to pick out the country where the person is born and use /=
to make sure that it was not Italy. On the other hand, that would have left out Victorien Sardou, who is a person (with no born-in
association) and not Italian. (So why do we say all this? Simply to emphasise that not
is subtly different from true negation. It might not mean what you think it means.)
Another thing to note is that although this query uses the PLACE
variable that variable is not actually bound in the query. That is because every COMPOSER
that creates a match for PLACE
is eliminated, leaving only those which don’t. So while the query result will contain a PLACE
column, that column will be empty.
tolog also supports the comparison operators familiar from many other languages. We have already seen /=
, but there are many more. For example, there is also =
, which means the opposite of what /=
does. That is, it is true when the values on both sides are equal. This means that we can do a query to find operas which premiered on a particular day, as in:
premiere-date($OPERA, $DATE),
$DATE = "1870 (22 Feb)"?
This will give us the (single) opera premiered on this date, according to the Opera topic map. It should be noted that in most cases the =
comparator is not necessary, since if a variable is equal to another variable or a literal we can just replace all occurrences of the variable with what it is supposed to be equal to. This gives us simpler queries, like the one below.
premiere-date($OPERA, "1870 (22 Feb)")?
However, this comparator does have its uses, such as when defining an inference rule (see following section) that will connect a topic to all topics matching some condition as well as the topic itself. To put it another way: it’s useful to know that this predicate exists, but be aware that you rarely need it.
There are also four other comparison predicates (<
, >
, <=
, >=
) which only compare strings, and, at the moment, compare strings as strings. This makes it possible to do a query to find all operas premiered in the 19th century, as follows:
premiere-date($OPERA, $DATE),
$DATE < "1900"?
This will do a string comparison on the dates, and filter out everything premiered in the 20th century. Note that these four predicates do not compare anything other than strings (such as topics), and that they require both values they are comparing to be bound.
In many cases there are implicit relationships in the topic map not stated as associations, but which can be deduced from more basic relationships that are given explicitly as associations. Inference rules provide a way to capture these implicit relationships through the declaration of a simple rule. The rule can then be used throughout an application to simplify queries.
We can say that if an opera composer was a pupil of another composer, or if the composer wrote operas based on the work of another person, that other person or composer influenced the composer. We can of course query for this, but if we want to use the influenced-by relationship in several larger queries this quickly gets awkward. Instead, we can capture the relationship in an inference rule, as shown below.
influenced-by($A, $B) :- {
pupil-of($A : pupil, $B : teacher) |
composed-by($OPERA : opera, $A : composer),
based-on($OPERA : result, $WORK : source),
written-by($WORK : work, $B : writer)
}.
This rule has now created a new predicate called influenced-by
, which can be used in queries. Let’s say that we want to find every case where an Italian opera composer was influenced by someone who was not himself Italian. We can now do this as shown below.
instance-of($COMPOSER, composer),
influenced-by($COMPOSER, $INFLUENCE),
born-in($INFLUENCE : person, $PLACE : place),
not(located-in($PLACE : containee, italy : container))?
We skip verifying that the composer was born in Italy, since all composers in this topic map are born in Italy. This gives the result shown below.
COMPOSER | INFLUENCE |
---|---|
Verdi, Giuseppe | Méry, Joseph |
Ponchielli, Amilcar | Scribe, Eugène |
Cilea, Francesco | Daudet, Alphonse |
Cilea, Francesco | Scribe, Eugène |
Cilea, Francesco | Mélésville |
Inference rules can also use other influence rules, which means that one can build large reasoning structures by layering the inference rules on top of each other. Inference rules can also use themselves, which makes it possible to define recursive inference rules, for example for traversing hierarchical association structures.
Note that tolog has the same comment syntax as LTM (that is, comments begin with /*
and end with */
). This can be used for example to document inference rules, as shown below.
/* connects opera composers (A) with the people who influenced them
(B), either by being their teachers or by writing works on which
the composer based operas */
influenced-by($A, $B) :- {
pupil-of($A : pupil, $B : teacher) |
composed-by($OPERA : opera, $A : composer),
based-on($OPERA : result, $WORK : source),
written-by($WORK : work, $B : writer)
}.
In the example above it may look as though the bodies of inference rules should be wrapped in curly braces, like blocks in Java or C, but the curlies are there because the entire rule is one big OR clause. If we drop the pupil-of
part the rule would look as shown below. If we had wrapped this in curly braces we would have turned the body of the rule into a non-failing clause (Non-failing clauses), allowing it to match topics which don’t actually satisfy the rule. So beware of this.
influenced-by($A, $B) :-
composed-by($OPERA : opera, $A : composer),
based-on($OPERA : result, $WORK : source),
written-by($WORK : work, $B : writer).
As said above, tolog supports recursive inference rules, which are typically used to traverse hierarchies, or to query transitive association types. An example of such an inference rule, for a generic parent-child relationship, might look as follows:
descendant-of($ANC, $DES) :- {
parent-child($ANC : parent, $DES : child) |
parent-child($ANC : parent, $MID : child), descendant-of($MID, $DES)
}.
Notice how the rule is essentially a formalization of what it means to be a descendant. That is, either the ancestor is the parent of the descendant, or the ancestor is the parent of some middle topic of which the descendant is a descendant. This explanation may sound a little strange, and that’s because of the recursion in the last step.
The parts of tolog shown thus far only support querying on type-instance relationships, associations with a specific structure, and occurrences of specific types. It is impossible to do queries like “find all association types”, “find all associations between topics X and Y”, “find all occurrences of topic Z”, and so on with the parts of tolog that have been explained thus far.
tolog has a number of built-in predicates like instance-of
that make it possible to do this kind of query. These predicates are listed below, and more fully documented in The Built-in tolog Predicates — Reference Documentation.
instance-of
which connects a topic with the topics it is explicitly said to be an instance of. In other words, just like instance-of
, except that it ignores the superclass-subclass association.REIFIER
) with the topic map constructs (REIFIED
) they reify.OBJECT
), finds that locator.TYPED
) with their type, if they have one.SEARCHSTRING
argument, and will do a full-text search for topic map constructs that match the SEARCHSTRING
. This makes it possible to do full-text searches in tolog queries.We can’t give examples of the use of all of these predicates, since there are so many of them, but at least we can show some. We can start with “find all association types”, which is given below., “find all associations between topics X and Y”, “find all occurrences of topic Z”
select $TYPE from
association($ASSOC), type($ASSOC, $TYPE)?
Finding all associations between topic ‘x’ and topic ‘y’ is quite easy, though a bit more involved.
select $ASSOC from
role-player($ROLE1, x),
association-role($ASSOC, $ROLE1),
association-role($ASSOC, $ROLE2),
role-player($ROLE2, y)?
We can also find all occurrences of topic z quite easily, as shown below.
occurrence(z, $OCC)?
The /=
predicate allows an interesting class of queries. We can control which values must be the same in a query, simply by using the same variable, but we cannot force different variables to have different values. This means that if we try to do a query like “find all people born on the same day” we run into a quite subtle problem. Below is the naive way to approach it.
date-of-birth($PERSON1, $DATE),
date-of-birth($PERSON2, $DATE)?
This runs, but does not produce the result we want, as can be seen from the table below.
PERSON1 | DATE | PERSON2 |
---|---|---|
Benelli, Sem | 1877 (10 Aug) | Benelli, Sem |
Ghislanzoni, Antonio | 1824 (25 Nov) | Ghislanzoni, Antonio |
Coppée, François | 1842 (26 Jan) | Coppée, François |
Here tolog has discovered that if PERSON1
is ‘Benelli, Sam’, DATE
is ‘1877 (10 Aug)’, and PERSON2
is ‘Benelli, Sam’, then our query criteria are satisfied. In other words, tolog has found, as we asked it to, that every person has the same birthdate as himself. We knew this all along, of course, and did not want it in our query results at all, so what we do now is to specify that we are not interested in matches where there is only one person. This is done as below.
date-of-birth($PERSON1, $DATE),
date-of-birth($PERSON2, $DATE),
$PERSON1 /= $PERSON2?
This gives the result we wanted, which is shown below.
PERSON1 | DATE | PERSON2 |
---|---|---|
Bognasco, G. di | (unknown) | Lombardo, Carlo |
Bognasco, G. di | (unknown) | Franci, Arturo |
Bognasco, G. di | (unknown) | Vaucaire, Maurice |
Duveyrier, Charles | 1803 | Royer, Alphonse |
Royer, Alphonse | 1803 | Duveyrier, Charles |
… | … | … |
Again the results are somewhat disappointing. We see that there are indeed two people who share the specified birth information, but in their case we only know the year. We also see that a number of people all have their birth date given as "(unknown)"
and since all these values are the same they are also returned as query matches, though they are of course not guaranteed to have the same birth date.
Sometimes we don’t want all the results from a query, but only a limited set of results. For example, we may only want to know the answer to “who is the most prolific opera composer?” In this case, we only want the first row of the results, and not the rest, however many they may be. tolog supports this, through the LIMIT
keyword, as shown below.
select $A, count($B) from
composed-by($A : composer, $B : opera)
order by $B desc limit 1?
This gives the result (“Verdi, Guiseppe”, 28), but does not return any of the other rows. Note that the order by
is very important. Without this we would have produced the rows in random order, then only returned the first, which need not have been the row for the most prolific composer.
The LIMIT
keyword is mainly used as a performance optimization in cases where a large number of results can be returned. LIMIT
allows the application to tell the query processor the limit the number of results to a manageable number, and this can have significant performance benefits in some cases.
Another use for LIMIT
is when one wants to show a paged list. That is, a query is run to show first results 1-10, then, if the user wants to go further, 11-20, then, 21-30, and so on. tolog can do this with LIMIT
and OFFSET
together. Let’s say we want to show operas, and we only want to show the first page. We can then do as below.
instance-of($OPERA, opera)
order by $OPERA limit 10?
This gives us the first ten operas (note the order by
which guarantees a consistent ordering), and we can continue with the below to get operas 11-20.
instance-of($OPERA, opera)
order by $OPERA limit 10 offset 10?
This query starts on row 10 (the first one did rows 0-9) according to OFFSET
and stops after 10 rows, according to LIMIT
. By increasing OFFSET
to 20, 30, and so on we can continue stepping through the pages.
One thing that has been consistently glossed over so far is how topic identifiers like composer
actually refer to topics, and how to refer to topics by means such as their subject identifiers. Another issue we’ve ignored is how to actually make inference rule definitions available to the query processor. This section deals with both of those issues in more detail.
There are several different ways of referring to topics (and other topic map constructs, such as topic names, occurrences etc) in tolog, although so far we have only seen topic IDs. Below is a complete list of the different syntaxes.
#
. If the topic being looked up did not originate in the root document of the topic map (but instead came from a file that was merged into it), this lookup will fail.This allows us to for example write a query that in any topic map will find the topics at the top of the superclass-subclass hierarchy, as shown below.
select $TOP from
i"http://www.topicmaps.org/xtm/1.0/core.xtm#superclass-subclass"(
$TOP : i"http://www.topicmaps.org/xtm/1.0/core.xtm#superclass",
$SUB : i"http://www.topicmaps.org/xtm/1.0/core.xtm#subclass"),
not(i"http://www.topicmaps.org/xtm/1.0/core.xtm#superclass-subclass"(
$OTHER : i"http://www.topicmaps.org/xtm/1.0/core.xtm#superclass",
$TOP : i"http://www.topicmaps.org/xtm/1.0/core.xtm#subclass"))?
Since this query uses the subject identifiers for the association type and role types defined by XTM 1.0 this will work regardless of what the XML IDs given to the topics are in any particular topic map. This makes the query quite portable and robust. (Unfortunately, it also makes the query almost entirely unreadable, but more about that in the next section.)
As noted above, giving URIs in full throughout a query quickly makes the query so verbose as to be virtually unreadable. In order to solve this, tolog has a feature called URI prefixes, which allows the definition of an identifier prefix that can be combined with a local name to create a full URI. Using this we can rewrite the query from the previous section as shown below.
using xtm for i"http://www.topicmaps.org/xtm/1.0/core.xtm#"
select $TOP from
xtm:superclass-subclass($TOP : xtm:superclass, $SUB : xtm:subclass),
not(xtm:superclass-subclass($OTHER : xtm:superclass,
$TOP : xtm:subclass))?
This query defines the prefix xtm
which is bound to the URI "http://www.topicmaps.org/xtm/1.0/core.xtm#"
, and which is defined to be interpreted as subject identifier reference (because of the i
before the URI). Thus, xtm:subclass
is interpreted as i"http://www.topicmaps.org/xtm/1.0/core.xtm#subclass"
.
It is possible to define more than one prefix in a single query, and it is also possible to bind prefixes to a"..."
and s"..."
URIs.
Another thing we have glossed over so far is how to actually make an inference rule available for use in queries. As it turns out, there are several ways to do this, and we’ll walk through them one by one. The easiest is simply to put the rule into the query that uses it, like shown below.
influenced-by($A, $B) :- {
pupil-of($A : pupil, $B : teacher) |
composed-by($OPERA : opera, $A : composer),
based-on($OPERA : result, $WORK : source),
written-by($WORK : work, $B : writer)
}.
instance-of($COMPOSER, composer),
influenced-by($COMPOSER, $INFLUENCE),
born-in($INFLUENCE : person, $PLACE : place),
not(located-in($PLACE : containee, italy : container))?
However, having to repeat the rule for every query that uses it is not exactly a good code reuse strategy, so tolog also supports importing rule sets from files. If the influenced-by
rule is stored in the file opera.tl
in the same directory as the opera.ltm
topic map file the above query can also be written as shown below.
import "opera.tl" as opera
instance-of($COMPOSER, composer),
opera:influenced-by($COMPOSER, $INFLUENCE),
born-in($INFLUENCE : person, $PLACE : place),
not(located-in($PLACE : containee, italy : container))?
What happens here is that the import
statement imports the rule file, which is stored as a module bound to the opera
prefix. The inference rules in the file then turn into predicates in the module, and can be accessed through the opera
prefix.
References to module files are first attempted loaded from the CLASSPATH. If it was not found on the CLASSPATH then it is resolved relative to the URI of the topic map, except inside module files where it is resolved relative to the URI of the module file. (Module files can import other module files.)
One thing we haven’t looked at yet is what are known as scoping rules; that is, the rules for which declarations are visible where. Declarations in tolog are of two types: inference rule declarations, and prefix declarations (whether URI prefixes or module imports). The rules for all declarations are the same, and they are as listed below.
This section explains how you can run tolog queries with Ontopia.
The easiest way to get started with actually running tolog queries and seeing the query results is to use the tolog plug-in in the Omnigator. This plug-in appears as a “Query” link in the plug-in row of the Omnigator’s pages. Clicking on this link takes you to a page where you can write tolog queries in a form and see them evaluated. The results display as a table where each topic is linked back into the Omnigator.
The Navigator Framework contains a tag library named ‘tolog’, which consists of JSP tags specially designed for creating HTML output from topic maps using tolog. The tag library allows queries to be run against the topic map, and makes the resulting variable bindings available as parameters for new queries or to be output, etc. For more information on this, please consult The Navigator Framework — Developer’s Guide.
The tolog query processor also has a Java API, which can be used to evaluate queries and make use of the query results. The heart of this API is the QueryProcessorIF
interface, which represents the query processor. Query processors can parse queries and return objects representing them, and also take queries and return query results (represented by QueryResultIF
objects). These interfaces are all in the net.ontopia.topicmaps.query.core
package.
There are two tolog implementations in Ontopia. One of these works directly against the API, and thus works for all the topic map engine backends. However, it does not perform optimally for very large topic maps stored in RDBMSs, and so there is a special implementation for this backend. The QueryUtils
class in the net.ontopia.topicmaps.query.utils
package can be used to create query processors. It will automatically select the right processor for your topic map, and lets you write code that is independent of which backend you use.
The example below shows the implementation of the tolog plug-in in the Omnigator, which is written using the API. The example has been simplified somewhat, in order to make it more clear.
The tolog plug-in
<p>Query:<br> <%= query %></p>
<%
String query = request.getParameter("query");
QueryProcessor proc = QueryUtils.getQueryProcessor(topicmap);
QueryResultIF result = proc.execute(query);
StringifierIF str = TopicStringifiers.getDefaultStringifier();
%>
<table class="text">
<tr><%
String[] variables = result.getColumnNames();
for (int ix = 0; ix < variables.length; ix++)
out.write("<th>" + variables[ix]);
}
%></tr>
<%
Object[] row = new Object[result.getWidth()];
while (result.next()) {
out.write("<tr>");
result.getValues(row);
for (int ix = 0; ix < variables.length; ix++) {
if (row[ix] == null)
out.write("<td>null");
else if (row[ix] instanceof TopicIF)
out.write("<td><a href=\"../../models/topic_" + model + ".jsp?tm=" + tmid + "&id=" + ((TopicIF) row[ix]).getObjectId() + "\">" +
str.toString(row[ix]) + "</a>");
else
out.write("<td>" + row[ix]);
out.write(" \n");
}
out.write("</tr>");
}
result.close();
%>
</table>
Note that in order to avoid resource leaks it is necessary to close all result sets when using the RDBMS backend.
An easier interface for running queries in Ontopia is provided by the QueryWrapper
class in the net.ontopia.topicmaps.query.utils
package. This provides methods for running a query and getting a single topic or string, or a list of objects, and also has methods for converting a query result into a list of custom objects and so on. This class automatically takes care of closing result sets.
The query processor API supports parsing a query once and then running the parsed query many times. This is supported through the parse(String query)
method of the QueryProcessorIF
. This returns an object implementing ParsedQueryIF
, which can be executed many times.
Of course, executing the same query many times is not very interesting; one might as well just cache the results and achieve the same thing much more efficiently. However, tolog also supports parameters to parsed queries. These are written with the %foo%
syntax that we saw used with the tm:tolog
tag above.
The ParsedQueryIF
interface has two methods named execute
. One takes no arguments and effectively reruns the same query. The other takes a Map
argument which maps parameter names to their values. Using this, the same query can be run over and over again with different parameters without having to pay the cost of parsing and optimizing the query more than once.
In addition to querying the topic map to retrieve data tolog also supports making modifications to the topic map. This section explains how to use the DELETE
, UPDATE
, INSERT
, and MERGE
statements, in addition to the SELECT
statement covered so far.
Note that in all of these statement types, topic references and prefix declarations have exactly the same syntax as in SELECT
statements.
This statement is used to delete objects from the topic map. The simplest form of delete statements is as follows:
Deleting a topic
delete foo
This would delete the topic map object with the ID foo
. If it is a topic map everything in the topic map will be deleted. If it is an association, the association and all its roles will be deleted. If it’s a topic the topic and all its names, occurrences, and all the associations it plays roles in will be deleted. It will also be removed as a reifier, a topic type, and from scopes, and statements typed with the topic will also be deleted.
It’s possible to run a query to delete things, as follows:
Deleting all person topics
delete $PERSON from
instance-of($PERSON, person)
The from
part of the query is the same as for SELECT
statements, and so the above would find all topics of type PERSON
and delete them. Any number of variables and topic references are allowed in the delete
clause.
Finally, there is a third form of DELETE
statements, which calls a function to remove a value from a property. Here is an example, removing all Wikipedia PSIs from all topics in the topic map:
Removing Wikipedia PSIs
import "http://psi.ontopia.net/tolog/string/" as str
delete subject-identifier($TOPIC, $PSI) from
subject-identifier($TOPIC, $PSI),
str:starts-with($PSI, "http://en.wikipedia.org/wiki/")
The deletion functions available in tolog are:
Function | Parameters | Meaning |
---|---|---|
subject-identifier | topic, locator | Removes the locator as a subject identifier for the topic. |
subject-locator | topic, locator | Removes the locator as a subject locator for the topic. |
item-identifier | object, locator | Removes the locator as an item identifier for the object. |
direct-instance-of | topic, topic | Removes the second topic as an a topic type of the first topic. |
scope | statement, topic | Removes the topic from the scope of the statement. |
reifies | statement, topic | Removes the topic as a reifier of the statement. |
The INSERT
statement is used to add new information to the topic map. In its simplest form it looks like this:
Adding a new topic
INSERT
tolog-updates isa update-language;
- "tolog updates".
Basically, after the INSERT
keyword follows topic map data in CTM syntax. Note that it is possible to use the CTM wildcard syntax to create new topics without giving them explicit IDs in the CTM fragment. Prefixes declared for the tolog query can also be used in the CTM fragment.
There is another more powerful form of the INSERT
statement which runs a query and instantiates the CTM part once for each row in the query result. An example of this might be as follows:
Adding Wikipedia PSIs
import "http://psi.ontopia.net/tolog/string/" as str
insert $topic $psi . from
article-about($topic, $psi),
str:starts-with($psi, "http://en.wikipedia.org/wiki/")
For every article-about
occurrence which references a Wikipedia article this statement adds the URI of the article as a subject identifier of the topic.
The UPDATE
statement makes it possible to change the values of some properties in the topic map. The statement has two forms, of which the simplest is this:
Simple update
update value(@2312, "Ontopia")
The above statement sets the string value of the object with ID 2312
to "Ontopia"
. It is also possible to run a query to find the objects to change (and their values), as follows:
Query update
update value($TN, "Ontopia") from
topic-name(oks, $TN)
The available update functions are:
Function | Parameters | Meaning |
---|---|---|
value | object, string | Sets the string value of the object, which must be a topic name, variant name, or occurrence. |
resource | object, locator | Sets the value of the object to the given locator (and also changes the datatype to URI), where the object must be a variant name or an occurrence. |
The MERGE
statement can be used to merge topics either by running a query or directly, as in the example below.
Merging two topics
merge topic1, topic2
It’s also possible to find the topics to merge by running a query, as in the following:
Merging by email
merge $T1, $T2 from
email($T1, $EMAIL),
email($T2, $EMAIL)
Rows where T1
and T2
refer to the same topic are ignored.
The most detailed description of tolog can be found in tolog specification and the TMRA 2005 paper.
The Extending tolog conference paper from Extreme Markup 2003 has more background on why tolog was extended from version 0.1 to 1.0, and the design issues faced in that process.
The original paper on tolog is tolog - A topic map query language, from XML Europe 2001.
More about tolog updates can be found in the TMRA 2009 paper.