This document introduces the DB2TM module and explains how to use it to convert relational data, whether in CSV files or a relational database, into a topic map. It explains the XML configuration file format, the command-line tool, and the public API of the converter.
Ontopia 5.2 2011-09-27
The DB2TM module can be used to convert relational data into a topic map. It supports both CSV and JDBC data sources, and uses an XML configuration file to define the mapping from the relational source into the Topic Maps ontology you use. It can be accessed using either a Java API or a command-line tool.
The basic approach of the DB2TM module is that it takes an existing topic map (possibly empty), a configuration, and a set of relations (ie: tables), and converts the rows in the relations into data in the topic map. The mapping from the relations to the topic map is described by the configuration.
The configuration file contains a set of data sources (often just one), which can be either a database or a directory containing CSV files, and a set of relation declarations. The relation declarations define the mapping from each relation to the topic map ontology used. In addition, URI prefixes similar to those of LTM and tolog can be defined at the top of the configuration file for convenience.
This section explains how to use DB2TM by walking through how to set up the conversion process for an example conversion.
Let’s say we have a file called organizations.csv
, which contains the following:
ID;NAME;WEBSITE
1;Ontopia;http://www.ontopia.net
2;United Nations;http://www.un.org
3;Bouvet;http://www.bouvet.no
We can easily create a topic map of the data in this file with DB2TM, but to get a nice result in Omnigator we need a seed topic map containing the ontology. Since the CSV file only contains topics of a single type, with a name, and occurrences of a single type, the following LTM would work fine:
#PREFIX ex @"http://psi.example.org/"
[ex:organization = "Organization"]
[ex:homepage = "Homepage"]
The next thing we will need is an XML configuration file that describes the mapping from the CSV file into our ontology. A configuration file that only defines our data source would look as follows:
<db2tm name="demo">
<sources>
<csv path="./" ignoreFirstLines="1"/>
</sources>
</db2tm>
This defines a CSV data source named ‘files’ for CSV files contained in the current directory (that’s what ‘./’ means), and where the first line of every file will be ignored. Note that the path
attribute is always evaluated relative to the location of the XML configuration file. The next step is to define the mapping from the organizations.csv
file to the ontology shown above. However, to do that we need to add a prefix declaration at the top of the file, as follows:
<using prefix="ex" subject-identifier="http://psi.example.org/"/>
This one element says that the prefix ‘ex’ is a subject identifier prefix referring to topics whose PSIs begin with http://psi.example.org/
. With this in hand we can move on to the mapping of our one relation, which is done as follows:
<relation name="organizations.csv" columns="id name url">
<topic type="ex:organization">
<item-identifier>#org${id}</item-identifier>
<topic-name>${name}</topic-name>
<occurrence type="ex:homepage">${url}</occurrence>
</topic>
</relation>
This is the key part, so let’s walk through this slowly. The name
attribute on the relation
element gives the name of the CSV file. The columns
attribute contains the names we will use for the columns in the CSV file in order. The next line has the topic
element, which means that every row in the relation will map to a topic. The type
attribute gives the type of the topics (in this case: organization
).
The item-identifier
element is used to create an item identifier for the topic, and in this case it’s created from the first column in the file. (We do this to make it possible to refer to these topics in Associations.) The topic-name
element creates a topic name from the second column. The occurrence
element creates an occurrence of type ex:homepage
from the third column.
There is one problem with the mapping above, however. It doesn’t say that ex:homepage
is an external occurrence, and so the URL is just treated as a string. To solve this we need to declare the datatype to be a URI. The easiest way to do this is to define a subject identifier prefix for the XML Schema data types, like this:
<using prefix="xsd" subject-identifier="http://www.w3.org/2001/XMLSchema#"/>
Once this is in place we can add datatype="xsd:anyURI"
on the occurrence
element, and DB2TM will create external occurrences instead.
Now that we have assembled all the pieces, the next step is to run the conversion, and this is done by issuing the following command on the command-line:
java net.ontopia.topicmaps.db2tm.Execute --tm=ontology.ltm --out=tm.ltm add organizations.xml
The result is the following LTM:
[org1 : id1 = "Ontopia"]
{org1, id2, "http://www.ontopia.net"}
[org2 : id1 = "United Nations"]
{org2, id2, "http://www.un.org"}
[org3 : id1 = "Bouvet"]
{org3, id2, "http://www.bouvet.no"}
The id1
and id2
references are found in the LTM because the LTM exporter doesn’t have any IDs for these topics, just the PSIs, and it always refers to topics using ID.
So far we’ve mapped only one relation, and to get an interesting topic map we really need to handle more relations. The next one up is people.csv
, which looks as follows:
ID;GIVEN;FAMILY;EMPLOYER;PHONE
1;Lars Marius;Garshol;1;+47 90215550
2;Geir Ove;Grønmo;1;
3;Kofi;Annan;2;
The third column here is a foreign key reference to the organizations.csv
file. To handle this relation we need to extend the LTM seed ontology as follows:
[ex:person = "Person"]
[ex:employed-by = "Employed by"
= "Employs" / ex:employer]
[ex:employer = "Employer"]
[ex:employee = "Employee"]
[ex:phone = "Phone"]
Having done this we are ready to extend the configuration file with a new relation as follows:
<relation name="people.csv" columns="id given family employer phone">
<topic id="employer">
<item-identifier>#org${employer}</item-identifier>
</topic>
<topic type="ex:person">
<item-identifier>#person${id}</item-identifier>
<topic-name>${given} ${family}</topic-name>
<occurrence type="ex:phone">${phone}</occurrence>
<player atype="ex:employed-by" rtype="ex:employee">
<other rtype="ex:employer" player="#employer"/>
</player>
</topic>
</relation>
The first new thing to notice here is that we have two topic
elements inside the relation
element. The first is used to create the topic playing the employer role in the employed-by association with the person topic. It has an id
so we can refer to it, and an item-identifier
element so we know its identity. (Note that the content of the item-identifier
element is designed to match up with that in the mapping for organizations.)
The second topic
element is mostly straightforward. Notice how we combined two columns into a single topic name. Also note how we don’t need to specify a datatype for the phone occurrence type since it’s just a string. The player
element is used to create an association from this topic (of type employed-by
, where the person topic plays the role employee
). The other
element specifies the other role in the association (of type employer
), played by the topic we defined earlier with the id employer
.
And that’s it. The result is that we get the following extra LTM:
[person1 : person = "Lars Marius Garshol"]
{person1, phone, [[+47 90215550]]}
[person2 : person = "Geir Ove Grønmo"]
[person3 : person = "Kofi Annan"]
employed-by( org1 : employer, person1 : employee )
employed-by( org1 : employer, person2 : employee )
employed-by( org2 : employer, person3 : employee )
This section covers more advanced topics in DB2TM.
Some functionality of the DB2TM mapping was not covered in Tutorial, and so it will be briefly described here. For a full definition of the syntax of the mapping file, see the DB2TM RELAX-NG schema.
Scope is supported through the scope
attribute which can be used on the topic-name
, occurrence
, player
, and association
(see Mapping relations to associations) elements. An example of using the scope
attribute might be:
<relation name="nicknames.csv" columns="id nick">
<topic>
<item-identifier>#person${id}</item-identifier>
<topic-name scope="ex:nick">${nick}</topic-name>
</topic>
</relation>
Here we use the item-identifier
to match up with the topics from the people.csv
file, and the scope
attribute as explained above. Note that the scope
attribute supports multiple topic references separated by whitespace.
In some cases a relation does not map to topics of a particular type, but instead maps to associations of a particular type. An example of such a relation might be the offices-in.csv
file, which lists which countries the various organizations have offices in. The contents of this file are given below:
ORGANIZATION;COUNTRY
1;578
1;826
2;840
2;756
3;578
To map this file into the topic map we would use the following relation
mapping:
<relation name="offices-in.csv" columns="organization country">
<topic id="organization">
<item-identifier>#org${organization}</item-identifier>
</topic>
<topic id="country">
<subject-identifier
>http://psi.oasis-open.org/iso/3166/#${country}</subject-identifier>
</topic>
<association type="ex:has-offices-in">
<role type="ex:organization" player="#organization"/>
<role type="ex:country" player="#country"/>
</association>
</relation>
Note that it’s possible to use the same child elements inside association
as inside topic
. If these elements occur a topic reifying the association will be created, and the topic characteristics assigned to the reifying topic.
Sometimes the values in the database are not exactly as one would like them to be in the topic map, and for these cases DB2TM offers what we call “virtual columns”. These are essentially new columns created by mapping the values from an existing column either using a mapping table or a function.
The simplest way to map values is to use a mapping table, which is useful for handling special codes and IDs which you do not want in your database. Let’s say that the organizations.csv
file contained an extra column, like this:
ID;NAME;WEBSITE;TYPE
1;Ontopia;http://www.ontopia.net;C
2;United Nations;http://www.un.org;O
3;Bouvet;http://www.bouvet.no;C
The TYPE
column here tells us what type of organization we are dealing with, and the choices are O
(meaning organization) and C
(meaning company). However, these codes do not occur in the topic map, where these topic types are identified with the PSIs http://example.org/organization
and http://example.org/company
. So how to do the mapping? The solution is to use a mapping table that translates the codes into the correct typing topics, as shown below.
<relation name="organizations.csv" columns="id name url type">
<mapping-column column='type' name='typepsi'>
<map from='C' to='company'/>
<map from='O' to='organization'/>
</mapping-column>
<topic id='type'>
<subject-identifier>http://example.org/${typepsi}</subject-identifier>
</topic>
<topic type="#type">
<item-identifier>#org${id}</item-identifier>
<topic-name>${name}</topic-name>
<occurrence type="ex:homepage">${url}</occurrence>
</topic>
</relation>
This creates a new virtual column with the name typepsi
, which will contain company
if the value in type
is C
and organization
if it is O
. The mapping of the type
topic then converts this into the correct topic type, and the reference from the next topic
element makes a topic of the correct type.
Sometimes the mapping from the values you have to the values you want is more complicated, and in these cases you can use a Java method to do the mapping. Let’s imagine, for example, that the people maintaining the database of organizations don’t always bother to put in correct URLs, so that some of the rows are incorrect, as shown in the example below.
ID;NAME;WEBSITE
1;Ontopia;http://www.ontopia.net
2;United Nations;http://www.un.org
3;Bouvet;http://www.bouvet.no
4;OfficeNet;www.officenet.no
In this case, a mapping table is not going to help, but we can easily implement the correction we need in Java. All that’s needed is something like this:
package net.ontopia.utils;
// WARNING: this class does not actually exist in Ontopia!
public class ExampleUtils {
public static String correctURI(String baduri) {
if (!baduri.startsWith("http://"))
return "http://" + baduri;
else
return baduri; // not so bad, after all
}
}
We can now use this method (note that it is static) from the DB2TM configuration as follows:
<relation name="organizations.csv" columns="id name url">
<function-column name='goodurl'
method='net.ontopia.utils.ExampleUtils.correctURI'>
<param>${url}</param>
</function-column>
<topic id='type'>
<subject-identifier>http://example.org/${typepsi}</subject-identifier>
</topic>
<topic type="#type">
<item-identifier>#org${id}</item-identifier>
<topic-name>${name}</topic-name>
<occurrence type="ex:homepage">${goodurl}</occurrence>
</topic>
</relation>
This will produce a topic map where the topic for OfficeNet will have the URI http://www.officenet.no
. Note that it is possible to pass static parameters to the methods by simply giving the parameter values in the param
element instead of a column reference.
DB2TM currently supports two kinds of data sources: CSV files and JDBC connections. This section describes both in more detail.
CSV data sources are configured with the csv
element, which represents a directory containing a set of CSV files. The attributes accepted by this element are listed in the table below.
Attribute | Meaning | Required? |
---|---|---|
path | The path to the directory containing the CSV files. The path is relative to the location of the XML file. | Yes |
encoding | The character encoding of the CSV files. If not specified it defaults to the platform default. | No |
separator | The character separating entries in a line in a CSV file. Default is semicolon (;). | No |
quoting | The character used to quote text strings in the CSV files. Default is doublequote ("). | No |
ignoreFirstLines | The number of lines to ignore at the beginning of the file. Defaults to 0. | No |
Note that quoting characters in text strings are escaped by repeating them. In other words, if you use (") to quote your strings, and you want to put a " in a string, write it as "". Below is an example:
40;"Foo";"This is a ""real"" example"
JDBC data sources are configured with the jdbc
element, which represents a database containing a set of tables. The attributes accepted by this element are listed in the table below.
Attribute | Meaning | Required? |
---|---|---|
propfile | The path to the RDBMS properties file that tells DB2TM how to connect to the database. This property file uses the same set of properties as the Ontopia RDBMS backend does (except that it doesn’t make use of all the properties that tune the backend cache etc). The path is relative to the location of the XML file. (If no file is found, DB2TM tries to load it from the classpath.) | Yes |
Note that the JDBC source requires a number of things in order to work. The JDBC drivers you wish to use have to be on the classpath, the database server has to be running, the username/password has to be correct, etc.
In addition to the command-line interface DB2TM also has an API that can be used to run conversions and synchronizations in other contexts than on the command-line. This API is very simple and consists of the single class net.ontopia.topicmaps.db2tm.DB2TM
, which has two methods: add
and sync
. The add
method can run conversions given a string reference to a configuration file and a topic map. The sync
method can perform synchronizations given a string reference to a configuration file and a topic map to synchronize against.
A command-line Java program that runs the same conversion that we saw in Basics could be written as follows:
import net.ontopia.topicmaps.core.TopicMapIF;
import net.ontopia.topicmaps.utils.ImportExportUtils;
import net.ontopia.topicmaps.db2tm.DB2TM;
public class Convert {
public static void main(String argv[]) throws java.io.IOException {
TopicMapIF topicmap = ImportExportUtils.getReader("ontology.ltm").read();
DB2TM.add("organizations.xml", topicmap);
ImportExportUtils.getWriter("tm.ltm").write(topicmap);
}
}
Here is a simple command-line Java program that performs synchronization:
import net.ontopia.topicmaps.core.TopicMapIF;
import net.ontopia.topicmaps.utils.ImportExportUtils;
import net.ontopia.topicmaps.db2tm.DB2TM;
public class Synchronize {
public static void main(String argv[]) throws java.io.IOException {
TopicMapIF topicmap = ImportExportUtils.getReader("ontology.ltm").read();
DB2TM.sync("organizations.xml", topicmap);
ImportExportUtils.getWriter("tm.ltm").write(topicmap);
}
}
So far, the examples have shown only running DB2TM against topic maps stored in files. In practice, most topic maps will be stored using the RDBMS backend, and so we will here show how to use it with such topic maps. When using the RDBMS backend, there are two pieces of information that the command-line tool needs:
There is a special URI syntax that can be used in place of a file reference to provide the topic map ID, and a system property which can be used to reference the RDBMS properties file. Using both would look like this:
java -Xmx512M -Dnet.ontopia.topicmaps.impl.rdbms.PropertyFile=rdbms.props net.ontopia.topicmaps.db2tm.Execute --tm=x-ontopia:tm-rdbms:id sync mapping-file.xml
So far, we have only discussed how to use DB2TM for conversion, but if the data source you are converting from is going to continue to be updated, then this is not very interesting. However, after the initial conversion DB2TM can also update your topic map so that it remains in sync with the data source.
Let’s say that we change the LTM topic map we produced in Basics to the following:
[org1 : id1 = "Ontopia"]
{org1, id2, "http://www.ontopia.net"}
located-in(ontopia : located, oslo : location)
[org2 : id1 = "United Nations"]
{org2, id2, "http://www.un.org"}
[org3 : id1 = "Bouvet"]
{org3, id2, "http://www.bouvet.no"}
The change here is that we added an association for Ontopia. At the same time, we’ve changed the CSV file as follows:
ID;NAME;WEBSITE
1;Ontopia AS;http://www.ontopia.net
3;Bouvet;http://www.bouvet.no
4;OfficeNet;http://www.officenet.no
Here we have deleted the United Nations, added OfficeNet, and changed Ontopia’s name to end in “AS”. We now run DB2TM with this command:
java net.ontopia.topicmaps.db2tm.Execute --tm=tm.ltm --out=sync.ltm sync organizations.xml
The result is the LTM fragment below:
[org1 : id1 = "Ontopia AS"]
{org1, id2, [[http://www.ontopia.net]]}
located-in(org1 : located, oslo : location )
[org3 : id1 = "Bouvet"]
{org3, id2, [[http://www.bouvet.no]]}
[org4 : id1 = "OfficeNet"]
{org4, id2, [[http://www.officenet.no]]}
As you can see, we have gotten all the changes in the CSV file into the topic map without losing any of the changes we made in the LTM. This is because DB2TM only synchronizes the topics and characteristics it has mappings for, and it leaves everything else alone. There is one problem, however, which we’ll discuss in the next section.
A problem with the approach taken in the previous section is that it requires reading the entire table when synchronizing. Of course, when there are only 3 rows, this isn’t much of a problem, but when there are, say, 50,000 rows, things are rather different. Rescanning the entire table gets costly in this case, and so DB2TM provides an alternative approach, which is to use a changelog table.
A changelog table is basically a table that contains an entry for every change made to the master table. For DB2TM to be able to make use of the table it has to contain a primary key reference to the changed row in the master table, and a column by which the changes can be ordered. It can contain other information as well, but DB2TM will ignore it.
Let’s assume that we want to repeat what we did in the previous section, but this time using a changelog table. To do that we need to make a new CSV file which describes the changes. This file might look as follows:
ID;CHANGE;ORDER
2;D;1
1;C;2
4;A;3
The first row here refers to the row with ID 2 from Basics (that is, to the United Nations row), and the second column says it was deleted. The last column is there to tell us in what order the changes happen. We are using numbers here, but timestamps would work just as well, as long as we can sort on the values. The second row refers to Ontopia, and says the row was changed somehow. (The name changed.) Finally, the third row says that a new row with ID 4 has been added. This gives DB2TM the information it needs to do the updates correctly without reading the entire table. (Note that the CHANGE
column is ignored, since version 5.2.0.)
So what would we need to do to make use of this? The only thing we need to change is that we have to add the following inside the relation
element, at the very end:
<changelog table="organization-changes.csv"
primary-key="ID"
order-column="ORDER"/>
This element tells DB2TM where the changelog is, which column contains the primary key reference, and what column to order the changes by.
Running the sync again would give the same result as in the previous section, except that changelog tables are not supported for the CSV data source. We have used CSV to explain it here for simplicity, but currently you need to use a database for this to work. (The rationale is that the speedup for a CSV file is very limited, since one has to read the entire CSV file even if only a few rows have changed.)
Some systems use a single changelog table for all tables, which causes difficulties, because it means that not all rows are relevant for each table. Further, it often means that primary key values are encoded in a single column and need to be parsed.
Let’s assume that we have a changelog table with the following columns:
colname1=value|colname2=value|...
To be able to set up a changelog for the organization table we need to fix both issues. This can be done as follows:
<changelog table="organization-changes.csv"
primary-key="PARSED_ID"
order-column="TIMESTAMP"
condition="table='organization'">
<expression-column name="PARSED_ID">
<!-- turns 'columnname=value' into 'value' -->
substring(KEYS, position('=', KEYS) + 1)
</expression-column>
</changelog>
The condition
attribute filters out changes to tables other than the one we’re interested in. The expression-column
takes care of the parsing of the primary key values so that DB2TM can join the changelog table with the data table.
Let’s say we had another table with information about our organizations, for example references to external web sites that have articles about them. Each row in this table would map to an article-about
occurrence, and the table might look like this:
ID;URL
1;http://www.ligent.net/company.jsp?id=107&bundle=162
1;http://www.techquila.com/topicmaps/tmworld/11770.html
1;http://www.knowledge-synergy.com/partner/partner.html#ontopia
2;http://en.wikipedia.org/wiki/United_Nations
2;http://topics.nytimes.com/top/reference/timestopics/organizations/u/united_nations/index.html?inline=nyt-org
Doing a conversion from this would be straightforward, but on doing a sync we would run into problems. There are no articles about Bouvet in the table, and so DB2TM would delete Bouvet from the topic map, since the row for Bouvet is gone. Of course, this is wrong, but DB2TM does not know that the main organization table is where organizations are stored, and not here. So we need to tell it, and this is what the primary
attribute is for. Using that, the mapping would look as follows:
<relation name="articles.csv" columns="id url">
<topic primary="false">
<item-identifier>#org${id}</item-identifier>
<occurrence type="ex:article-about">${url}</occurrence>
</topic>
</relation>
In some cases the table you are mapping may not contain all topics of the type they map to, and in this case the rescanning strategy can get it wrong. Let’s say that, for whatever reason, in addition to the organizations coming from the CSV file we need to add some manually. If we did this, the first sync would delete all the manually added organizations, because they are not found in the CSV file, and so the rescan assumes they have been deleted.
To solve this, you need to tell DB2TM which topics in the topic map were created from the CSV file. One way to do this is to include a unary association in the mapping that says the organization comes from the database. This would mean that every topic created from the CSV file would look like this:
[ontopia : organization = "Ontopia"]
stored-in-db(ontopia : stored)
This is enough that we could distinguish these topics from the manually added ones, if we write the mapping like this:
<topic type="ex:organization">
<extent query="direct-instance-of($T, ex:organization),
ex:stored-in-db($T : ex:stored)?"/>
<item-identifier>#org${id}</item-identifier>
<topic-name>${name}</topic-name>
<occurrence type="ex:homepage">${url}</occurrence>
<!-- static unary marker association -->
<player atype="ex:stored-in-db" rtype="ex:stored"/>
</topic>
The extent
element defines a query that produces all topics in the topic map created from this table mapping. This enables DB2TM to recognize manually created organization topics, and not delete them.
In general it is not recommended to use the command-line tool to synchronize topic maps while a web application is running, since Ontopia’s object cache inside the web application will not notice changes made by the command-line tool. (This will change once clustering support arrives.) Instead, the synchronization should be run from inside the web server. There are three ways to do this:
DB2TM can produce a lot of logging information if you tell it to do so. Set the net.ontopia.topicmaps.db2tm
logging category to the DEBUG
logging level. Most of the logging messages is produced by the net.ontopia.topicmaps.db2tm.Processor
class. A lot of logging is made at this level can be vast because of the huge number of modications typically made to the topic map during processing.
The information logged is a little terse and can be a little hard to interpret. The table below is an attempt at explaining what each of the messages mean.
Log message | Description |
---|---|
Adding tuples from data source <datasource> | Relations and their tuples are being read from the given data source. |
ignoring relation: <relation> | Ignored relation as there was no <relation> definition for it. |
adding relation: <relation> | Reading tuples from the given relation and adding them according to its <relation> definition. |
removing relation: <relation> | Reading tuples from the given relation and removing them according to its <relation> definition. |
synchronizing relation: <relation> type: <synchronization-type> | Reading tuples from the given relation and synchronizing them according to its <relation> definition. The synchronization type can be either 2 (rescan) or 4 (changelog). |
<number-of-tuples> tuples, <elapsed-time> ms. | Processed the given number of tuples in the given number of milliseconds. |
done: <number-of-tuples>, <elapsed-time> ms. | Processed the given number of tuples from all relations in the data source in the given number of milliseconds. |
Log message | Description |
---|---|
a(1|2|3) | Adds a table row containing three values into the topic map. The tuple will be processed according to the rules of the current <relation> definition. |
r(1|2|3) | Removes a table row containing three values into the topic map. The tuple will be processed according to the rules of the current <relation> definition. |
u(1|2|3) | Updating a table row containing three values against the topic map. The tuple will be processed according to the rules of the current <relation> definition. |
Log message | Description |
---|---|
-T <topic> | Removing a topic. |
>T <topic> | Removing characteristics defined in this <relation> definition from a non-primary topic entity. This will leave any other characteristics alone. |
+A <association> <association-type> | Added the association of the given type. |
=A <association> | The association already existed, so instead of creating a new one this one was reused. |
-A <association> <association-type> | Removing an association. |
+P <association> <association-type> | Added the association of the given type (via <player> defintion). |
=P <association> | The association already existed, so instead of creating a new one this one was reused (via <player> defintion). |
-P <association> : <association-type> | Removing an association (via <player> defintion). |
>A <association> | Association is not primary entity, so we can’t remove it here. |
+R <role-player> : <role-type> | Added an association role with the given role player and role type. |
=R <role-player> : <role-type> | The association role already existed, so instead of creating a new one this one was reused. |
-R <role-player> : <role-type> | Removed an association role with the given role player and role type. The association role belongs to the association logged a little earlier. |
?R <role-player> : <role-type> | Could not add role because role player retrieved from tuple was null. |
-A-reified <topic> -> <association> <association-type> | Removed the given association and its reifier topic. |
+N <topic> <topic-name> | Added the topic name. |
=N <topic> : <topic-name> | The topic name already existed, so instead of creating a new one this one was reused. |
-N <topic> : <topic-name> | Removing a topic name. |
+O <topic> <occurrence> | Added the occurrence. |
=O <topic> : <occurrence> | The occurrence already existed, so instead of creating a new one this one was reused. |
-O <topic> : <occurrence> | Removing an occurrence. |
Lets say that we have a table called organizations
, with the mapping below, and that we have already added the contents of this relation into the topic map.
<relation name="organizations" columns="id name url">
<topic type="ex:organization">
<subject-identifier>http://example.org/organization/${id}</subject-identifier>
<topic-name>${name}</topic-name>
<occurrence type="ex:homepage">${url}</occurrence>
</topic>
</relation>
This means that all the data in the relation have already been mapped to topics and characteristics in the topic map. The problem is now that we cannot just add a new column to the mapping as DB2TM would not be able to detect that a new column had been added. DB2TM will not detect the new column and the data in the column will not be added to the topic map before the individual rows in gets updated and the row contents is reread through synchronization.
Fortunately there is a way to get DB2TM to read the content of the new column. To do this we need to create a new temporary mapping file with just this single relation. We’ll use this new mapping file just for the purpose of adding the characteristic(s) mapped to the new column.
Lets say that the organizations
table has a new column called phone
and that we want to map this column to an internal occurrence of type ex:phone
. The organizations
table has one row per organization, so there is a topic
element mapping these organizations to topics in the topic map. The temporary mapping file also needs this element, but this time with only two fields: one field to identify the topic and one field to add the new characteristic. In our case the relation mapping should look like this:
<relation name="organizations" columns="id name url phone">
<topic>
<subject-identifier>http://example.org/organization/${id}</subject-identifier>
<occurrence type="ex:phone">${phone}</occurrence>
</topic>
</relation>
Note that the new phone
is now in the list of table columns. We now also have a subject-identifier
field so that we can look up the existing topic, and finally the occurrence
field to add the new internal occurrence.
The prefix declarations and the data sources can be the same as in the original mapping file, so just copy and paste those from the original mapping file.
Warning Before we proceed make sure that you shut down your application so that cache inconsistency issues do not arise.
We can now have DB2TM add the contents of the new column to the topic map by running the following command.
java net.ontopia.topicmaps.db2tm.Execute --relations=organizations add temporary.xml
The organizations in the topic map should now have their respective phone numbers added. The next thing we need to do is update the original mapping file. Add the new phone
column to the list of table columns and the new occurrence
field to the relation mapping.
<relation name="organizations" columns="id name url phone">
<topic type="ex:organization">
<subject-identifier>http://example.org/organization/${id}</subject-identifier>
<topic-name>${name}</topic-name>
<occurrence type="ex:homepage">${url}</occurrence>
<occurrence type="ex:phone">${phone}</occurrence>
</topic>
</relation>
We are now done with the temporary mapping file and it can be discarded. The application can now be restarted.
The net.ontopia.topicmaps.cmdlineutils.TologDelete
command-line utility is very useful for this purpose. Just hand it a tolog query and it will delete the objects returned by the query.
Alternatively, you can use tolog updates in the query plug-in in Omnigator, where you just run a DELETE statement to remove whatever you want to remove.
Use the tolog delete utility described above.
Doing a rescan
or add
on the relation together with the tolog delete utility mentioned above will allow you to repair most mistakes. Note that you can override the synchronization type without modifying the mapping file by passing the --force-rescan
option to the net.ontopia.topicmaps.db2tm.Execute
command line utility.