One of the bigger changes in the upcoming DBpedia 3.4 release is the ontology’s new URI schema: property URIs are now partitioned by the property’s domain. Where it used to be http://dbpedia.org/ontology/architect, it is now http://dbpedia.org/ontology/Building/architect. Previously, the fact that http://dbpedia.org/ontology/architect has the rdfs:domain http://dbpedia.org/ontology/Building was stated only in the ontology; now it is additionally encoded in the URI.

Looks OK? Maybe at first sight. But in my opinion, it’s a big mistake.

Let’s first look at the reasons for the change. It aims to solve the problem of semantically ambiguous properties. For example, the word “length” can describe the long dimension of an object, like the length of a bridge. But it can also be used as a synonym for the runtime of a song or movie (say, 90 minutes). With only the one URI http://dbpedia.org/ontology/length, it’s unclear whether the property’s values are measured in metres or minutes (let alone inches, feet, and miles ;) ). So to properly represent the two different meanings, we need two different URIs. Consensus so far…

Now there are two possible solutions: either use two different property IDs (such as length and movie_length), or use two different namespaces. The DBpedia team chose the latter. The problem is that they partitioned every single property, even the unambiguous ones. And since the DBpedia ontology wasn’t carefully designed upfront but is instead the product of ongoing community refinement, that leaves us with URIs that will most probably break in the future.
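The two options can be compared side by side in a small Python sketch. The domain classes (Bridge, Movie, Building) and the “<domain>_<name>” rule for minting extra IDs are illustrative assumptions, not DBpedia’s actual naming rules; the point is only that the first option leaves an unambiguous property such as architect untouched, while the second rewrites every property URI.

```python
# Two hypothetical ways to mint property URIs when a name like "length"
# has several senses. Domain names and the "<domain>_<name>" rule are
# illustrative assumptions only.
ONTOLOGY_NS = "http://dbpedia.org/ontology/"

# property name -> domain classes it is used with; "length" is ambiguous
# (metres for a bridge, minutes for a movie), "architect" is not.
SENSES = {
    "length": ["Bridge", "Movie"],
    "architect": ["Building"],
}

def mint_by_id(prop, domains):
    """Option 1: one namespace; only the extra senses of an ambiguous
    property get a distinct ID, e.g. movie_length."""
    return {
        domain: ONTOLOGY_NS + (prop if i == 0 else f"{domain.lower()}_{prop}")
        for i, domain in enumerate(domains)
    }

def mint_by_namespace(prop, domains):
    """Option 2 (the DBpedia 3.4 choice): every property, ambiguous or not,
    lives under a namespace named after its domain."""
    return {domain: f"{ONTOLOGY_NS}{domain}/{prop}" for domain in domains}

for prop, domains in SENSES.items():
    print(prop, mint_by_id(prop, domains), mint_by_namespace(prop, domains))
```

Under option 1, architect keeps its flat URI and only length gains a second ID; under option 2, architect becomes .../Building/architect even though nothing about it was ambiguous.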

Take http://dbpedia.org/ontology/ceo as an example. Its domain is http://dbpedia.org/ontology/SoccerClub, which seems rather strange but is due to the way the ontology was created: the Infobox Soccer Club was probably the only infobox with a ceo property, so the domain of http://dbpedia.org/ontology/ceo became SoccerClub. Clearly, once the community starts refining the ontology, the domain will be changed to Company or something similarly reasonable. That wouldn’t be much of a deal, since we’d only see a changed statement about the ceo property. But with the new URI schema, SoccerClub is part of the property’s URI (http://dbpedia.org/ontology/SoccerClub/ceo). We have a problem…

If we change the property URI to http://dbpedia.org/ontology/Company/ceo now, all existing queries will break. If we leave the URI as it is, the encoded SoccerClub becomes misleading. It would actually have been better to use http://dbpedia.org/ontology/abc123/ceo instead, since that string isn’t supposed to carry any meaning.
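The ceo dilemma can be reduced to a toy model. In this sketch a Python set of (subject, predicate, object) tuples stands in for a triple store, match() stands in for a single-pattern SPARQL query, and the resources Some_Club and Some_Person are hypothetical. With a flat URI, refining the domain only swaps one rdfs:domain statement and the query keeps working; with the domain-partitioned URI, moving SoccerClub/ceo to Company/ceo leaves a query on the old URI empty.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
ONTO = "http://dbpedia.org/ontology/"
RES = "http://dbpedia.org/resource/"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"

def match(triples, predicate):
    """Stand-in for SELECT ?s ?o WHERE { ?s <predicate> ?o }."""
    return {(s, o) for s, p, o in triples if p == predicate}

def refine_flat(triples, prop, new_domain):
    """Flat schema: refining the domain replaces one rdfs:domain triple."""
    kept = {t for t in triples if not (t[0] == prop and t[1] == RDFS_DOMAIN)}
    return kept | {(prop, RDFS_DOMAIN, new_domain)}

def refine_partitioned(triples, old_prop, new_prop, new_domain):
    """Domain-partitioned schema: the property URI itself changes, so every
    data triple using it is rewritten to the new URI."""
    out = {(s, new_prop if p == old_prop else p, o)
           for s, p, o in triples
           if not (s == old_prop and p == RDFS_DOMAIN)}
    return out | {(new_prop, RDFS_DOMAIN, new_domain)}

club, person = RES + "Some_Club", RES + "Some_Person"  # hypothetical resources

flat_ceo = ONTO + "ceo"
flat = {(flat_ceo, RDFS_DOMAIN, ONTO + "SoccerClub"), (club, flat_ceo, person)}
flat_after = refine_flat(flat, flat_ceo, ONTO + "Company")
print(match(flat, flat_ceo) == match(flat_after, flat_ceo))  # True: query unaffected

old_ceo, new_ceo = ONTO + "SoccerClub/ceo", ONTO + "Company/ceo"
part = {(old_ceo, RDFS_DOMAIN, ONTO + "SoccerClub"), (club, old_ceo, person)}
part_after = refine_partitioned(part, old_ceo, new_ceo, ONTO + "Company")
print(match(part_after, old_ceo))  # set(): the old query now returns nothing
```

The third option, keeping SoccerClub/ceo unchanged, would keep match() working but leave a URI that names a domain the property no longer has.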

The simple, straightforward solution would have been to use different IDs only for the properties that actually need to be disambiguated, or to do it like Freebase and partition very carefully by topic…

Comments

I had a hunch you were right when you tweeted a link to the DBpedia page, and your analysis confirms it. Why haven’t they referenced standard “upper” types of terms from existing standards, or at least built a top-down upper ontology that anticipates such standard terms? It seems to me there can’t be a very sound theory of semantics behind this.

Mike Bennett added these pithy words on Nov 06 09 at 8:14 pm

Georgi, these are very valid points. What do you think about using entirely opaque identifiers? They carry no risk of any interpretation being attached to them, and all descriptive data (including UI things like labels) can still be attached to them.

Bernhard Schandl added these pithy words on Nov 06 09 at 8:46 pm

Bernhard, I don’t think an ontology needs opaque identifiers. Opaque IDs are useful in environments where the cost of upfront consensus is too high: for example, deciding which city called Cambridge gets the “Cambridge” ID, which one gets “Cambridge,_the_other_one”, and what to do when a third Cambridge turns up at some point.

The thing about ontologies is that they get designed. And ontologies are useless if they are not carefully designed. So if we already put effort into carefully designing the ontology, we can also make our users’ lives easier by giving concepts “speaking” IDs. Such a “speaking” ID has to carry the bit of semantics that we know to be stable, but must not carry any semantics that are likely to change.

Sometimes it seems that the Linking Open Data community doesn’t care about ontologies and structure and semantics. But Linked Data is useless without structure…

Georgi added these pithy words on Nov 07 09 at 12:00 am

None of the SPARQL queries used so far will actually break. The former schema still exists as it was. So there is the property http://dbpedia.org/ontology/architect as well as http://dbpedia.org/ontology/Building/architect, which is a subProperty of the first. As we see it, this has a lot of advantages. For example, for datatype properties containing unit values, you can define a standard unit for each subProperty, which then helps user interfaces display the data. We will blog about the changes early next week.
But for now: you can query the data as you did before.

Anja Jentzsch added these pithy words on Nov 07 09 at 10:17 am
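Whether a plain query on http://dbpedia.org/ontology/architect also sees data stated only with http://dbpedia.org/ontology/Building/architect depends on the store either keeping both triples or applying RDFS inference. The sketch below models the inference case with the RDFS entailment rule rdfs7 (if p rdfs:subPropertyOf q and s p o, then s q o) over a Python set of triples; the resources Some_Building and Some_Architect are hypothetical, and this is not a description of how DBpedia’s endpoint is actually configured.

```python
# Forward-chaining of RDFS entailment rule rdfs7:
#   p rdfs:subPropertyOf q  and  s p o   ==>   s q o
ONTO = "http://dbpedia.org/ontology/"
RES = "http://dbpedia.org/resource/"
SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

def materialize_subproperties(triples):
    """Return triples plus everything entailed by rdfs7, iterated to a
    fixpoint so that chains of subproperties are handled too."""
    closed = set(triples)
    while True:
        supers = {}
        for s, p, o in closed:
            if p == SUB_PROPERTY_OF:
                supers.setdefault(s, set()).add(o)
        new = {(s, q, o) for s, p, o in closed for q in supers.get(p, ())} - closed
        if not new:
            return closed
        closed |= new

def match(triples, predicate):
    """Stand-in for SELECT ?s ?o WHERE { ?s <predicate> ?o }."""
    return {(s, o) for s, p, o in triples if p == predicate}

old_prop, new_prop = ONTO + "architect", ONTO + "Building/architect"
building, person = RES + "Some_Building", RES + "Some_Architect"  # hypothetical

data = {
    (new_prop, SUB_PROPERTY_OF, old_prop),  # the subProperty link Anja describes
    (building, new_prop, person),           # data stated with the new URI only
}
print(match(data, old_prop))                             # set(): no inference
print(match(materialize_subproperties(data), old_prop))  # non-empty after rdfs7
```

Old queries keep working only if one of these two conditions holds; the subProperty link alone does not make a non-inferencing store return the data.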

With a wild, evolving data source like Wikipedia, a fine-grained approach such as this sounds not tooooo bad. It’s a tradeoff: we can’t create the ontology manually, but have to create it largely automatically, based on the infoboxes. In my opinion, the key would be to do analysis and data-quality work on top, like YAGO does. But my own view aside, I trust Bernhard’s judgement.

Leo Sauermann added these pithy words on Nov 07 09 at 10:55 am

Anja, the thing is that in my opinion the old ontology schema should be fixed. Instead, you are introducing a new one that not only fails to fix the problem but creates new ones.

Georgi added these pithy words on Nov 09 09 at 12:29 pm

Leo, the DBpedia ontology isn’t created automatically. It was built manually from scratch. If it were an automatic approach I wouldn’t complain, but for something that is designed manually, I think it’s fair to ask for some diligence…

Georgi added these pithy words on Nov 09 09 at 12:32 pm

I was trying to run a SPARQL query on DBpedia that returns the US presidents’ ages at inauguration (http://bit.ly/2zgXjJ). The queries listed there no longer work. In this case, DBpedia seems to have lost precision in its extraction of the infoboxes. Before, the tag “” was available, but now it no longer seems to be. We only have “”, which is vaguer because it can refer to either the presidency or the vice-presidency.

In conclusion, the current DBpedia seems less precise in this case.

François Jean added these pithy words on Nov 10 09 at 11:40 pm

In my previous message, the tags were filtered out. What I was trying to say is that previously DBpedia had:
http://dbpedia.org/property/presidentStart
and now, it seems to only have:
http://dbpedia.org/property/termStart
which covers more than just the start of a presidency.

François Jean added these pithy words on Nov 11 09 at 2:59 pm

© Copyright 2010 Georgi Kobilarov.