One of the bigger changes in the upcoming DBpedia 3.4 release is the ontology’s new URI schema: property URIs are now partitioned by the property’s domain. What used to be http://dbpedia.org/ontology/architect is now http://dbpedia.org/ontology/Building/architect. In the past, there was only the statement that http://dbpedia.org/ontology/architect has the rdfs:domain http://dbpedia.org/ontology/Building; now this fact is additionally encoded in the URI.

Looks OK? Maybe at first sight. But in my opinion, it’s a big mistake.

Let’s first look at the reasons for the change in the URI schema. It aims to provide a solution for semantically ambiguous properties. For example, the word “length” can describe the long dimension of an object, like the length of a bridge. But it can also be used as a synonym for the runtime of a song or movie (like 90 minutes). Now, with only the one URI http://dbpedia.org/ontology/length, it’s unclear whether the range of that property is measured in metres or minutes (let alone inches, feet, and miles ;) ). So in order to properly represent the two different semantics, we need two different URIs. Consensus so far…

Now there are two possible solutions: either you use two different property IDs (such as length and movie_length), or you use two different namespaces. The DBpedia team chose the latter. The problem is that they did the partitioning for every single property, even the unambiguous ones. And since the DBpedia ontology wasn’t carefully designed upfront, but is instead the product of community refinement, that leaves us with URIs that will most probably break in the future.

See for example http://dbpedia.org/ontology/ceo. Its domain is http://dbpedia.org/ontology/SoccerClub, which seems kind of strange, but is due to the way the ontology was created: the Infobox Soccer Club was probably the only one with a ceo property, so the domain of http://dbpedia.org/ontology/ceo became SoccerClub. Clearly, once the community starts refining the ontology, the domain will be changed to Company or something similarly reasonable. That wouldn’t be much of a deal, since we’d only see one changed statement about the ceo property. But with the new URI schema, SoccerClub is part of the property’s URI (http://dbpedia.org/ontology/SoccerClub/ceo). We have a problem…
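To make the difference concrete, here is a minimal sketch that counts how many triples change when the domain of ceo moves from SoccerClub to Company under each schema. It models triples as plain Python tuples; the resource URIs, the CEO names and all function names are made up for illustration, and this is not how DBpedia actually builds its dataset.

```python
ONT = "http://dbpedia.org/ontology/"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"

def flat_uri(prop, domain):
    # Old schema: the domain is NOT part of the property URI.
    return ONT + prop

def partitioned_uri(prop, domain):
    # New schema: the domain class is encoded in the property URI.
    return ONT + domain + "/" + prop

def build_graph(uri_for, domain, facts):
    """One schema statement (rdfs:domain) plus one data triple per subject."""
    prop = uri_for("ceo", domain)
    triples = {(prop, RDFS_DOMAIN, ONT + domain)}
    for subject, value in facts.items():
        triples.add((subject, prop, value))
    return triples

def changed_triples(uri_for, old_domain, new_domain, facts):
    """Triples that exist in only one of the two graphs (before vs. after)."""
    before = build_graph(uri_for, old_domain, facts)
    after = build_graph(uri_for, new_domain, facts)
    return before ^ after

# Hypothetical data: these resources and names are assumptions.
FACTS = {
    "http://dbpedia.org/resource/ExampleClubA": "Alice Example",
    "http://dbpedia.org/resource/ExampleClubB": "Bob Example",
}

flat_changes = changed_triples(flat_uri, "SoccerClub", "Company", FACTS)
partitioned_changes = changed_triples(partitioned_uri, "SoccerClub", "Company", FACTS)

print(len(flat_changes), "triples touched with the old schema")
print(len(partitioned_changes), "triples touched with the new schema")
```

With the old schema only the rdfs:domain statement is swapped (one triple removed, one added). With the partitioned schema every data triple gets a new predicate as well, so the churn grows with the size of the dataset.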

If we changed the property URI to http://dbpedia.org/ontology/Company/ceo now, all existing queries would break. If we leave the URI as it is, the encoded SoccerClub becomes misleading. It would even have been better to use http://dbpedia.org/ontology/abc123/ceo instead, since that string isn’t supposed to have any meaning.
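The breakage is easy to reproduce in a toy setting. The sketch below stands in for a stored query that is hard-wired to one predicate URI, which is roughly what a SPARQL query with a fixed property does. The helper names and the sample data are assumptions for illustration only.

```python
ONT = "http://dbpedia.org/ontology/"

def publish(property_uri, facts):
    """Return (subject, predicate, object) triples for subject -> value facts."""
    return {(subject, property_uri, value) for subject, value in facts.items()}

def run_query(triples, predicate):
    """Stand-in for a stored query that matches exactly one predicate URI."""
    return sorted(obj for _, pred, obj in triples if pred == predicate)

# Hypothetical data: these resources and names are assumptions.
FACTS = {
    "http://dbpedia.org/resource/ExampleClub": "Alice Example",
    "http://dbpedia.org/resource/ExampleCompany": "Bob Example",
}

# The predicate that clients wrote into their queries at release time.
STORED_PREDICATE = ONT + "SoccerClub/ceo"

# Same query, run before and after the ontology refinement renames the property.
before = run_query(publish(ONT + "SoccerClub/ceo", FACTS), STORED_PREDICATE)
after_rename = run_query(publish(ONT + "Company/ceo", FACTS), STORED_PREDICATE)

print(before)        # both CEO names are found
print(after_rename)  # nothing matches any more
```

Note that the query does not fail loudly: it simply returns an empty result, which is arguably worse because nobody notices.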

The simple, straightforward solution would have been: use different IDs to disambiguate properties which actually need to be disambiguated. Or to do it like Freebase and partition very carefully by topic…
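As a rough sketch of the first option, here is what minting URIs could look like when only the ambiguous properties get distinct IDs. The length and movie_length IDs come from the example above; the table layout, the unit labels and the function name are my own assumptions.

```python
ONT = "http://dbpedia.org/ontology/"

# Hypothetical input: each property name maps to the senses found for it.
# A sense is (local_id, unit); unit is None where no unit applies.
SENSES = {
    "length": [("length", "metre"), ("movie_length", "minute")],
    "ceo": [("ceo", None)],
}

def mint_property_uris(senses):
    """Give every sense its own flat URI; no domain class appears in any URI."""
    uris = {}
    for variants in senses.values():
        for local_id, unit in variants:
            uris[ONT + local_id] = unit
    return uris

URIS = mint_property_uris(SENSES)

for uri, unit in sorted(URIS.items()):
    print(uri, unit)
```

The unambiguous ceo keeps its plain URI, so a later change of its rdfs:domain never touches the identifier.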
