What can Linked Data do for the enterprise? Can it solve the CIO’s headaches around data integration? That topic comes up more and more often in the Linked Data community. Where does Linked Data fit into the enterprise? Let’s explore that by looking at conventional enterprise data integration first.
I stumbled upon a blog post describing the challenges of providing a single view of enterprise data sources. That post and the previous one in its series describe typical corporate IT: SOAs, SAPs, IBMs, multi-million dollar integration projects, legacy systems all over the place, tons of code for special-purpose application integration, etc. It is the usual ecosystem that has grown over decades, which from the outside might look like a mess but cannot be replaced or consolidated. Consolidating all those systems would be unaffordable and sometimes politically difficult to pursue, because for each individual department and application, things mostly work well enough the way they are. It’s the cross-application (and hence often cross-department) integration that causes headaches.
My friends at the BBC and I wrote a paper describing the challenges at the BBC and how Linked Data can help (read chapter one). Enterprise-wide IT system consolidation is not an option. The desired solution is something on top of existing systems that provides integrated views of all data repositories and workflows without the risk of breaking existing applications.
The blog post above describes a case of “small pieces of information about customers [being] littered throughout the data center”, in the billing system, marketing system, CRM system, support system, etc. The suggested solution is data virtualization (or federation): software that allows users to define aggregated views over the different pieces of data, so that analysts and application developers query those views instead of querying the multiple data sources directly. The integration layer acts as middleware and retrieves the information from these disparate systems in real time when requested. Whether to virtualize or materialize the integration layer (i.e. whether data from the sources is retrieved at query time or replicated into a central repository) depends on the concrete case. Complex joins are difficult to virtualize, and query execution time is often much worse than in materialized systems. Virtualized views also have an availability problem: if one data source is unavailable, your queries will not execute, so you’d want proxies anyway.
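To make the trade-off concrete, here is a minimal sketch of a virtualized view next to a materialized one. Everything in it is an assumption for illustration: the three in-memory dictionaries stand in for real CRM, billing and support systems, and the names `make_source`, `virtual_view` and `materialize` are hypothetical, not taken from any product.

```python
from typing import Callable, Dict, Iterable

# Hypothetical stand-ins for the real systems: customer key -> partial record.
CRM = {"c42": {"name": "Anna Smith", "city": "Cambridge"}}
BILLING = {"c42": {"balance": 120.5}}
SUPPORT = {"c42": {"open_tickets": 2}}


class SourceUnavailable(Exception):
    """Raised when a backing system cannot be reached."""


Fetch = Callable[[str], dict]


def make_source(table: Dict[str, dict], available: bool = True) -> Fetch:
    """Wrap a table as a fetch function that can simulate an outage."""
    def fetch(key: str) -> dict:
        if not available:
            raise SourceUnavailable()
        return table.get(key, {})
    return fetch


def virtual_view(key: str, sources: Dict[str, Fetch]) -> dict:
    """Virtualized view: every source is queried at request time."""
    merged: dict = {}
    for fetch in sources.values():
        merged.update(fetch(key))
    return merged


def materialize(keys: Iterable[str], sources: Dict[str, Fetch]) -> Dict[str, dict]:
    """Materialized view: copy the merged records into a central store."""
    return {key: virtual_view(key, sources) for key in keys}


sources = {
    "crm": make_source(CRM),
    "billing": make_source(BILLING),
    "support": make_source(SUPPORT),
}
central_copy = materialize(["c42"], sources)

# Simulate the support system going down after the copy was taken.
degraded = dict(sources, support=make_source(SUPPORT, available=False))
```

With `degraded`, a call to `virtual_view("c42", degraded)` raises `SourceUnavailable`, while `central_copy["c42"]` still answers, at the price of possibly stale data.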
The power of these integration layers is that they separate the business logic from the data sources. IT staff can focus on the data they need and don’t have to deal with the different systems in which that data is stored. They can start gluing data together instead of gluing systems together in a point-to-point manner.
Now, where does Linked Data fit in? Linked Data enables you to push the integration down to the level of the actual data. Think of it as a network (or web) of all the little pieces of information: one customer in your CRM system links to her doppelgänger in the billing system. So the data object about Anna Smith in the customer database links to the corresponding Anna Smith in the billing system. And that Anna Smith links to her doppelgänger in the support system, and so on. Applications can follow these links through the different systems and that way get all the data about Anna Smith. The beauty of that? The links do not have to stop at your firewall. Your data objects can link to data sources on the Web or to your suppliers as well. You can link to whatever other Linked Data source you want; the technical barriers disappear.
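Following links from one system to the next can be sketched in a few lines. This is only an illustration under stated assumptions: the `example.com` URIs, the `RECORDS` dictionary and the `follow_links` function are all made up, and a real application would dereference each URI over HTTP instead of looking it up in a dictionary.

```python
from collections import deque
from typing import Callable, Optional

# Hypothetical data objects, each identified by a URI and linking onwards.
RECORDS = {
    "http://crm.example.com/customer/42": {
        "data": {"name": "Anna Smith"},
        "links": ["http://billing.example.com/account/9001"],
    },
    "http://billing.example.com/account/9001": {
        "data": {"balance": 120.5},
        "links": ["http://support.example.com/user/anna-s"],
    },
    "http://support.example.com/user/anna-s": {
        "data": {"open_tickets": 2},
        # A link back to the CRM record; the traversal must not loop on it.
        "links": ["http://crm.example.com/customer/42"],
    },
}


def follow_links(start: str, lookup: Callable[[str], Optional[dict]]) -> dict:
    """Breadth-first walk over linked records, merging their data."""
    seen = set()
    queue = deque([start])
    merged: dict = {}
    while queue:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        record = lookup(uri)
        if record is None:
            # Unknown or unreachable object: skip it, keep what we have.
            continue
        merged.update(record["data"])
        queue.extend(record["links"])
    return merged


anna = follow_links("http://crm.example.com/customer/42", RECORDS.get)
```

Because the walk only needs a `lookup` function, the same code works whether a link points inside the firewall or to a source on the Web.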
Disambiguation is another important aspect. There are probably many Anna Smiths in any large enterprise customer database. With Linked Data, each object gets an unambiguous identifier, much like a database ID. Anna Smith lives in Cambridge? There are separate IDs for the Cambridge in Massachusetts and the Cambridge in England, so Anna can link to the correct one. And if you link to a Linked Data source on the Web, like Geonames, your applications can fetch information about Cambridge from there. Once the disambiguation is done and links between objects are established, they are available for all your applications to use.
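A small sketch shows why identifiers matter more than names here. The URIs below are placeholders on `example.com` and `example.org`, not real Geonames identifiers, and `find_by_name` and `country_of` are hypothetical helpers.

```python
# Placeholder identifiers for the two cities (assumed, not real Geonames URIs).
CAMBRIDGE_UK = "http://places.example.org/cambridge-england"
CAMBRIDGE_MA = "http://places.example.org/cambridge-massachusetts"

PLACES = {
    CAMBRIDGE_UK: {"label": "Cambridge", "country": "United Kingdom"},
    CAMBRIDGE_MA: {"label": "Cambridge", "country": "United States"},
}

# Two different customers who share a name and a city label.
CUSTOMERS = {
    "http://crm.example.com/customer/42": {
        "name": "Anna Smith",
        "lives_in": CAMBRIDGE_UK,
    },
    "http://crm.example.com/customer/77": {
        "name": "Anna Smith",
        "lives_in": CAMBRIDGE_MA,
    },
}


def find_by_name(name: str) -> list:
    """A name lookup is ambiguous: it may return several customers."""
    return sorted(uri for uri, c in CUSTOMERS.items() if c["name"] == name)


def country_of(customer_uri: str) -> str:
    """An identifier lookup is not: follow the link to exactly one place."""
    place_uri = CUSTOMERS[customer_uri]["lives_in"]
    return PLACES[place_uri]["country"]
```

Searching for "Anna Smith" returns both customers, while each customer URI resolves to exactly one Cambridge.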
All that makes data integration much more lightweight and agile, and at the same time much more powerful, and it frees your integration layer software to do much more clever things. Is there still a need for that integration layer? Yes, there is. The integration layer becomes the place where the links between data objects get managed, where data collections get built and curated, where it gets defined which data sources and pieces of information to trust for which use case, and where the data from all your enterprise and web data sources gets consolidated. It provides the single point of access into the web of data that exists in your enterprise and on the Web.
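One of those responsibilities, deciding which source to trust for which use case, could look roughly like the sketch below. The source names, the trust rankings and the `consolidate` function are assumptions made up for illustration; a real integration layer would manage such policies in a far richer way.

```python
# Conflicting values for one customer, keyed by field and (hypothetical) source.
OBSERVATIONS = {
    "email": {"crm": "anna@old.example.com", "billing": "anna@new.example.com"},
    "phone": {"crm": "+44 1223 000000", "support": "+44 1223 111111"},
}

# Per use case, sources listed from most to least trusted (an assumed policy).
TRUST = {
    "invoicing": ["billing", "crm", "support"],
    "helpdesk": ["support", "crm", "billing"],
}


def consolidate(use_case: str, observations: dict = OBSERVATIONS,
                trust: dict = TRUST) -> dict:
    """For each field, pick the value from the most trusted source that has one."""
    ranking = trust[use_case]
    result = {}
    for field, by_source in observations.items():
        for source in ranking:
            if source in by_source:
                result[field] = by_source[source]
                break
    return result
```

The same underlying data then yields a different consolidated record for invoicing than for the helpdesk, without touching any of the source systems.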