October 25, 2022

Linked Data: Everything you need to know

Linked Data is quietly revolutionizing the way the world uses data for decision making and innovation. Enterprises of every size can gain something from this approach, from large corporations and governmental institutions to agile startups. So, what is linked data, and how can it be used?

As a simple definition, linked data is data that is structured to form logical connections with other data points, based on a shared relevance. Instead of using a ‘standard’ tabulated data format that can be thought of as rows and columns, each data point is connected by a semantic structure. This structure makes it possible for machines to read and process the data in a meaningful way.

Data varies a lot, so to keep linked data within some common, basic parameters, it relies on four core design principles that must all be fulfilled for data to be considered ‘linked’.

The four design principles of linked data are:

  1. Use of a Uniform Resource Identifier (URI) to give each resource a unique identity. A URI is a unique string of characters; the most familiar kind is a Uniform Resource Locator (URL), also known as a web address.
  2. Use of HTTP (Hypertext Transfer Protocol) URIs to make resources easily available. As the internet has become a standard part of everyday life with many connected devices and data sources, HTTP URIs make it easier for data resources to be looked up, shared, and accessed.
  3. Metadata and the use of Resource Description Framework (RDF) to ‘label’ data in a meaningful way. This enables SPARQL queries to be used to extract insights, and ontology languages such as OWL to describe what the labels mean.
  4. Use of links to other URIs to connect data (with a linked data platform). This is the final part that makes the data truly ‘linked’. Instead of being an isolated data point, each entity is part of a web of connections. This can be used to generate knowledge graphs and to map out connections that are not immediately apparent.

What is RDF and How Does it Relate to Linked Data?

RDF is a standard data model that defines and connects related resources through ‘triples’. Each triple can be thought of as a simple sentence that is encoded in a standard way.

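A triple consists of a subject, predicate, and object. For example, in "Berlin is the capital of Germany", Berlin is the subject, "is the capital of" is the predicate, and Germany is the object. This turns a meaningless data point into a ‘fact’ that has context and is linked to other related facts with meaning.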

By using RDF, data managers can build the most basic components of logic and significance into the data itself.
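To make the idea of a triple concrete, here is a minimal sketch in plain Python. It is an illustration only, not a full RDF implementation: the `http://example.org/` namespace, the `Triple` class, and the helper names are all hypothetical, and the output only approximates the N-Triples text format (it handles quotes and backslashes but not every escaping rule).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str    # URI of the thing being described
    predicate: str  # URI of the property or relationship
    obj: str        # URI of another resource, or a plain literal value


def is_uri(value: str) -> bool:
    """Simplification: treat anything starting with http(s):// as a URI."""
    return value.startswith(("http://", "https://"))


def to_ntriples(triple: Triple) -> str:
    """Render a triple as one N-Triples-style line."""
    def term(value: str) -> str:
        if is_uri(value):
            return f"<{value}>"
        # Literals are quoted; escape backslashes and double quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    return f"{term(triple.subject)} {term(triple.predicate)} {term(triple.obj)} ."


EX = "http://example.org/"  # hypothetical namespace, for illustration only

facts = [
    Triple(EX + "ada", EX + "worksFor", EX + "acme"),  # links to another resource
    Triple(EX + "ada", EX + "name", "Ada"),            # attaches a literal value
]

for fact in facts:
    print(to_ntriples(fact))
```

The first fact links two resources by their URIs, while the second attaches a plain value. In a real project, an RDF library would normally handle parsing and serialization.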

Converting your data into an RDF-based data model is a long-term investment that pays big dividends further down the road. With an RDF-based data model, your data can be used effectively in the future with minimal difficulty. Putting your data into this format from the beginning turns it into an easily digestible form that can be readily absorbed by machine learning algorithms as they grow and learn.

What’s the difference between Linked Data and Linked Open Data?

There is sometimes a little confusion between linked data and linked open data. The reason is that a lot of open data projects are also linked data ones, as these have the most utility.

Open Data is easily defined as: “data that is publicly available and free to use without restriction under a ‘free use’ license.”

Linked Open Data (LOD) is simply linked data that has been made freely available and ‘open to use’ by anyone. Linked open data is perfectly described by Tim Berners-Lee (one of the chief architects of the World Wide Web) as being: “Linked Data which is released under an open license, which does not impede its reuse for free.”

LOD projects are varied and are often maintained by large academic or governmental institutions. These open data projects have a vital role in the development of AI as these massive datasets are perfect for training machine learning algorithms.

Because these large datasets contain many connections, algorithms can learn faster and discover more patterns. When multiple datasets are combined, the benefit is even greater. However, a uniform data model is essential when different data sources are combined.
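The value of a uniform data model when combining sources can be sketched in a few lines of Python. This assumes both datasets already use the same triple shape and the same URIs for the same things. The two datasets, the `http://example.org/` namespace, and the `objects` helper are hypothetical.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Two independently maintained datasets, both expressed as
# (subject, predicate, object) triples.
hr_data = {
    (EX + "ada", EX + "worksFor", EX + "acme"),
    (EX + "ada", EX + "name", "Ada"),
}
company_data = {
    (EX + "acme", EX + "locatedIn", EX + "berlin"),
    (EX + "ada", EX + "worksFor", EX + "acme"),  # same fact as in hr_data
}

# Because the model is uniform, merging is a plain set union;
# the duplicated fact collapses into one.
merged = hr_data | company_data


def objects(graph, subject, predicate):
    """Return every object linked from `subject` via `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# A question neither dataset can answer alone:
# in which city is Ada's employer located?
employers = objects(merged, EX + "ada", EX + "worksFor")
cities = {
    city
    for employer in employers
    for city in objects(merged, employer, EX + "locatedIn")
}
print(cities)
```

The merge step is trivial only because both sources share identifiers and structure. With mismatched identifiers, the union would succeed but the two-step lookup would find nothing.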

There are some notable examples of LOD projects, including:

DBpedia – A project which converts the information available in Wikipedia into a structured database that is easily read by machines.

GeoNames – A massive geographical database with 11,000,000+ placenames already included.

Global Research Identifier Database (GRID) – This long-running project compiled data about global research and academic organizations, including grant information. Since 2021, it has been included as part of the Research Organization Registry (ROR) project.

BabelNet – This is a perfect example of the power of linked data from multiple sources. By combining the lexical database WordNet with Wikipedia data, BabelNet can link words with the things they represent. It uses these links to create a multilingual linked data dictionary that can be used to train multilingual Natural Language Processing (NLP) algorithms, among other things.

How is Linked Data used?

An interesting facet of linked data is that the vocabularies or ontologies used are not assumed to be complete. This leaves them open to new interpretations and end-uses. A perfect example is BabelNet (above), which takes linked data from multiple sources and uses it to create a new layer of knowledge based on the connections.

Because data resources are linked and not isolated, the relationships can be used to add new meaning to existing facts based on discovering trends, patterns, and correlations in the connections. Data experts can use these to infer entirely new knowledge from existing facts using specific queries.
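As a small illustration of inferring new facts from existing ones, the sketch below applies a single rule: if A is part of B and B is part of C, then A is part of C. The `partOf` predicate, the example places, and the `infer_transitive` function are hypothetical. Real systems would typically rely on an ontology and a reasoner or a query engine rather than hand-written loops.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only
PART_OF = EX + "partOf"

facts = {
    (EX + "kreuzberg", PART_OF, EX + "berlin"),
    (EX + "berlin", PART_OF, EX + "germany"),
    (EX + "germany", PART_OF, EX + "europe"),
}


def infer_transitive(graph, predicate):
    """Repeatedly apply the transitivity rule until no new facts appear."""
    inferred = set(graph)
    while True:
        new_facts = {
            (a, predicate, c)
            for a, p1, b in inferred if p1 == predicate
            for b2, p2, c in inferred if p2 == predicate and b2 == b
        } - inferred
        if not new_facts:
            return inferred
        inferred |= new_facts


closed = infer_transitive(facts, PART_OF)
derived = closed - facts  # facts that were never stated explicitly

for s, _, o in sorted(derived):
    print(s, "partOf", o)
```

None of the derived facts, such as Kreuzberg being part of Europe, were entered by hand. They follow from the connections already present in the data.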

Linked data is rich and meaningful, making it the perfect diet for machine learning and pattern recognition in supermassive linked datasets. Using smart algorithms to identify patterns in big data, researchers are discovering new drugs, as well as predicting or preventing disease. Practical applications range from smartphone apps that can identify skin conditions from a photo to programs that can diagnose heart conditions or automatically identify cancer in MRI scans, sometimes catching cases that experienced diagnostic professionals miss.

A Linked Data Strategy Powers Innovation

A linked data strategy relies on having a big vision: investing in an RDF-based linked data model now can deliver returns much later on.

As well as the beneficial uses mentioned above, there are solid business reasons for making this investment. When data is semantically linked it is easier to build integrations, for example. This keeps data in an ‘always fresh’ state; it’s always ready to be used instead of becoming siloed and disconnected from decision-making processes by being trapped in isolated databases, applications, and disparate formats.

As more businesses become aware of the value of their data, a linked data strategy will become a standard model for organizations to accrue and publish data. It paves the way for more opportunities in the future.

Some Notable Linked Data Use Cases in Industry and Business

As we’ve already touched upon, many of the better-known linked data use cases are in the non-profit, academic, or public sector. There are, however, numerous ‘behind the scenes’ instances in which commercial organizations use linked data to turn a profit and gain a lead over the competition.

These linked data pioneers include the ever-present Netflix, which uses its linked data to help create new content and understand user behavior. Another is Amazon, along with its virtual assistant Alexa, which uses linked data to serve the most relevant answers to text or voice queries. The NLP that interprets those queries also relies on linked data to understand what you’re asking.

Of course, perhaps the best-known example of linked data is the Google Knowledge Graph, which is used to power their ‘things not strings’ approach. This is designed to transform their search engine into an ‘answer engine’ that produces intelligent responses based on intent and an understanding of the entities you want to know about.

Likewise, social media platforms like Facebook use the Open Graph protocol to create a linked database that tracks people, their connections, and the content they view.

Organizations with hefty data and sharing needs, like the BBC and the European Commission, have recognized the urgency of having a rational data model. The European Commission is integrating a linked data approach as part of its ongoing interoperability initiative, which enables European governments to offer streamlined cross-border services.

Perhaps one of the most interesting ways structured data is being used is in the semantic web, which is building a linked data treasure trove that’s easier for machines to understand.

Web developers are now using the vocabularies available at schema.org (an initiative founded by Google, Microsoft, Yahoo and Yandex) to mark up web content with a structure that makes it easier to link, share, and use information on the internet. These markup schemas make it possible to serve website content in a way that’s easily understood and integrated into linked databases like the Google Knowledge Graph. Adherence to semantic web standards is becoming an important part of Search Engine Optimization (SEO) strategies.
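As a rough sketch of what such markup can look like, the Python below builds a schema.org `Article` description in JSON-LD, one of the formats commonly used to embed this kind of structured data in a page. The `article_jsonld` helper and the publisher name are hypothetical, and which properties a given search engine actually uses is not something this example can vouch for.

```python
import json


def article_jsonld(headline, date_published, publisher_name):
    """Build a schema.org Article description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 date
        "publisher": {"@type": "Organization", "name": publisher_name},
    }


markup = article_jsonld(
    headline="Linked Data: Everything you need to know",
    date_published="2022-10-25",
    publisher_name="Example Publisher",  # placeholder value
)

# JSON-LD is usually embedded in a page inside a script tag of this type.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(markup, indent=2)
    + "\n</script>"
)
print(script_tag)
```

The nested `publisher` object is itself a typed entity, which is what lets a consumer connect the article to an organization rather than to a bare string.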

The Benefits of Linked Data

There are straightforward benefits that come from having structured and linked data, namely:

  • Makes it easier to get value from data, as it already contains meaning/context in the structure itself
  • Makes research easier and more productive using automated pattern discovery
  • Makes it possible to model the long-term outcomes of policies using very large datasets
  • Reduces the redundancy of research processes and assists data integration from multiple sources

How Do You Start with Linked Data?

To start using linked data, your organization must start using RDF whenever it adds data to a database. Many organizations already use XML or app-specific data sources, so these may need to be converted to RDF using a suitable app or data platform.

It’s much easier to start using RDF from the outset than to go back and convert old data. Retrospective conversion can be laborious and time-consuming, but tools like the wvr.io platform have been developed for exactly this purpose.
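To give a feel for what converting existing XML involves, here is a minimal sketch using Python's standard library. It is not how any particular platform performs the conversion. The XML layout, the `http://example.org/` namespace, and the `xml_to_triples` function are hypothetical, and real conversions also need to deal with data types, nesting, and vocabulary mapping.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Hypothetical legacy XML export.
XML_SOURCE = """
<employees>
  <employee id="e1">
    <name>Ada</name>
    <department>Research</department>
  </employee>
  <employee id="e2">
    <name>Grace</name>
    <department>Engineering</department>
  </employee>
</employees>
"""


def xml_to_triples(xml_text):
    """Turn each child element of an <employee> into one triple."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for employee in root.findall("employee"):
        # Mint a URI for the record from its id attribute.
        subject = EX + "employee/" + employee.attrib["id"]
        for field in employee:
            text = (field.text or "").strip()
            if text:  # skip empty fields instead of emitting blank values
                triples.append((subject, EX + field.tag, text))
    return triples


triples = xml_to_triples(XML_SOURCE)
for triple in triples:
    print(triple)
```

The key design decision is how subject URIs are minted. Here the `id` attribute is used, which only works if those ids are stable and unique across the source.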

Getting the Best from Linked Data: Pitfalls to Avoid

Costly Custom Projects

The full potential of linked data can only be realized when it’s linked to as many data points as possible and when multiple datasets are combined. It is natural to be dissuaded by a challenge of this complexity. Many linked data projects are run by non-profit or governmental institutions because they can take a lot of time and resources to set up and maintain. Custom projects like these can seem a risky investment for a company focused on profit, ROI, and time-to-market as governing KPIs. Companies are often better off using a ready-made and reusable data platform, as this can deliver faster results and cut the time and budget spent.

Data Quality Control

Perhaps the greatest challenge for linked data projects is that of data quality. Data combined from multiple sources is often inconsistent, inaccurate, out-of-date, or incomplete. Unless it is thoroughly cleaned and harmonized beforehand, this results in severe limitations in how the data can be utilized.

Common Entity Identifiers

A lack of common entity identifiers can also be a problem when multiple databases are combined, but this can be resolved by using quasi-identifiers (QIDs), which are combinations of attributes that together pick out an entity, or other tools.
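One way to picture quasi-identifier matching is the sketch below, which links records from two datasets when a normalized combination of name, founding year, and city agrees. The datasets, the chosen attributes, and the function names are hypothetical. Exact matching on normalized values is a deliberate simplification, and real entity resolution usually needs fuzzier comparison and human review. The `owl:sameAs` URI is the standard OWL property for stating that two URIs refer to the same thing.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two datasets describing organizations under different local identifiers.
dataset_a = {
    EX + "a/1": {"name": "Acme  GmbH", "founded": 1999, "city": "Berlin"},
    EX + "a/2": {"name": "Globex", "founded": 2005, "city": "Paris"},
}
dataset_b = {
    EX + "b/77": {"name": "acme gmbh", "founded": 1999, "city": "berlin "},
    EX + "b/78": {"name": "Initech", "founded": 2005, "city": "Paris"},
}


def match_key(record):
    """Build a quasi-identifier from attributes that together pick out an entity."""
    name = " ".join(record["name"].lower().split())  # normalize case and spacing
    city = record["city"].strip().lower()
    return (name, record["founded"], city)


def link_entities(a, b):
    """Emit a sameAs triple for every pair of records with equal keys."""
    index = {match_key(rec): uri for uri, rec in b.items()}
    return [
        (uri, SAME_AS, index[match_key(rec)])
        for uri, rec in a.items()
        if match_key(rec) in index
    ]


links = link_entities(dataset_a, dataset_b)
print(links)
```

Globex and Initech share a year and a city but not a name, so they are not linked. Choosing attributes that are jointly distinctive is what keeps false matches down.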

Strategy and Vision

The most value can be gained from the largest datasets; however, these also represent some of the most complex scenarios. Having a clear vision of the end goals and how they will be reached can help guide the process from the very beginning and avoid serious headaches later on. Advice from data experts can be a defining factor in the end result of a data project, and can help get things moving in the right direction much faster.

Conclusion: How Companies Can Make the Most from Their Linked Data

Data can only have value when it’s connected and readable by machines. Insights that can guide more profitable decisions and more efficient processes can only be made by algorithms that have access to good quality linked data.

When your data is structured with RDF or a similar data model, it’s harder to make the error of leaving it trapped in silos where it has limited impact.

It is infinitely better for organizations to start the process of using RDF and linked data as early as possible. Using a data platform can save a lot of time and money and produce incredible end-results. With linked data, you’re creating an asset that has long-term value.

Want to learn more about how the wvr.io platform can accelerate your data-driven innovations? Need advice on how to turn your data into a valuable company asset? Get in touch.
