From data to wisdom

The journey from Ordinary Graphs to Knowledge Graphs - a deeper look into the essence of Knowledge Graphs and their distinctions from conventional 'ordinary' data graphs.

This blog serves as a continuation of the previous discussion in Digital Twins and Knowledge Graphs: A Match Made in Data Heaven where we introduced the basic concepts of graphs and knowledge graphs, demonstrating their compatibility with Digital Twins. As we prepare for a forthcoming blog, which will examine the interplay between Knowledge Graphs and Large Language Models, it is imperative to delve deeper into the essence of Knowledge Graphs and clarify their distinctions from conventional data graphs.

What makes a graph a knowledge graph?

As already touched upon in the previous blog, many people use the term knowledge graph when they are actually referring to a data graph. True, graphs are a very intuitive way to represent data and are usually quite easy to interpret when shown to a colleague. But that does not mean a computer is equally able to do so. Semantics are usually added by the programmer who interprets the node and relationship names used in the data, rather than by applying formal semantics. Knowledge of the language used, and of its meaning in a certain context, is applied subconsciously.

This blog is meant to show what separates knowledge from data, and to demonstrate what is needed to upgrade a data graph to a knowledge graph.

From data to wisdom

So, let’s start by explaining the difference between data and knowledge. For that, let’s go back to the hierarchy defined by Russell Ackoff in 1989, often displayed as a pyramid going from data (the base) to wisdom (the top):

Data: Raw, unprocessed facts without context. Think of numbers, dates, and strings of text.

example: “44d2a6409a719867dc35e62962fa9a346484311a”

Information: Data that has been organised, structured, and given context. It answers basic questions like “who,” “what,” “where” and “when.” 

example: “44d2a6409a719867dc35e62962fa9a346484311a” represents a (hashed) license plate that has been observed by a camera in the city centre. This license plate can be connected to its vehicle, which carries information on its fuel type and emission class.

Knowledge: Information that has been analysed and interpreted to uncover patterns, trends, and relationships. It provides an understanding of “how” and “why” certain phenomena occur.

example: With cameras surrounding the city centre, we now know how many vehicles of each emission class enter the city centre during various time periods of the day and week.

Wisdom: The ability to make sound judgments and decisions based on knowledge. It involves applying knowledge in practical, insightful ways to solve problems and make decisions. Wisdom answers the question “what should be done” and often incorporates ethical and moral considerations.

example: We can now create a low-emission zone in the city centre to reduce emissions, knowing the effect it has on other aspects, such as the impact on companies that frequently need to be in the city centre.

So, to go from data to knowledge (and eventually to wisdom), we obviously need a way to add extra information, or semantics, to the “data graph”. Not surprisingly, a quite intuitive way to provide this semantic information is through a graph. This semantic graph contains “special” nodes and “special” relationships that provide information about the nodes and relationships in the “data graph” they are associated with.

For the sake of simplicity, you could call that graph the “semantic graph” or “ontology graph”. Combined with the data graph, the two form a single graph that can be considered a Knowledge Graph. Obviously, we need a framework that allows us to specify both the data graph and the ontology graph. This brings us to RDF, the Resource Description Framework.

Adding semantics using ontologies

RDF is a framework for representing information about resources on the web. It uses a simple structure of triples (subject, predicate, object) to describe relationships between data. These triples are also referred to as statements: the subject is a resource, the predicate (or property) defines the relationship between the subject and the object, and the object can be either another resource or a literal value (Figure 1 shows a graph constructed from several statements that relate two resources and one statement that relates a resource to a literal). Where property graphs are built by first adding nodes (with their properties) and relating them afterwards, an RDF graph is constructed by providing statements about the objects and data.

Figure 1 illustrates which statements are required to build the graph depicted on the left. One difference between an RDF graph and a property graph is that properties of a resource (node), such as the name of Bob, are specified through statements as well and are not an intrinsic part of the resource or node. Another essential difference between RDF graphs and property graphs is that all resources (not literals) are identified by a Uniform Resource Identifier (URI), which makes them globally unique. It is especially this requirement that opens the way towards a formal ontology language.

 


Figure 1: How an RDF graph is constructed from statements. Statements are subject-predicate-object triples where the subject and predicate are always URIs. The object can be either a resource or a literal. Resources are shown as ovals, while literals are denoted by a rectangular shape.
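
To make this concrete, here is a minimal sketch of how such a graph could be built statement by statement, using Python with the rdflib library. The example namespace and the second person (“Alice”) are illustrative assumptions, not taken from the figure.

# A minimal sketch (using rdflib) of building the Figure 1 graph from statements.
# The example.org namespace and "Alice" are placeholders, not from the original figure.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/people/")

g = Graph()
g.bind("ex", EX)

# Each call to add() is one subject-predicate-object statement.
g.add((EX.Bob, EX.hasFriend, EX.Alice))    # resource related to another resource
g.add((EX.Bob, EX.name, Literal("Bob")))   # resource related to a literal value

# Serialising shows the same statements in Turtle notation.
print(g.serialize(format="turtle"))
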

Just as programming languages have reserved words like IF, THEN and ELSE, ontology languages have a set of reserved terms, or vocabulary. These reserved terms are predefined and have specific meanings within the language. For example, in OWL (the Web Ontology Language), terms like Class, Property, and Individual are part of this reserved vocabulary. URIs enable the definition of a unique set of terms. For example, the full name or URI of the OWL Class is http://www.w3.org/2002/07/owl#Class. Hence, the statement:

http://mydomain.xxx/ontology/Vehicle 
http://www.w3.org/1999/02/22-rdf-syntax-ns#type 
http://www.w3.org/2002/07/owl#Class

indicates that the Vehicle resource is of type (or is a) Class. Using prefixes (abbreviating the namespace part of each URI in the statement above), you can use a shorter notation:

vehont:Vehicle rdf:type owl:Class

Because rdf:type and owl:Class have been defined in W3C standards, the statement can now unambiguously be interpreted by humans and computers as

The resource vehont:Vehicle - is a (is of type) - Class

And the following statement

vehinst:MyDadsLada rdf:type vehont:Vehicle

as

The resource vehinst:MyDadsLada - is a (member of the class) - Vehicle.

Here, MyDadsLada can be considered data (from the data graph) that is enriched with the semantic information that this object is part of a collection of similar entities called Vehicle(s). By further specifying or constraining the Vehicle class, you essentially constrain the members of that class. This example demonstrates how resources (entities/assets) can be structured. In a similar way it is possible to describe relationships, thereby constraining their use. For the hasFriend relationship shown in Figure 1, you can define that it should hold between two instances of the class People, while the name relationship should have an instance of People at the source and a string at the destination side of the relationship. A sketch of how these statements could be written down is shown below.
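
The sketch below uses rdflib again; the vehont namespace follows the URI used earlier in the text, while the vehinst and people namespace URIs are illustrative assumptions.

# A sketch of the statements discussed above, expressed with rdflib.
# The vehinst and people namespace URIs are illustrative stand-ins.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS, XSD

VEHONT = Namespace("http://mydomain.xxx/ontology/")
VEHINST = Namespace("http://mydomain.xxx/instances/")
PEOPLE = Namespace("http://example.org/people-ontology/")

g = Graph()
for prefix, ns in (("vehont", VEHONT), ("vehinst", VEHINST), ("people", PEOPLE)):
    g.bind(prefix, ns)

# vehont:Vehicle rdf:type owl:Class
g.add((VEHONT.Vehicle, RDF.type, OWL.Class))

# vehinst:MyDadsLada rdf:type vehont:Vehicle
g.add((VEHINST.MyDadsLada, RDF.type, VEHONT.Vehicle))

# Constrain hasFriend to relate two instances of the class People ...
g.add((PEOPLE.hasFriend, RDF.type, OWL.ObjectProperty))
g.add((PEOPLE.hasFriend, RDFS.domain, PEOPLE.People))
g.add((PEOPLE.hasFriend, RDFS.range, PEOPLE.People))

# ... and name to relate an instance of People to a string literal.
g.add((PEOPLE.name, RDF.type, OWL.DatatypeProperty))
g.add((PEOPLE.name, RDFS.domain, PEOPLE.People))
g.add((PEOPLE.name, RDFS.range, XSD.string))

print(g.serialize(format="turtle"))
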

Extensions to RDF, such as OWL, have been created and standardised, allowing RDF graphs to be enriched with semantic information about the associated data. To illustrate the power of ontologies, here are a few things they can bring:

  1. Ontologies provide a formal way to define the structure of data, including the types of entities, their properties, and the relationships between them. This is like a database schema but with much richer semantics.
  2. Ontologies help integrate data from different sources by providing a common framework and vocabulary. This is particularly useful in scenarios where data needs to be combined from various domains or systems.
  3. Ontologies are used to represent complex knowledge in a structured and machine-readable format. This is essential for applications in artificial intelligence, natural language processing, and semantic web technologies.
  4. Ontologies facilitate interoperability between different systems and applications by providing a standardised way to represent and exchange data.
  5. Ontologies enable reasoning engines to infer new information from existing data by defining rules and relationships. This can enhance data analysis and decision-making processes.

Notice that although graphs (or statements) are used to encode the semantic information, interpretation of this information is still required. For example, inferring new relationships must be performed by an engine that understands the logic described by the ontology. With every change to the graph, all defined constraints and all inference rules need to be re-evaluated. This means that RDF databases (also called triple stores) should natively support OWL (and other RDF extensions) to truly support ontologies. This is why it is so difficult to provide full RDF/OWL-based ontology support in non-RDF (property) databases.
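
As an illustration of what such an engine does, the sketch below materialises inferred statements using the owlrl reasoner on top of rdflib; the ElectricVehicle subclass and the MyNeighboursTesla instance are hypothetical examples, not taken from the text.

# A sketch of inference, assuming rdflib together with the owlrl reasoner package.
# ElectricVehicle and MyNeighboursTesla are hypothetical examples.
import owlrl
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

VEHONT = Namespace("http://mydomain.xxx/ontology/")
VEHINST = Namespace("http://mydomain.xxx/instances/")

g = Graph()
g.add((VEHONT.Vehicle, RDF.type, OWL.Class))
g.add((VEHONT.ElectricVehicle, RDF.type, OWL.Class))
g.add((VEHONT.ElectricVehicle, RDFS.subClassOf, VEHONT.Vehicle))
g.add((VEHINST.MyNeighboursTesla, RDF.type, VEHONT.ElectricVehicle))

# Before reasoning, the graph only contains the explicit statements above.
print((VEHINST.MyNeighboursTesla, RDF.type, VEHONT.Vehicle) in g)  # False

# Materialise the OWL-RL closure: the engine evaluates the ontology's logic
# and adds the statements that follow from it.
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

print((VEHINST.MyNeighboursTesla, RDF.type, VEHONT.Vehicle) in g)  # True
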

In summary, while property graphs allow for flexible node and relationship creation, ontologies provide a formal, semantically rich framework that supports reasoning, inference, and interoperability. This makes ontologies more than just an extension of a graph: they are a powerful tool for representing and working with complex knowledge structures.

To read the 3rd blog in this series, From Complexity to Clarity, click here.

Do you want to know more or have a question? - Contact our experts!
