Graphs with complex structures, featuring many entity types and relationships, can hold a wealth of information: some of it intentionally modelled, some hidden and waiting to be discovered. To illustrate, consider a city logistics scenario in which ANPR cameras are strategically positioned to monitor vehicle movements in and out of specific zones. These cameras capture key data, such as a timestamp and the license plate that uniquely identifies each vehicle. Additional details, like fuel type and emission class, can be linked to the registered vehicles, while company-owned vehicles add a further layer of insight through information about their parent companies. This enriched dataset enables analysis of traffic patterns by vehicle type, emission level, or fuel usage, and even reveals which industries frequent the area most, and when. With such granular insights, informed decisions can be made to design low-emission zones or even transition to entirely car-free areas.
So, how does that translate to a graph-based digital twin representation?
Figure 1 illustrates a snippet of the graph designed for the City Logistics use case. It includes sensor nodes that serve as digital twins of the physical ANPR cameras monitoring vehicle movements. Each sensor node has a madeObservation relationship connecting it to observation nodes, which capture the details of specific observations, such as the observation time. These observation nodes are in turn linked via observedVehicle relationships to the observed vehicle's digital twin. Each vehicle node is enriched with attributes like fuel type and emission class. The real graph also features company nodes (not depicted in Figure 1), representing vehicle-owning companies, each characterised by details such as the sectors they operate in and their principal place of business. The hasCompanyOwner relationship links vehicle nodes to their respective company nodes.
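To make this concrete, the sketch below shows how these nodes and relationships could be traversed with SPARQL. Only the relationship names come from the description above; the namespace prefix and class names are hypothetical stand-ins for the real ontology.

```sparql
PREFIX ex: <http://example.org/citylogistics#>  # hypothetical namespace

# Traverse from an ANPR sensor, via its observations, to the observed
# vehicles and, where present, the companies that own them.
SELECT ?sensor ?observationTime ?vehicle ?company
WHERE {
  ?sensor      a ex:Sensor ;
               ex:madeObservation ?observation .
  ?observation ex:observationTime ?observationTime ;
               ex:observedVehicle ?vehicle .
  # Ownership is optional: not every vehicle is company-owned.
  OPTIONAL { ?vehicle ex:hasCompanyOwner ?company . }
}
LIMIT 10
```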
Exploring such a complex graph can be daunting: you need to master the query language to interact with the database and to understand the graph's logic. For most people, it would feel far more intuitive if the graph could be queried in their native language. This would make the graph accessible to a much broader audience, including those without technical expertise.
This is where Large Language Models (LLMs), or AI Agents, come into play, making it possible to query the graph in natural language.
In the previous section, we explored how AI Agents can access external sources of information. One such source could be a graph database containing information that was not used for training. This could be for several reasons:
As previously explained, the AI agent, aware of the various tools at its disposal, informs the LLM (the knowledgeable assistant) about these tools, their tasks, and their interfaces. For a graph database, query access is the most flexible option, offering full access to the graph's content through a query language such as SPARQL. SPARQL, a standard for over 16 years, is well documented and widely discussed, making it familiar to the LLM.
The LLM can construct correct SPARQL queries based on a request. However, it needs to know what content is stored in the graph and how it is structured. This is where ontologies come in. Ontologies provide semantic information about the content and its structure. Therefore, along with a description of the database, the ontology is provided as input to the LLM, enabling it to understand the information stored in the graph and its organisation.
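As a rough illustration, an agent (or a curious human) can also ask the graph about its own vocabulary. The query below, a minimal sketch assuming the ontology is stored alongside the data, lists the declared object properties with their labels, domains, and ranges using standard RDFS and OWL terms, which is exactly the kind of structural information the LLM needs.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

# Discover how the graph is structured: which relationships exist,
# what they are called, and which node types they connect.
SELECT ?property ?label ?domain ?range
WHERE {
  ?property a owl:ObjectProperty .
  OPTIONAL { ?property rdfs:label  ?label . }
  OPTIONAL { ?property rdfs:domain ?domain . }
  OPTIONAL { ?property rdfs:range  ?range . }
}
```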
Let's consider the following request based on the previously sketched City Logistics example:
Provide the number of times vehicles of each fuel type are observed per day of the week. Display the results in a line chart. Put the days of the week along the x-axis and the number of observations along the y-axis. Use the English labels where you can.
While I don't claim to fully understand how an LLM arrives at its answers (my own reasoning works rather differently from an LLM's), let's try to roughly explain what happens within the AI Agent and its LLM companion:
This is an oversimplified, and likely not entirely accurate, picture of the real process. In practice, the process is also iterative: subsequent queries (hopefully) converge to a satisfactory answer. Proper error handling is crucial as well, but not discussed here.
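For illustration only, a query of the kind the LLM might generate for this request could look like the sketch below. All IRIs are hypothetical, and extracting the weekday is not part of core SPARQL 1.1, so it appears here as a hypothetical store-specific extension function; alternatively, the agent can post-process the raw timestamps itself.

```sparql
PREFIX ex: <http://example.org/citylogistics#>  # hypothetical namespace

# Count observations per fuel type and day of the week.
SELECT ?fuelType ?dayOfWeek (COUNT(?observation) AS ?observations)
WHERE {
  ?observation ex:observedVehicle ?vehicle ;
               ex:observationTime ?time .
  ?vehicle     ex:fuelType ?fuelType .
  # Core SPARQL 1.1 has no weekday function; many triple stores offer
  # an extension. ex:dayOfWeek is a hypothetical stand-in for one.
  BIND(ex:dayOfWeek(?time) AS ?dayOfWeek)
}
GROUP BY ?fuelType ?dayOfWeek
ORDER BY ?dayOfWeek ?fuelType
```

The agent would then hand the tabular result to a charting step to produce the requested line chart, with the days of the week along the x-axis and the observation counts along the y-axis.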
To conclude, and to demonstrate that an AI Agent can indeed handle such requests, below is the result generated by our Royal HaskoningDHV Automate AI Agent. Through the use of LLMs, we can understand our data more clearly and create more valuable digital twins. Get in touch to discover more about the work we are doing and how it can support your digital twin aspirations!
Contact our experts!