Dagstuhl seminar on Knowledge Graphs

Returning home from a very interesting Dagstuhl seminar on Knowledge Graphs, it is time to collect some thoughts. In the seminar we developed a shared understanding of the current state of the art in Knowledge Graphs and, more importantly, mapped out the road ahead. The format of the seminar consisted of 5-minute pitches on relevant topics, followed by group discussions, which will be summarised and consolidated in an upcoming report. In the spirit of true societal (and research) progress, a large part of the seminar was devoted to discussing grand challenges in our society, focusing on those where we believe Knowledge Graphs can play a crucial role. The upcoming report will discuss these in depth, but examples include the interaction between humans and machines, the kind of explainable and human-centred AI that is required in various societal domains such as medicine, keeping up with knowledge evolution and rapidly changing information in our society, and addressing information interoperability at scale.

What I in particular take with me from this seminar is the feeling that we have a unique opportunity to really facilitate interaction and integration of major results from different areas, and that Knowledge Graphs may be the key that finally makes this possible at scale.

However, taking a step back, one may first ask: What is a Knowledge Graph? And how does it relate to previous objects of study, such as Linked Data or Ontologies? Although this was discussed at length in the seminar, my personal viewpoint is that we do not really need a strict scientific definition. A descriptive one could potentially be useful, but even just exemplifying what we mean when talking about Knowledge Graphs should be enough. To me a Knowledge Graph is about two things: knowledge that is represented in some graph-like, preferably machine-readable, format, and that is (or can be) used as the source of knowledge/information/data in some application. This subsumes ontologies, Linked Data, and all the various Knowledge Graphs proposed by large companies so far. Although Google popularised the term a few years ago, it has been around before that, and can even be traced back to ancient times (as some people pointed out in the seminar). However, that does not reduce the importance of the Google Knowledge Graph, both as a positive example and inspiration for others (i.e., proof that Knowledge Graphs of “everything” can really work at scale) and as a popular explanation of the term; it could maybe even be seen as a revitalisation of the whole knowledge representation field.
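To make the second point a bit more concrete, here is a minimal, purely illustrative sketch in Python (all facts and names are made up for the example) of what "knowledge in a graph-like, machine-readable format, used as a source in an application" can boil down to: a set of subject-predicate-object edges and a simple lookup over them.

```python
# A purely illustrative sketch: knowledge kept in a graph-like,
# machine-readable form and queried by an application.
# All facts and identifiers below are made up for the example.

triples = {
    ("LiU", "locatedIn", "Linköping"),
    ("Linköping", "locatedIn", "Sweden"),
    ("LiU", "type", "University"),
}

def objects(subject, predicate):
    """Return all objects connected to `subject` via `predicate`."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]

# An application can now use the graph as its source of knowledge:
print(objects("LiU", "locatedIn"))  # ['Linköping']
```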

So, how does it relate to existing fields then? Here we come back to my key take-away from the seminar – integration of research fields. I do not see Knowledge Graphs as a new field, nor as a renaming of some existing area, such as the Semantic Web or ontologies, but rather as what emerges when you marry ontologies and Linked Data with property graphs, graph databases, and the web. Or machine learning models with graph formats and methods for symbolic knowledge representation, e.g., to create explainable AI. Of course, that means that everything we have learned so far in these individual fields is very valuable, e.g., ontology engineering, representation formats and standards, etc., but it is when you marry that with results from other fields that 1+1 becomes 3, or even 10. So if you ask for the relation to ontologies, for instance, I would say that Knowledge Graphs are a generalisation, where any Semantic Web ontology can probably be considered a Knowledge Graph, but not every Knowledge Graph (probably just a few) will be an ontology.

Related to our own research in the Linköping University Semantic Web group, we do have some very valuable pieces of this puzzle to offer. In the knowledge representation area we have worked a lot on ontology engineering and ontology design patterns, and this is valuable input also for the creation of Knowledge Graphs. In particular, I believe the notion of design patterns is very valuable also when creating generic Knowledge Graphs (see the sketch below), especially since patterns are not only intended as a technical development tool, but can also support understandability, interoperability, and reuse, and act as a least common denominator when matching and integrating data and knowledge. Our recent work on ontology matching will also be directly applicable to Knowledge Graph matching and integration, as will the work on ontology evolution, stream reasoning, and complex event processing, for managing highly dynamic data and knowledge. All of this is highly relevant when generalised from ontologies to Knowledge Graphs in general, maybe even more relevant than for the specific case of ontologies.
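As a rough illustration of what such a reusable building block looks like, here is a minimal sketch, assuming the rdflib Python library, of the core of a small, hypothetical participation pattern; the namespace and all names are made up for the example, not taken from any published pattern.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

# Hypothetical namespace for a small "participation" content pattern.
ODP = Namespace("http://example.org/odp/participation#")

g = Graph()
g.bind("odp", ODP)

# Two classes and one relation form the reusable core of the pattern;
# a concrete Knowledge Graph would specialise or instantiate these.
g.add((ODP.Event, RDF.type, OWL.Class))
g.add((ODP.Agent, RDF.type, OWL.Class))
g.add((ODP.hasParticipant, RDF.type, OWL.ObjectProperty))
g.add((ODP.hasParticipant, RDFS.domain, ODP.Event))
g.add((ODP.hasParticipant, RDFS.range, ODP.Agent))

print(g.serialize(format="turtle"))
```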

Then of course a Knowledge Graph needs to be represented in some way, preferably in a machine-readable format and in a language with some formal semantics. RDF is an obvious candidate for representing Knowledge Graphs on the web. However, so far the RDF community has been quite separated from the community around property graphs (and graph databases), in my opinion mainly due to the difficulties of directly representing property graphs in RDF. Here too, the LiU group has something to offer, in the form of Olaf Hartig's proposed RDF and SPARQL extensions to bridge this gap (called RDF* and SPARQL*), illustrated below, as well as our research on graph data and graph data models in general.
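To illustrate the gap, here is a minimal sketch, assuming the rdflib Python library and an example namespace of my own, of a property-graph-style edge carrying an attribute. In plain RDF such an edge attribute needs a workaround, for example standard reification as below; the comment at the end shows, roughly, how RDF* lets the statement itself be annotated directly instead.

```python
from rdflib import BNode, Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")  # example namespace, not from the post

g = Graph()
g.bind("ex", EX)

# A property-graph edge "alice --knows--> bob" with an attribute (since=2015).
g.add((EX.alice, EX.knows, EX.bob))

# In plain RDF the edge attribute requires a detour, e.g. standard reification:
stmt = BNode()
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX.alice))
g.add((stmt, RDF.predicate, EX.knows))
g.add((stmt, RDF.object, EX.bob))
g.add((stmt, EX.since, Literal(2015)))

print(g.serialize(format="turtle"))

# RDF* instead allows the triple itself to be annotated, roughly:
#   << ex:alice ex:knows ex:bob >> ex:since 2015 .
# and SPARQL* allows such nested triples to be matched in queries.
```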

I hope this seminar will really become the starting point of something new: new research directions, and a more inclusive community (than maybe the Semantic Web community has been, in retrospect) around Knowledge Graphs that embraces the need for integrating approaches from various other fields, embraces variety and complexity, and embraces dynamics.

LiU Semantic Web group at ESWC2017

This week a couple of us have been at ESWC2017 in Portoroz, Slovenia. Eva Blomqvist was the general chair of the conference this year; hence, this was the culmination of a whole year of hard work for her. Olaf Hartig was the proceedings chair (proceedings part 1 and 2). He could not attend the conference this year, but has done a great job with the Springer proceedings and the upcoming post-proceedings volume containing, among other things, the poster and demo papers. In addition, Karl Hammar was one of the organisers of the Modular Ontology Modeling with Ontology Design Patterns tutorial, together with Pascal Hitzler, Adila A. Krisnadhi, Agnieszka Lawrynowicz and Monika Solanki. In particular, Karl ran the hands-on session with his tool for ODP-based modelling in WebProtégé (called XDP). Finally, Henrik Eriksson presented our EU-funded project VALCRI in the project networking session, as well as in the poster session.

The overall conference was interesting as always, and included a lot of networking opportunities, as well as interesting work to take a closer look at. A quick summary of some of the major events:

Crosbie

Kevin Crosbie from Ravenpack was the first keynote speaker, talking about how to model events in order to use them for predicting financial markets. It was a very interesting talk, describing how Ravenpack works with its data products and applies technologies very similar to Semantic Web technologies, although without technically using the W3C standards, such as RDF.

Panel

At the end of the first day, Aldo Gangemi chaired a panel about the future of academic publishing, discussing the challenges and opportunities that lie ahead. It is clear that something needs to be done about both the reviewing situation in our field and the open access issue, and that we want more focus on “eating our own dog food”. The discussions were also related to the paper that later won the best student paper award, on Linked Data Notifications.

Sheridan

The second keynote speaker, John Sheridan from the National Archives in the UK, described how the National Archives heavily rely on Semantic Web technologies and standards to solve their archiving tasks. However, there are of course also challenges, which can hopefully be solved by working together: academia and society at large. It was particularly interesting for us at LiU to hear that the National Archives are in great need of a better solution for modelling trust and uncertainty in their data, which could be a potential use case for the recent research results on RDF* and SPARQL* by Olaf Hartig.

Dinner

A nice conference dinner at the beach, and a chance for the general chair to thank all the people on the organising committee.

Poster

Poster session with lots of interesting interactions and discussion, here with Diego Reforgirato, who later won both the best poster and best demo awards.

Unfortunately, we did not take any picture of the last keynote speaker, Lora Aroyo, who gave a very interesting talk on the last day. She started with an overview of the evolution of the field, pointing out that studying and using people to acquire knowledge has always been a central part of our research. However, by oversimplifying and trying to fit every answer into yes/no categories, we risk drawing wrong conclusions. Her point was that we need to be aware of ambiguity and diversity in opinions, that there is usually not one true answer, and that we should instead turn this to our advantage. Lora showed a vector-based model to represent diversity in opinions.

Finally, Aldo Gangemi will be the general chair of ESWC in 2018, and he made a series of interesting promises for next year, among others: a double-open review process, improvements in the online pre-prints of the proceedings and the dataset, a resources track and an industry session à la ISWC, and better music at the social events. We all wish him the best of luck with the next conference, and we are excited to see all the innovations next year!