Using logical constraints to validate statistical information about disease outbreaks in collaborative knowledge graphs: the case of COVID-19 epidemiology in Wikidata

Abstract

Urgent global research demands real-time dissemination of precise data. Wikidata, a collaborative and openly licensed knowledge graph available in RDF format, provides an ideal forum for exchanging structured data that can be verified and consolidated using validation schemas and bot edits. In this research article, we catalog a set of automatable tasks needed to assess and validate the portion of Wikidata relating to COVID-19 epidemiology. These tasks assess statistical data and are implemented in SPARQL, a query language for semantic databases. We demonstrate the efficiency of our methods for evaluating structured non-relational information on COVID-19 in Wikidata, and their applicability to collaborative ontologies and knowledge graphs more broadly. We show the advantages and limitations of our proposed approach by comparing it with the features of other methods for validating linked web data, as reported in previous research.

Houcemeddine Turki
Medical student

My research interests include the development of a large-scale framework for using open resources and semantic technologies for driving biomedical informatics and research evaluation at a low cost.

Mohamed Ali Hadj Taieb
Assistant professor

My research interests include semantic similarity, semantic relatedness, knowledge representation, Big Data, social media, data management systems and graph embedding.

Mohamed Ben Aouicha
Associate professor

My research interests concern information retrieval, semantic technologies, social media analytics, knowledge representation, Big Data and graph embedding.