Data Models for Annotating Biomedical Scholarly Publications: the Case of CORD-19

Abstract

Semantic text annotations have been a key factor in supporting computer applications ranging from knowledge graph construction to biomedical question answering. In this systematic review, we analyze the data models that have been applied in semantic annotation projects for the scholarly publications in the CORD-19 dataset, an open database of the full texts of scholarly publications about COVID-19. Using Google Scholar and a screening of specific research venues, we retrieve seventeen publications on the topic, mostly from the United States of America. We then outline and explain the inline semantic annotation models currently applied to the full texts of biomedical scholarly publications. Finally, we discuss the data models currently used in semantic annotation projects on the CORD-19 dataset and identify directions for the development of semantic annotation models and projects.

Houcemeddine Turki
Medical student

My research interests include the development of a large-scale framework that uses open resources and semantic technologies to drive biomedical informatics and research evaluation at low cost.

Mohamed Ali Hadj Taieb
Assistant professor

My research interests include semantic similarity, semantic relatedness, knowledge representation, Big Data, social media, data management systems and graph embedding.

Mohamed Ben Aouicha
Associate professor

My research interests include information retrieval, semantic technologies, social media analytics, knowledge representation, Big Data and graph embedding.