MeSH2Matrix: Machine learning-driven biomedical relation classification based on the MeSH keywords of PubMed scholarly publications

Abstract

Biomedical relation classification has improved significantly through the application of advanced machine learning techniques to the raw text of scholarly publications. Despite this improvement, reliance on large chunks of raw text limits the generalization, precision and reliability of these algorithms. The distinctive characteristics of bibliographic metadata, by contrast, can prove effective in achieving better performance on this challenging task. In this paper, we introduce an approach for biomedical relation classification using the qualifiers of co-occurring Medical Subject Headings (MeSH). We first introduce MeSH2Matrix, our dataset of 46,469 biomedical relations curated from PubMed publications using our approach. Using MeSH2Matrix, we then build and train three machine learning models (SVM, D-Model and C-Net) to evaluate the effectiveness of our approach for biomedical relation classification. Our best model achieves an accuracy of 70.78% for 195 classes and 83.09% for five superclasses. We hope these results will inform the development of better algorithms for biomedical ontology construction based on the MeSH keywords of PubMed publications. For reproducibility, MeSH2Matrix and all our source code are publicly accessible at https://github.com/SisonkeBiotik-Africa/MeSH2Matrix.
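The abstract does not spell out how the qualifiers of co-occurring MeSH headings become model input, so the following is only a minimal sketch of one plausible encoding, not the authors' implementation. It assumes each relation between two headings can be represented as a matrix counting how often a qualifier of the first heading appears alongside a qualifier of the second in the same publication. The function name `qualifier_matrix`, the shortened `QUALIFIERS` list and the toy publication records are all hypothetical and chosen for illustration.

```python
from itertools import product

# Hypothetical, shortened qualifier vocabulary (illustration only;
# the real MeSH qualifier list is larger).
QUALIFIERS = [
    "drug therapy",
    "therapeutic use",
    "adverse effects",
    "chemically induced",
]
INDEX = {name: position for position, name in enumerate(QUALIFIERS)}


def qualifier_matrix(publications, subject, obj):
    """Count qualifier pairs over publications where both headings co-occur.

    Each publication is assumed to be a dict mapping a MeSH heading to the
    list of qualifiers attached to it in that publication.
    Rows index the subject's qualifiers, columns the object's qualifiers.
    """
    size = len(QUALIFIERS)
    matrix = [[0] * size for _ in range(size)]
    for publication in publications:
        # Skip publications that do not mention both headings.
        if subject not in publication or obj not in publication:
            continue
        for q_subject, q_object in product(publication[subject], publication[obj]):
            if q_subject in INDEX and q_object in INDEX:
                matrix[INDEX[q_subject]][INDEX[q_object]] += 1
    return matrix


# Toy records (hypothetical, not taken from PubMed).
publications = [
    {"Aspirin": ["therapeutic use"], "Headache": ["drug therapy"]},
    {"Aspirin": ["therapeutic use", "adverse effects"], "Headache": ["drug therapy"]},
    {"Aspirin": ["adverse effects"], "Gastritis": ["chemically induced"]},
]

matrix = qualifier_matrix(publications, "Aspirin", "Headache")
for row_name, row in zip(QUALIFIERS, matrix):
    print(f"{row_name:20s} {row}")
```

Under these assumptions, a matrix of this kind could be flattened into a feature vector for a classifier such as an SVM, or kept two-dimensional for a convolutional model; which form each of the three models consumes is not stated in the abstract.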

Houcemeddine Turki
Medical student

My research interests include the development of a large-scale framework that uses open resources and semantic technologies to drive biomedical informatics and research evaluation at low cost.

Mohamed Ali Hadj Taieb
Assistant professor

My research interests include semantic similarity, semantic relatedness, knowledge representation, Big Data, social media, data management systems and graph embedding.

Mohamed Ben Aouicha
Associate professor

My research interests concern information retrieval, semantic technologies, social media analytics, knowledge representation, Big Data and graph embedding.