Integrating SPARQL and LLMs for Question Answering over Scholarly Data Sources

Fondi, Fomubad Borista; Fidel, Azanzi Jiomekong; Camara, Gaoussou

arXiv.org Artificial Intelligence 

The Scholarly Hybrid Question Answering over Linked Data (QALD) Challenge at the International Semantic Web Conference (ISWC) 2024 focuses on question answering (QA) over diverse scholarly sources: DBLP, SemOpenAlex, and Wikipedia-based texts. This paper describes a methodology that combines SPARQL queries, a divide-and-conquer strategy, and a pre-trained extractive question answering model. The pipeline first issues SPARQL queries to gather data, then applies divide and conquer to handle the different question types and sources separately, and finally uses the extractive model to answer personal questions about authors. Evaluated with the Exact Match and F-score metrics, the approach shows promise for improving QA accuracy and efficiency in scholarly contexts.

Keywords: Scholarly Question Answering, Large Language Models, Divide and Conquer.
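
To make the two evaluation metrics named in the abstract concrete, here is a minimal sketch of how Exact Match and a token-level F-score are commonly computed for extractive QA. The abstract does not specify the exact definitions used in the challenge, so the normalization steps (lowercasing, stripping punctuation and English articles) and the function names `normalize`, `exact_match`, and `f1_score` are assumptions for illustration, modeled on the usual SQuAD-style evaluation rather than on the authors' code.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Assumption: SQuAD-style normalization; the challenge may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> int:
    """Return 1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a prediction that misses one of four reference tokens still earns partial credit from the F-score while scoring 0 on Exact Match, which is why the two metrics are usually reported together.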