exact answer
Harnessing Collective Intelligence of LLMs for Robust Biomedical QA: A Multi-Model Approach
Panou, Dimitra, Dimopoulos, Alexandros C., Koubarakis, Manolis, Reczko, Martin
Biomedical text mining and question-answering are essential yet highly demanding tasks, particularly in the face of the exponential growth of biomedical literature. In this work, we present our participation in the 13th edition of the BioASQ challenge, which involves biomedical semantic question-answering for Task 13b and biomedical question-answering for developing topics for the Synergy task. We deploy a selection of open-source large language models (LLMs) as retrieval-augmented generators to answer biomedical questions. Various models are used to process the questions. A majority voting system combines their outputs to determine the final answer for Yes/No questions, while for list and factoid type questions, the union of their answers is used. We evaluated 13 state-of-the-art open-source LLMs, exploring all possible model combinations to contribute to the final answer, resulting in tailored LLM pipelines for each question type. Our findings provide valuable insight into which combinations of LLMs consistently produce superior results for specific question types. In the four rounds of the 2025 BioASQ challenge, our system achieved notable results: in the Synergy task, we secured 1st place for ideal answers and 2nd place for exact answers in round 2, as well as two shared 1st places for exact answers in rounds 3 and 4.
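The aggregation described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the case-insensitive normalization, and the rule that ties resolve to "yes" are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def majority_vote(answers):
    """Combine per-model yes/no answers by majority.

    Tie handling is an assumption: ties resolve to "yes".
    """
    counts = Counter(a.strip().lower() for a in answers)
    return "yes" if counts["yes"] >= counts["no"] else "no"


def union_answers(answer_lists):
    """Order-preserving union of per-model answer lists.

    Duplicates are detected case-insensitively (an assumption);
    the first spelling seen is the one that is kept.
    """
    seen = set()
    merged = []
    for answers in answer_lists:
        for answer in answers:
            key = answer.strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(answer.strip())
    return merged


def model_subsets(models):
    """Yield every non-empty combination of the given models."""
    for size in range(1, len(models) + 1):
        yield from combinations(models, size)
```

With 13 models, `model_subsets` yields 2^13 - 1 = 8191 non-empty combinations, which is small enough to score exhaustively once each model's answers have been cached.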
NLP and Education: using semantic similarity to evaluate filled gaps in a large-scale Cloze test in the classroom
de Gois, Túlio Sousa, Freitas, Flávia Oliveira, Tejada, Julian, Freitag, Raquel Meister Ko.
Since the middle of the last century, the Cloze test has been used for educational purposes to assess proficiency in understanding texts in different languages Taylor [1953], Brown [1980, 2002]. The task consists of the systematic filling in of gaps in a text, specifically a prose selection Bickley et al. [1970], previously adapted to the participant's realities, and the scores of correct answers are associated with the degree of comprehension of the text by the participant. Different measures, such as exact answer, acceptable answer Brown [1980], multiple choice, and Clozentropy Darnell [1968], Lowry and Marr [1975], have been used to assess gap-filling since Taylor's initial proposal Taylor [1953]. These measures will be further examined in Section 2. The exact answer may seem easier to calculate, especially for a Cloze test applied to large and heterogeneous groups of students with insufficient time for teachers to analyze each answer individually. In Brazil, for instance, teachers usually have to manage numerous classes, and this correction method helps to provide rapid feedback on students' reading proficiency, allowing one to check the answers objectively Cunha and Santos [2010] without having to consider alternative options.
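As a rough illustration of the exact-answer measure, the sketch below counts a gap as correct only when the student's word matches the original word. The function name and the normalization (trimming whitespace and ignoring case) are assumptions, not part of the cited work.

```python
def exact_answer_score(responses, answer_key):
    """Fraction of gaps filled with exactly the original word.

    Normalization (strip + casefold) is an assumption for illustration.
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    if not answer_key:
        return 0.0
    hits = sum(
        given.strip().casefold() == expected.strip().casefold()
        for given, expected in zip(responses, answer_key)
    )
    return hits / len(answer_key)
```

Under this measure a synonym of the original word scores zero, which is what makes it fast to compute and also what the acceptable-answer and similarity-based measures try to relax.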
Using Pretrained Large Language Model with Prompt Engineering to Answer Biomedical Questions
Our team participated in the BioASQ 2024 Task 12b and Synergy tasks to build a system that can answer biomedical questions by retrieving relevant articles and snippets from the PubMed database and generating exact and ideal answers. We propose a two-level information retrieval and question-answering system based on pre-trained large language models (LLMs), focused on LLM prompt engineering and response post-processing. We construct prompts with in-context few-shot examples and utilize post-processing techniques like resampling and malformed response detection. We compare the performance of various pre-trained LLMs on this challenge, including Mixtral, OpenAI GPT and Llama2. Our best-performing system achieved 0.14 MAP score on document retrieval, 0.05 MAP score on snippet retrieval, 0.96 F1 score for yes/no questions, 0.38 MRR score for factoid questions and 0.50 F1 score for list questions in Task 12b.
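For readers unfamiliar with the MRR figure reported for factoid questions, here is a minimal sketch of mean reciprocal rank. It is not the official BioASQ evaluation code; the function names, the case-insensitive matching, and the default cutoff of five candidates are assumptions.

```python
def reciprocal_rank(ranked, gold, cutoff=5):
    """Return 1/rank of the first correct candidate, or 0.0 if none is found.

    `cutoff` limits how many candidates are inspected (assumed value).
    """
    gold_set = {g.strip().lower() for g in gold}
    for rank, candidate in enumerate(ranked[:cutoff], start=1):
        if candidate.strip().lower() in gold_set:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(predictions, golds, cutoff=5):
    """Average the reciprocal rank over all questions."""
    if not predictions:
        return 0.0
    total = sum(
        reciprocal_rank(ranked, gold, cutoff)
        for ranked, gold in zip(predictions, golds)
    )
    return total / len(predictions)
```

A system that always ranks the correct answer first scores 1.0, and one that always ranks it second scores 0.5, so a score of 0.38 indicates that the correct answer was often missing or ranked below the top position.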
Are there more wheels or doors in the world? ChatGPT wades into viral debate that's been dividing the internet... its answer may surprise you
Viral phenomena have been around for almost as long as the internet has. You might remember the dress that took Tumblr by storm back in 2015 – was it blue and black or white and gold? But using ChatGPT, MailOnline tries to settle the debate, which has seen Twitter users go to great lengths to prove whether there are more doors or wheels in the world. MailOnline spoke to ChatGPT – but the answer may surprise you. The bot produced an autogenerated response, admitting defeat in its first sentence: 'It's difficult to provide an exact answer to this question, as it depends on a variety of factors and can change over time'. Even OpenAI's proudest invention couldn't directly solve the query that has taken the internet by storm – and puzzled Twitter since last year.
New Algorithms for Efficient High Dimensional Non-parametric Classification
This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties of that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN.
Overview of BioASQ 2022: The tenth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering
Nentidis, Anastasios, Katsimpras, Georgios, Vandorou, Eirini, Krithara, Anastasia, Miranda-Escalada, Antonio, Gasco, Luis, Krallinger, Martin, Paliouras, Georgios
This paper presents an overview of the tenth edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2022. BioASQ is an ongoing series of challenges that promotes advances in the domain of large-scale biomedical semantic indexing and question answering. In this edition, the challenge was composed of the three established tasks a, b, and Synergy, and a new task named DisTEMIST for automatic semantic annotation and grounding of diseases from clinical content in Spanish, a key concept for semantic indexing and search engines of literature and clinical records. This year, BioASQ received more than 170 distinct systems from 38 teams in total for the four different tasks of the challenge. As in previous years, the majority of the competing systems outperformed the strong baselines, indicating the continuous advancement of the state-of-the-art in this domain.