AITopics | Stearns County

The Order Effect: Investigating Prompt Sensitivity in Closed-Source LLMs

Guan, Bryan, Roosta, Tanya, Passban, Peyman, Rezagholizadeh, Mehdi

arXiv.org Artificial Intelligence, Feb-6-2025

As large language models (LLMs) become integral to diverse applications, ensuring their reliability under varying input conditions is crucial. One key issue affecting this reliability is order sensitivity, wherein slight variations in input arrangement can lead to inconsistent or biased outputs. Although recent advances have reduced this sensitivity, the problem remains unresolved. This paper investigates the extent of order sensitivity in closed-source LLMs by conducting experiments across multiple tasks, including paraphrasing, relevance judgment, and multiple-choice questions. Our results show that input order significantly affects performance across tasks, with shuffled inputs leading to measurable declines in output accuracy. Few-shot prompting shows mixed effectiveness: it offers partial mitigation but fails to fully resolve the problem. These findings highlight persistent risks, particularly in high-stakes applications, and point to the need for more robust LLMs or improved input-handling techniques in future development.

In recent years, large language models (LLMs) have become essential across various applications, helping users complete tasks in diverse domains, thanks to their remarkable abilities in understanding, analyzing, and generating text (Shen et al., 2023a; Yu et al., 2023). However, LLMs are not without their problems and risks. Many of these issues, such as bias (Talat et al., 2022; Motoki et al., 2023), hallucination (Chen et al., 2023; Sadat et al., 2023), consistency (Tam et al., 2023; Ye et al., 2023), and reliability (Shen et al., 2023b), have been extensively discussed in the literature. However, a more fundamental challenge to the long-term success of LLMs is their ability to reason: the distinguishing factor between probabilistic pattern matching and logical understanding. This distinction has significant implications for the future of LLMs and how we employ these models in decision-making. One necessary requirement for reasoning is order independence.
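
To make the notion of order sensitivity concrete, the following is a minimal sketch of how one might score it for a multiple-choice question. It is not the paper's evaluation protocol. The function `order_consistency` and the two toy "models" are hypothetical names introduced here for illustration; a real experiment would replace them with calls to an actual LLM.

```python
import itertools
from typing import Callable, List, Sequence

# Hypothetical interface: a model takes a question and an ordered list of
# options and returns the text of the option it selects.
AnswerFn = Callable[[str, List[str]], str]


def order_consistency(answer_fn: AnswerFn, question: str,
                      options: Sequence[str]) -> float:
    """Return the fraction of option orderings for which the model selects
    the same option text as it does under the original ordering."""
    baseline = answer_fn(question, list(options))
    orderings = list(itertools.permutations(options))
    agree = sum(answer_fn(question, list(p)) == baseline for p in orderings)
    return agree / len(orderings)


def first_option_model(question: str, options: List[str]) -> str:
    """Toy stand-in for a position-biased model: always picks option 1."""
    return options[0]


def longest_option_model(question: str, options: List[str]) -> str:
    """Toy stand-in for a content-based model: picks the longest option."""
    return max(options, key=len)


if __name__ == "__main__":
    opts = ["Paris", "Rome", "Berlin"]
    print(order_consistency(first_option_model, "Example question?", opts))
    print(order_consistency(longest_option_model, "Example question?", opts))
```

An order-independent model scores 1.0 under this measure, while a purely position-driven one agrees with its baseline only when the same option happens to land in the favored slot. Enumerating all permutations is only practical for a handful of options; sampling random shuffles would be the natural substitute for longer inputs.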

large language model, machine learning, natural language, (19 more...)

arXiv: 2502.04134

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Minnesota > Stearns County (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Learning Answer Generation using Supervision from Automatic Question Answering Evaluators

Gabburo, Matteo, Garg, Siddhant, Koncel-Kedziorski, Rik, Moschitti, Alessandro

arXiv.org Artificial Intelligence, May-24-2023

Recent studies show that sentence-level extractive QA, i.e., QA based on Answer Sentence Selection (AS2), is outperformed by Generation-based QA (GenQA) models, which generate answers from the top-k answer sentences ranked by AS2 models, in the style of retrieval-augmented generation. In this paper, we propose a novel training paradigm for GenQA using supervision from automatic QA evaluation models (GAVA). Specifically, we propose three strategies to transfer knowledge from these QA evaluation models to a GenQA model: (i) augmenting the training data with answers generated by the GenQA model and labelled by GAVA statically, before training; (ii) performing the same augmentation dynamically, at every training epoch; and (iii) using the GAVA score to weight the generator loss while training the GenQA model. We evaluate our proposed methods on two academic datasets and one industrial dataset, obtaining a significant improvement in answering accuracy over the previous state of the art.
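
Strategy (iii) can be illustrated with a small sketch. The abstract does not give the exact loss formula, so the form below is an assumption: each generated answer's length-normalized negative log-likelihood is multiplied by an evaluator score in [0, 1], and the results are averaged over the batch. The function name `weighted_generator_loss` and its inputs are hypothetical.

```python
import math
from typing import List, Sequence


def weighted_generator_loss(token_log_probs: Sequence[Sequence[float]],
                            evaluator_scores: Sequence[float]) -> float:
    """Average per-answer negative log-likelihood, weighted by an evaluator
    score (assumed form; the abstract does not specify the formula).

    token_log_probs[i] holds the generator's log-probabilities for the tokens
    of answer i; evaluator_scores[i] is the evaluator's score for answer i.
    """
    if len(token_log_probs) != len(evaluator_scores):
        raise ValueError("one evaluator score is required per answer")
    if not token_log_probs:
        raise ValueError("at least one answer is required")

    total = 0.0
    for log_probs, score in zip(token_log_probs, evaluator_scores):
        # Length-normalized negative log-likelihood of this answer.
        nll = -sum(log_probs) / len(log_probs)
        # Answers the evaluator rates highly contribute more to the loss.
        total += score * nll
    return total / len(token_log_probs)


if __name__ == "__main__":
    batch: List[List[float]] = [
        [math.log(0.5), math.log(0.5)],  # answer rated correct
        [math.log(0.25)],                # answer rated incorrect
    ]
    print(weighted_generator_loss(batch, [1.0, 0.0]))
```

Under this assumed form, an answer scored 0 by the evaluator contributes nothing to the gradient, so the generator is pushed mainly toward answers the evaluator considers correct.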

genqa model, machine learning, question answering, (19 more...)

arXiv: 2305.15344

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
(11 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
