newsqa
Small Encoders Can Rival Large Decoders in Detecting Groundedness
Abbes, Istabrak, Prato, Gabriele, Fournier, Quentin, Rodriguez, Fernando, Boukhary, Alaa, Elwood, Adam, Chandar, Sarath
Augmenting large language models (LLMs) with external context significantly improves their performance in natural language processing (NLP) tasks. However, LLMs struggle to answer queries reliably when the provided context lacks information, often resorting to ungrounded speculation or internal knowledge. Groundedness - generating responses strictly supported by the context - is essential for ensuring factual consistency and trustworthiness. This study focuses on detecting whether a given query is grounded in a document provided in context before the costly answer generation by LLMs. Such a detection mechanism can significantly reduce both inference time and resource consumption. We show that lightweight, task specific encoder models such as RoBERTa and NomicBERT, fine-tuned on curated datasets, can achieve accuracy comparable to state-of-the-art LLMs, such as Llama3 8B and GPT4o, in groundedness detection while reducing inference latency by orders of magnitude. The code is available at : https://github.com/chandarlab/Hallucinate-less
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
Efficient In-Domain Question Answering for Resource-Constrained Environments
Chung, Isaac, Vo, Phat, Kizilkale, Arman C., Reite, Aaron
Retrieval Augmented Generation (RAG) is a common method for integrating external knowledge into pretrained Large Language Models (LLMs) to enhance accuracy and relevancy in question answering (QA) tasks. However, prompt engineering and resource efficiency remain significant bottlenecks in developing optimal and robust RAG solutions for real-world QA applications. Recent studies have shown success in using fine tuning to address these problems; in particular, Retrieval Augmented Fine Tuning (RAFT) applied to smaller 7B models has demonstrated superior performance compared to RAG setups with much larger models such as GPT-3.5. The combination of RAFT with parameter-efficient fine tuning (PEFT) techniques, such as Low-Rank Adaptation (LoRA), promises an even more efficient solution, yet remains an unexplored area. In this work, we combine RAFT with LoRA to reduce fine tuning and storage requirements and gain faster inference times while maintaining comparable RAG performance. This results in a more compute-efficient RAFT, or CRAFT, which is particularly useful for knowledge-intensive QA tasks in resource-constrained environments where internet access may be restricted and hardware resources limited.
The Turking Test: Can Language Models Understand Instructions?
Supervised machine learning provides the learner with a set of input-output examples of the target task. Humans, however, can also learn to perform new tasks from instructions in natural language. Can machines learn to understand instructions as well? We present the Turking Test, which examines a model's ability to follow natural language instructions of varying complexity. These range from simple tasks, like retrieving the nth word of a sentence, to ones that require creativity, such as generating examples for SNLI and SQuAD in place of human intelligence workers ("turkers"). Despite our lenient evaluation methodology, we observe that a large pretrained language model performs poorly across all tasks. Analyzing the model's error patterns reveals that the model tends to ignore explicit instructions and often generates outputs that cannot be construed as an attempt to solve the task. While it is not yet clear whether instruction understanding can be captured by traditional language models, the sheer expressivity of instruction understanding makes it an appealing alternative to the rising few-shot inference paradigm.
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- North America > United States > Alaska (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension
Maharana, Adyasha, Bansal, Mohit
Reading comprehension models often overfit to nuances of training datasets and fail at adversarial evaluation. Training with adversarially augmented dataset improves robustness against those adversarial attacks but hurts generalization of the models. In this work, we present several effective adversaries and automated data augmentation policy search methods with the goal of making reading comprehension models more robust to adversarial evaluation, but also improving generalization to the source domain as well as new domains and languages. We first propose three new methods for generating QA adversaries, that introduce multiple points of confusion within the context, show dependence on insertion location of the distractor, and reveal the compounding effect of mixing adversarial strategies with syntactic and semantic paraphrasing methods. Next, we find that augmenting the training datasets with uniformly sampled adversaries improves robustness to the adversarial attacks but leads to decline in performance on the original unaugmented dataset. We address this issue via RL and more efficient Bayesian policy search methods for automatically learning the best augmentation policy combinations of the transformation probability for each adversary in a large search space. Using these learned policies, we show that adversarial training can lead to significant improvements in in-domain, out-of-domain, and cross-lingual (German, Russian, Turkish) generalization without any use of training data from the target domain or language.
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- (14 more...)
- Education > Assessment & Standards > Student Performance (0.82)
- Information Technology > Security & Privacy (0.69)
- Government > Military (0.55)
Undersensitivity in Neural Reading Comprehension
Welbl, Johannes, Minervini, Pasquale, Bartolo, Max, Stenetorp, Pontus, Riedel, Sebastian
Current reading comprehension models generalise well to in-distribution test sets, yet perform poorly on adversarially selected inputs. Most prior work on adversarial inputs studies oversensitivity: semantically invariant text perturbations that cause a model's prediction to change when it should not. In this work we focus on the complementary problem: excessive prediction undersensitivity, where input text is meaningfully changed but the model's prediction does not, even though it should. We formulate a noisy adversarial attack which searches among semantic variations of the question for which a model erroneously predicts the same answer, and with even higher probability. Despite comprising unanswerable questions, both SQuAD2.0 and NewsQA models are vulnerable to this attack. This indicates that although accurate, models tend to rely on spurious patterns and do not fully consider the information specified in a question. We experiment with data augmentation and adversarial training as defences, and find that both substantially decrease vulnerability to attacks on held out data, as well as held out attack spaces. Addressing undersensitivity also improves results on AddSent and AddOneSent, and models furthermore generalise better when facing train/evaluation distribution mismatch: they are less prone to overly rely on predictive cues present only in the training set, and outperform a conventional model by as much as 10.9% F1.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (8 more...)
- Education > Assessment & Standards > Student Performance (0.61)
- Government > Military (0.49)