qa-pair
Objective quantification of mood states using large language models
Emotional states influence human behaviour and cognition, leading to diverse thought trajectories. Similarly, Large Language Models (LLMs) show a high degree of response consistency across a wide range of contexts (prompts). We leverage these parallels to establish a framework for quantifying mental states. Our approach uses self-report questionnaires, which reliably assess these states because of their inherent sensitivity to patterns of co-occurring responses. Specifically, we recruited a large sample of participants (N=422) to investigate how well an LLM (Mistral-7B-OpenOrca) quantifies a heterogeneous set of depressive mood states, measured through participants' open-ended responses to a depression questionnaire. We show that LLM responses to held-out multiple-choice questions, given participants' open-ended answers, correlate strongly (r = 0.52-0.84) with true questionnaire scores, demonstrating the LLM's ability to generalise from mood representations. We explore a link between these representations and factor analysis. Using ridge regression, we find depression-related subspaces within LLM hidden states. We show that these subspaces are predictive of participants' "Depression" and "Somatic & Emotional Distress" factor scores, as well as suicidality severity. Overall, LLMs can provide quantitative measures of mental states. The reliability of these measures hinges on how informative the questions we ask participants are. Used correctly, this approach could supplement mental state assessment in a variety of settings.
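A minimal sketch of the probing idea, assuming hidden states have already been extracted as a matrix `H` (one row per participant) and questionnaire scores as a vector `y`. It fits closed-form ridge regression on a training split and reports the Pearson correlation between predicted and true scores for held-out participants. The synthetic data, the `alpha` value, and the train/test split are illustrative assumptions, not the study's actual setup.

```python
import numpy as np

def fit_ridge(H: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    h_mean, y_mean = H.mean(axis=0), y.mean()
    Hc, yc = H - h_mean, y - y_mean  # centre so the intercept is not shrunk
    d = H.shape[1]
    # w = (Hc^T Hc + alpha * I)^{-1} Hc^T yc
    w = np.linalg.solve(Hc.T @ Hc + alpha * np.eye(d), Hc.T @ yc)
    b = y_mean - h_mean @ w
    return w, b

def pearson_r(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-in data: 422 "participants", 64-dim "hidden states".
rng = np.random.default_rng(0)
n, d = 422, 64
H = rng.normal(size=(n, d))
true_direction = rng.normal(size=d)  # hypothetical score-related direction
y = H @ true_direction + rng.normal(scale=2.0, size=n)

train, test = slice(0, 300), slice(300, n)
w, b = fit_ridge(H[train], y[train], alpha=10.0)
y_hat = H[test] @ w + b
r = pearson_r(y_hat, y[test])
print(f"held-out r = {r:.2f}")
# The fitted weight vector w defines a single direction (a 1-D subspace) in H.
```

With real data, `H` would come from the LLM's hidden states for each participant's text; the closed form is used here only to keep the sketch dependency-light.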
FairytaleCQA: Integrating a Commonsense Knowledge Graph into Children's Storybook Narratives
Chen, Jiaju, Lu, Yuxuan, Zhang, Shao, Yao, Bingsheng, Dong, Yuanzhe, Xu, Ying, Li, Yunyao, Wang, Qianwen, Wang, Dakuo, Sun, Yuling
AI models (including LLMs) often rely on narrative question-answering (QA) datasets to provide customized QA functionality for downstream children's education applications. However, existing datasets include only QA pairs grounded in the given storybook content, whereas children can learn more when teachers connect the storybook content to real-world knowledge (e.g., commonsense knowledge). We introduce the FairytaleCQA dataset, annotated by children's education experts, which supplements 278 storybook narratives with educationally appropriate commonsense knowledge. The dataset has 5,868 QA pairs that not only originate from the storybook narratives but also contain commonsense knowledge grounded in an external knowledge graph (i.e., ConceptNet). A follow-up experiment shows that a smaller model (T5-large) fine-tuned on FairytaleCQA reliably outperforms much larger prompt-engineered LLMs (e.g., GPT-4) on this new QA-pair generation (QAG) task. This result suggests that: 1) our dataset brings novel challenges to existing LLMs, and 2) human experts' data annotations remain critical, as experts hold nuanced knowledge of the children's education domain that LLMs lack.
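To make "QA pairs grounded in an external knowledge graph" concrete, here is a hypothetical record layout. The field names, the example story text, and the `is_grounded` heuristic are illustrative assumptions, not the dataset's actual schema. It links a storybook sentence to a ConceptNet-style (head, relation, tail) edge.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    """A ConceptNet-style edge, e.g. (umbrella, UsedFor, keeping dry)."""
    head: str
    relation: str
    tail: str

@dataclass(frozen=True)
class CommonsenseQA:
    """Hypothetical record: a QA pair tied to a story sentence and a KG triple."""
    story_id: str
    source_sentence: str  # storybook text the question starts from
    question: str
    answer: str
    knowledge: Triple     # external commonsense fact the answer relies on

    def is_grounded(self) -> bool:
        """Heuristic check: the triple's head occurs in the story text
        and its tail occurs in the answer (case-insensitive)."""
        return (self.knowledge.head.lower() in self.source_sentence.lower()
                and self.knowledge.tail.lower() in self.answer.lower())

example = CommonsenseQA(
    story_id="story-001",
    source_sentence="The girl took her umbrella before walking to the market.",
    question="Why would the girl take an umbrella?",
    answer="An umbrella is used for keeping dry in the rain.",
    knowledge=Triple("umbrella", "UsedFor", "keeping dry"),
)
print(example.is_grounded())
```

The point of the layout is that each question reaches beyond the story text: the answer depends on the attached triple rather than only on the narrative.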
When to Read Documents or QA History: On Unified and Selective Open-domain QA
Lee, Kyungjae, Han, Sang-eun, Hwang, Seung-won, Lee, Moontae
Open-domain question answering is a well-known task in natural language processing, aiming to answer factoid questions from an open set of domains. One commonly used approach for this task is the retrieve-then-read pipeline (also known as open-book QA), which retrieves relevant knowledge and then reasons over it to find answers. Given the wide range of topics that open-domain questions can cover, a key to a successful answering model is to access and utilize diverse knowledge sources effectively. Toward this goal, existing work can be categorized by the knowledge source used: Document Corpus-based QA (Doc-QA): This type of work utilizes a general-domain Document Corpus (e.g., Wikipedia) (Karpukhin [...]
[...] Figure 1 illustrates the distinction of our approach, which provides both knowledge sources to a unified reader as context. We retrieve a list of relevant QA-pairs (called QA-history), then treat the few retrieved QA examples as if they were a relevant document passage. Meanwhile, the closest approach to using multiple knowledge sources concatenates them uniformly into a single decoder (Oguz et al., 2020), but we argue that knowledge selection is critically missing. To motivate this, Figure 1 shows the QA-history, from which the answer 'Eric Liddell' is explicitly identified, while it is more implicit in the document, such that another name such as 'Hugh Hudson' is known to often confuse QA models. It is critical for the QA model to calibrate prediction quality as an indicator to decide when to use a [...]
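A hedged sketch of the two ideas in the excerpt: rendering retrieved QA-pairs as a pseudo-passage for a reader, and choosing between knowledge sources by comparing confidence scores. The prompt format, the example question, and the confidence values are hypothetical. Picking the highest-confidence source is a simple stand-in, since this excerpt does not describe the paper's actual selection mechanism.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class QAPair:
    question: str
    answer: str

def qa_history_as_passage(history: list[QAPair]) -> str:
    """Render retrieved QA-pairs as one pseudo-passage for a reader model."""
    return " ".join(f"Question: {p.question} Answer: {p.answer}" for p in history)

def select_source(candidates: dict[str, tuple[str, float]]) -> tuple[str, str]:
    """Pick the source whose (hypothetical) calibrated confidence is highest.

    candidates maps a source name to (predicted answer, confidence in [0, 1]).
    Returns (source name, answer).
    """
    source = max(candidates, key=lambda s: candidates[s][1])
    return source, candidates[source][0]

history = [QAPair("Who won the 400 metres at the 1924 Olympics?", "Eric Liddell")]
passage = qa_history_as_passage(history)
print(passage)

# Illustrative confidences: the document reader is distracted by a related name.
choice = select_source({
    "document": ("Hugh Hudson", 0.41),
    "qa_history": ("Eric Liddell", 0.87),
})
print(choice)
```

A selective reader only helps if the confidences are calibrated. Uncalibrated scores would make this comparison arbitrary.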
Quinductor: a multilingual data-driven method for generating reading-comprehension questions using Universal Dependencies
Kalpakchi, Dmytro, Boye, Johan
We propose a multilingual data-driven method for generating reading comprehension questions using dependency trees. Our method provides a strong, mostly deterministic, and inexpensive-to-train baseline for less-resourced languages. While a language-specific corpus is still required, its size is nowhere near those required by modern neural question generation (QG) architectures. Our method surpasses QG baselines previously reported in the literature and performs well in human evaluation.
1 Introduction
We are interested in question generation (QG): the task of automatically generating reading comprehension questions and their correct answers from given declarative sentences. Numerous methods have been proposed for this task, most of them aimed at English. Recent methods are based on neural networks and rely on the availability of large-scale datasets, such as SQuAD (Rajpurkar et al., 2016), a question-answering dataset repurposed for QG, or large-scale pretrained models, such as GPT-3 (Brown et al., 2020). Early methods, mostly based on context-free grammars, relied on the strict word order and the limited inflectional morphology of English. These traits made it relatively straightforward to craft handwritten templates based on these grammars. The above-mentioned idiosyncrasies and the unique availability of large-scale resources for English leave a number of open challenges for developing QG methods applicable to languages other than English. The first challenge is the lack of large-scale training datasets and the prohibitively high cost of obtaining such resources. State-of-the-art QG methods for English train their models on the previously mentioned SQuAD dataset, which contains more than 100,000 questions. Obtaining a good-quality dataset of a similar size is very expensive, especially for languages with fewer native speakers around the world.
The second challenge is knowing how well available methods developed for English would generalize to other languages, especially synthetic ones with richer inflectional morphology and less strict word order (e.g., Finnish, Turkish or Russian). To the best of our knowledge, not much research has been done on QG for these kinds of languages. The third challenge is assessing the obtained performance results.
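To illustrate what "generating questions using dependency trees" means, here is a toy sketch with a hand-coded Universal Dependencies parse and two hand-written templates. This is not Quinductor itself. Quinductor induces its templates from a corpus, whereas the templates, the `Token` fields, and the example sentence below are illustrative assumptions. The second template also assumes the root verb is in the past tense.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Token:
    id: int       # 1-based position in the sentence
    form: str
    lemma: str
    head: int     # 0 marks the root
    deprel: str   # Universal Dependencies relation label

def subtree_text(tokens: list[Token], root_id: int) -> str:
    """Return the words of the subtree headed by root_id, in sentence order."""
    ids = {root_id}
    changed = True
    while changed:  # keep adding dependents until no new ones appear
        changed = False
        for t in tokens:
            if t.head in ids and t.id not in ids and t.deprel != "punct":
                ids.add(t.id)
                changed = True
    return " ".join(t.form for t in tokens if t.id in ids)

def generate_qa(tokens: list[Token]) -> list[tuple[str, str]]:
    root = next(t for t in tokens if t.head == 0)
    deps = {t.deprel: t for t in tokens if t.head == root.id}
    pairs = []
    if "nsubj" in deps and "obj" in deps:
        subj = subtree_text(tokens, deps["nsubj"].id)
        obj = subtree_text(tokens, deps["obj"].id)
        # Template 1: ask for the subject, keeping the inflected verb.
        pairs.append((f"Who {root.form} {obj}?", subj))
        # Template 2: ask for the object via do-support (assumes past tense).
        pairs.append((f"What did {subj} {root.lemma}?", obj))
    return pairs

sentence = [
    Token(1, "Marie", "Marie", 3, "nsubj"),
    Token(2, "Curie", "Curie", 1, "flat"),
    Token(3, "discovered", "discover", 0, "root"),
    Token(4, "polonium", "polonium", 3, "obj"),
    Token(5, "in", "in", 6, "case"),
    Token(6, "1898", "1898", 3, "obl"),
    Token(7, ".", ".", 3, "punct"),
]
for question, answer in generate_qa(sentence):
    print(question, "->", answer)
```

Because the templates operate on relation labels (`nsubj`, `obj`) rather than on word positions, the same rules carry over to languages with freer word order. That property is what makes a dependency-based approach attractive beyond English.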
Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering
Chen, Wenhu, Verga, Pat, de Jong, Michiel, Wieting, John, Cohen, William
Retrieval-augmented language models have recently become the standard for knowledge-intensive tasks. Rather than relying purely on latent semantics within the parameters of large neural models, these methods enlist a semi-parametric memory to encode an index of knowledge for the model to retrieve over. Most prior work has employed text passages as the unit of knowledge, which offers high coverage at the cost of interpretability, controllability, and efficiency. Methods that instead rely on knowledge base (KB) facts show the opposite properties. At the same time, more recent work has demonstrated the effectiveness of storing and retrieving from an index of Q-A pairs derived from text (Lewis et al., 2021). This approach yields a high-coverage knowledge representation that maintains KB-like properties, because its entries are more atomic units of information. In this work we push this line of research further by proposing a question-answer augmented encoder-decoder model and an accompanying pretraining strategy. This yields an end-to-end system that not only outperforms prior QA retrieval methods on single-hop QA tasks but also enables compositional reasoning, as demonstrated by strong performance on two multi-hop QA datasets. Together, these methods improve the ability to interpret and control the model while narrowing the performance gap with passage retrieval systems.
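A minimal sketch of the memory lookup step only: an index of (question, answer) pairs searched by question similarity, returning the top-k pairs that a reader could then condition on. A lower-cased bag-of-words cosine similarity stands in for the learned neural encoder the abstract implies. The `QAMemory` class and the example entries are hypothetical, and the encoder-decoder that consumes the retrieved pairs is omitted.

```python
import math
import re
from collections import Counter

def bow(text: str) -> Counter:
    """Lower-cased bag-of-words vector (stand-in for a learned question encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class QAMemory:
    """Index of (question, answer) pairs, keyed by the encoded question."""

    def __init__(self, pairs):
        self.pairs = list(pairs)
        self.keys = [bow(q) for q, _ in self.pairs]

    def top_k(self, query: str, k: int = 2):
        """Return the k most similar stored pairs as (score, (question, answer))."""
        q = bow(query)
        scored = [(cosine(q, key), pair) for key, pair in zip(self.keys, self.pairs)]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:k]

memory = QAMemory([
    ("where was marie curie born", "Warsaw"),
    ("who wrote hamlet", "William Shakespeare"),
    ("what is the capital of poland", "Warsaw"),
])
top = memory.top_k("In which city was Marie Curie born?")
for score, (q, a) in top:
    print(f"{score:.2f}  {q} -> {a}")
```

Because each memory entry is a single question and answer, a retrieved hit is easy to inspect or edit. This is the KB-like interpretability the abstract contrasts with passage retrieval.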
Automatically generating question-answer pairs for assessing basic reading comprehension in Swedish
Kalpakchi, Dmytro, Boye, Johan
This paper presents an evaluation of the quality of automatically generated reading comprehension questions from Swedish text, using the Quinductor method. This method is a light-weight, data-driven but non-neural method for automatic question generation (QG). The evaluation shows that Quinductor is a viable QG method that can provide a strong baseline for neural-network-based QG methods.
It is AI's Turn to Ask Human a Question: Question and Answer Pair Generation for Children Storybooks in FairytaleQA Dataset
Yao, Bingsheng, Wang, Dakuo, Wu, Tongshuang, Hoang, Tran, Sun, Branda, Li, Toby Jia-Jun, Yu, Mo, Xu, Ying
Existing question answering (QA) datasets are created mainly to enable AI to answer questions asked by humans. In educational applications, however, teachers and parents may not know which questions to ask a child to maximize their language learning outcomes. Using a newly released book QA dataset (FairytaleQA), which educational experts labeled on 46 fairytale storybooks for early childhood readers, we developed an automated QA generation model architecture for this novel application. Our model (1) extracts candidate answers from a given storybook passage through carefully designed heuristics based on a pedagogical framework; (2) generates appropriate questions corresponding to each extracted answer using a language model; and (3) uses another QA model to rank the top QA-pairs. Automatic and human evaluations show that our model outperforms baselines. We also demonstrate that our method can help address the scarcity of children's book QA data via data augmentation on 200 unlabeled storybooks.
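The three-stage architecture can be sketched as a pipeline with pluggable stages. The stage implementations below are hypothetical stubs (fixed answers and lookup tables) standing in for the heuristic extractor, the question generator, and the QA model. Ranking by SQuAD-style token F1 between the QA model's answer and the extracted answer is an assumed scoring rule, since the abstract only says another QA model is used to rank the pairs.

```python
from __future__ import annotations
from collections import Counter
from typing import Callable

def token_f1(pred: str, gold: str) -> float:
    """SQuAD-style token-overlap F1 between two answer strings."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

def generate_and_rank(
    passage: str,
    extract_answers: Callable[[str], list[str]],   # stage 1: answer extraction
    generate_question: Callable[[str, str], str],  # stage 2: question generation
    answer_question: Callable[[str, str], str],    # stage 3: QA model for ranking
    top_n: int = 3,
) -> list[tuple[str, str, float]]:
    candidates = []
    for answer in extract_answers(passage):
        question = generate_question(passage, answer)
        # Score a pair by how well the QA model recovers the intended answer.
        score = token_f1(answer_question(passage, question), answer)
        candidates.append((question, answer, score))
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates[:top_n]

# Hypothetical stubs standing in for the real models.
passage = "The little fox hid in the old barn because it was afraid of the storm."
questions = {"the old barn": "Where did the fox hide?",
             "afraid of the storm": "Why did the fox hide?"}
answers = {"Where did the fox hide?": "in the old barn",
           "Why did the fox hide?": "it was afraid of the storm"}
ranked = generate_and_rank(
    passage,
    extract_answers=lambda p: ["the old barn", "afraid of the storm"],
    generate_question=lambda p, a: questions[a],
    answer_question=lambda p, q: answers[q],
)
for q, a, s in ranked:
    print(f"{s:.2f}  {q} -> {a}")
```

Keeping the stages as plain callables means each one can be swapped (for example, a different extractor heuristic) without touching the ranking logic.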
PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them
Lewis, Patrick, Wu, Yuxiang, Liu, Linqing, Minervini, Pasquale, Küttler, Heinrich, Piktus, Aleksandra, Stenetorp, Pontus, Riedel, Sebastian
Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models lack the accuracy of retrieve-and-read systems, as substantially less knowledge is covered by the available QA-pairs relative to text corpora like Wikipedia. To facilitate improved QA-pair models, we introduce Probably Asked Questions (PAQ), a very large resource of 65M automatically-generated QA-pairs. We introduce a new QA-pair retriever, RePAQ, to complement PAQ. We find that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models, whilst being significantly faster. Using PAQ, we train CBQA models which outperform comparable baselines by 5%, but trail RePAQ by over 15%, indicating the effectiveness of explicit retrieval. RePAQ can be configured for size (under 500 MB) or speed (over 1K questions per second) whilst retaining high accuracy. Lastly, we demonstrate RePAQ's strength at selective QA, abstaining from answering when it is likely to be incorrect. This enables RePAQ to "back off" to a more expensive state-of-the-art model, leading to a combined system which is both more accurate and 2x faster than the state-of-the-art model alone.
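The back-off idea can be sketched as a confidence-thresholded dispatcher: answer from the cached QA-pairs when the retriever's match score clears a threshold, otherwise defer to a slower model. Jaccard token overlap stands in for RePAQ's actual retrieval score, and the cache contents, the `0.5` threshold, and the fallback stub are all hypothetical.

```python
from __future__ import annotations
import re
from typing import Callable

def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Token-set overlap, used here as a stand-in retrieval score."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

# Hypothetical cache of pre-generated QA-pairs.
CACHE = {
    "who wrote hamlet": "William Shakespeare",
    "what is the capital of france": "Paris",
}

def toy_retrieve(question: str) -> tuple[str, float]:
    """Return the answer of the most similar cached question and its score."""
    best_q = max(CACHE, key=lambda q: jaccard(question, q))
    return CACHE[best_q], jaccard(question, best_q)

def toy_fallback(question: str) -> str:
    """Placeholder for a slower, more accurate retrieve-and-read model."""
    return "<expensive model answer>"

def selective_answer(
    question: str,
    retrieve: Callable[[str], tuple[str, float]],
    fallback: Callable[[str], str],
    threshold: float = 0.5,
) -> tuple[str, str]:
    """Answer from the cache when confident; otherwise back off."""
    answer, score = retrieve(question)
    if score >= threshold:
        return answer, "retriever"
    return fallback(question), "fallback"

print(selective_answer("Who wrote Hamlet?", toy_retrieve, toy_fallback))
print(selective_answer("Who painted the Mona Lisa?", toy_retrieve, toy_fallback))
```

The speed gain of such a combined system depends on how many questions clear the threshold, so the threshold trades off latency against accuracy.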