Ning, Qiang
Self-supervised Analogical Learning using Language Models
Zhou, Ben, Jain, Sarthak, Zhang, Yi, Ning, Qiang, Wang, Shuai, Benajiba, Yassine, Roth, Dan
Large language models have been shown to suffer from reasoning inconsistency: they fail more often in situations under-represented in their training data, even when exact or very similar reasoning paths exist in more common cases that they solve successfully. This observation motivates methods that encourage models to learn the high-level, abstract reasoning process during training rather than only the final answer, so that they can transfer an exact solution to similar cases regardless of how close those cases are to the pre-training distribution. In this work, we propose SAL, a self-supervised analogical learning framework. SAL mimics the human analogy process and trains models to explicitly transfer high-quality symbolic solutions from cases they know how to solve to rarer cases on which they tend to fail. We show that models trained with SAL outperform base language models on a wide range of reasoning benchmarks, such as StrategyQA, GSM8K, and HotpotQA, by 2% to 20%. Analytical studies further show that the resulting models are more generalizable and controllable.
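A minimal sketch of the analogical-transfer idea, under assumptions not stated in the abstract: each solved case pairs a question with an abstract symbolic solution (here a tiny arithmetic template), and an unseen question is matched to its most similar solved case by surface similarity. The case bank, questions, and retrieval heuristic below are all hypothetical.

```python
# Toy analogical transfer: retrieve the most similar solved case and reuse
# its symbolic solution on the new case's arguments (illustrative only).
from difflib import SequenceMatcher

SOLVED_CASES = [  # hypothetical bank of cases the model already solves
    {"question": "Tom has 3 apples and buys 4 more. How many apples?",
     "solution": "a + b"},
    {"question": "A train travels 60 miles in 2 hours. What is its speed?",
     "solution": "a / b"},
]

def retrieve_analogy(question: str) -> dict:
    """Pick the solved case whose question is most similar to the query."""
    return max(SOLVED_CASES,
               key=lambda c: SequenceMatcher(None, c["question"], question).ratio())

def solve_by_analogy(question: str, a: float, b: float) -> float:
    case = retrieve_analogy(question)
    # Transfer the retrieved symbolic solution to the new arguments.
    return eval(case["solution"], {}, {"a": a, "b": b})

# A rarer paraphrase still maps onto the familiar arithmetic template:
print(solve_by_analogy("Mia owns 7 marbles and is given 5 more. How many marbles?", 7, 5))
```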
Open Domain Question Answering with Conflicting Contexts
Liu, Siyi, Ning, Qiang, Halder, Kishaloy, Xiao, Wei, Qi, Zheng, Htut, Phu Mon, Zhang, Yi, John, Neha Anna, Min, Bonan, Benajiba, Yassine, Roth, Dan
Open domain question answering systems frequently rely on information retrieved from large collections of text (such as the Web) to answer questions. However, such collections often contain conflicting information, and depending on it indiscriminately may result in untruthful and inaccurate answers. To understand the gravity of this problem, we collect a human-annotated dataset, Question Answering with Conflicting Contexts (QACC), and find that as many as 25% of unambiguous, open domain questions lead to conflicting contexts when retrieved using Google Search. We evaluate and benchmark three powerful Large Language Models (LLMs) on QACC and demonstrate their limitations in effectively addressing questions with conflicting information. To explore how humans reason through conflicting contexts, we ask our annotators to explain their selections of correct answers. We demonstrate that by finetuning LLMs to explain their answers, we can introduce richer information into their training that guides them through the process of reasoning with conflicting contexts.
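As a hedged illustration of what finetuning an LLM to explain its answers might look like at the data level, here is a hypothetical record layout; the field names, contexts, and explanation text are invented and do not reflect the QACC release format.

```python
# Hypothetical training example: the target output supervises an explanation
# of how the conflict between retrieved contexts was resolved, not just the answer.
import json

example = {
    "question": "When was the bridge opened?",
    "contexts": [
        "Source A: The bridge opened to traffic in 1932.",
        "Source B: The bridge was opened in 1935.",
    ],
    "target": ("Explanation: the two sources conflict on the date; Source A is "
               "a contemporaneous report, so its date is better attested. "
               "Answer: 1932"),
}
print(json.dumps(example, indent=2))
```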
From Instructions to Constraints: Language Model Alignment with Automatic Constraint Verification
Wang, Fei, Shang, Chao, Jain, Sarthak, Wang, Shuai, Ning, Qiang, Min, Bonan, Castelli, Vittorio, Benajiba, Yassine, Roth, Dan
User alignment is crucial for adapting general-purpose language models (LMs) to downstream tasks, but human annotations are often unavailable for all types of instructions, especially those with customized constraints. We observe that user instructions typically contain constraints, and while assessing response quality with respect to the whole instruction is often costly, efficiently evaluating whether constraints are satisfied is feasible. We investigate common constraints in NLP tasks, categorize them into three classes based on the types of their arguments, and propose a unified framework, ACT (Aligning to ConsTraints), to automatically produce supervision signals for user alignment with constraints. Specifically, ACT uses constraint verifiers, which are typically easy to implement in practice, to compute the constraint satisfaction rate (CSR) of each response. It samples multiple responses for each prompt and automatically collects preference labels based on their CSRs. ACT then adapts the LM to the target task through a ranking-based learning process. Experiments on fine-grained entity typing, abstractive summarization, and temporal question answering show that ACT enhances LMs' capability to adhere to different classes of constraints, thereby improving task performance. Further experiments show that the constraint-following capabilities are transferable.
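The sampling-and-verification loop lends itself to a short sketch. The verifier, responses, and single word-limit constraint below are hypothetical stand-ins, but the mechanics follow the description: score each response's CSR with cheap verifiers, then derive preference pairs whenever CSRs differ.

```python
# Toy ACT-style supervision: verify constraints per response, compute the
# constraint satisfaction rate (CSR), and collect preference pairs from it.
from itertools import combinations

def verify_word_limit(response: str, max_words: int = 10) -> bool:
    """Toy constraint verifier: the response must stay under a word limit."""
    return len(response.split()) <= max_words

def csr(response: str, verifiers) -> float:
    """Fraction of constraints the response satisfies."""
    return sum(v(response) for v in verifiers) / len(verifiers)

def preference_pairs(responses, verifiers):
    """Collect (preferred, rejected) pairs whenever CSRs differ."""
    scored = [(r, csr(r, verifiers)) for r in responses]
    pairs = []
    for (a, sa), (b, sb) in combinations(scored, 2):
        if sa != sb:  # prefer the response with the higher CSR
            pairs.append((a, b) if sa > sb else (b, a))
    return pairs

samples = ["Paris.", "The capital of France is the beautiful city of Paris, of course."]
print(preference_pairs(samples, [verify_word_limit]))
# the concise response is preferred under the word-limit constraint
```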
PInKS: Preconditioned Commonsense Inference with Minimal Supervision
Qasemi, Ehsan, Khanna, Piyush, Ning, Qiang, Chen, Muhao
Reasoning with preconditions such as "glass can be used for drinking water unless the glass is shattered" remains an open problem for language models. The main challenge lies in the scarcity of precondition data and models' lack of support for such reasoning. We present PInKS, Preconditioned Commonsense Inference with WeaK Supervision, an improved model for reasoning with preconditions under minimal supervision. We show, both empirically and theoretically, that PInKS improves results on benchmarks focused on reasoning with the preconditions of commonsense knowledge (by up to 40% in Macro-F1). We further investigate PInKS through PAC-Bayesian informativeness analysis, precision measures, and ablation studies.
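A toy pattern-based weak labeler in the spirit of this minimal-supervision setup might look as follows; the two regex patterns are illustrative only, far from the paper's full pattern set. Connectives such as "unless" are read as disabling preconditions, and "only if" / "as long as" as enabling ones.

```python
# Toy weak supervision for preconditions: match linguistic connectives and
# emit (action, precondition, label) triples as noisy training signal.
import re

DISABLING = re.compile(r"^(?P<action>.+?)\s+unless\s+(?P<condition>.+)$", re.I)
ENABLING = re.compile(r"^(?P<action>.+?)\s+(?:only if|as long as)\s+(?P<condition>.+)$", re.I)

def weak_label(sentence: str):
    """Return (action, precondition, label) or None if no pattern matches."""
    for pattern, label in ((DISABLING, "disabling"), (ENABLING, "enabling")):
        m = pattern.match(sentence.strip().rstrip("."))
        if m:
            return m["action"], m["condition"], label
    return None

print(weak_label("Glass can be used for drinking water unless the glass is shattered."))
print(weak_label("A ladder is safe to climb as long as it stands on firm ground."))
```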
Extracting Temporal Event Relation with Syntactic-Guided Temporal Graph Transformer
Zhang, Shuaicheng, Huang, Lifu, Ning, Qiang
Extracting temporal relations (e.g., before, after, concurrent) among events is crucial to natural language understanding. Previous studies mainly rely on neural networks to learn effective features, or on manually crafted linguistic features, for temporal relation extraction; these usually fail when the context between two events is complex or long. Inspired by an examination of available temporal relation annotations and human-like cognitive procedures, we propose a new Temporal Graph Transformer network to (1) explicitly find the connection between two events in a syntactic graph constructed from one or two continuous sentences, and (2) automatically locate the most indicative temporal cues along the path between the two event mentions, as well as in their surrounding concepts in the syntactic graph, with a new temporal-oriented attention mechanism. Experiments on the MATRES and TB-Dense datasets show that our approach significantly outperforms previous state-of-the-art methods on both end-to-end temporal relation extraction and temporal relation classification.
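One ingredient above, locating the syntactic path between two event mentions, can be sketched with a toy dependency graph. The sentence, edges, and use of networkx are illustrative; the paper builds its graphs from parser output and attends over the path with its temporal-oriented mechanism.

```python
# Toy syntactic-path lookup between two event mentions in a dependency graph.
import networkx as nx

# Hand-built dependency edges for:
# "After the assassination, the mob slaughtered civilians."
g = nx.Graph()
g.add_edges_from([
    ("slaughtered", "After"), ("After", "assassination"),
    ("assassination", "the"), ("slaughtered", "mob"),
    ("mob", "the(2)"), ("slaughtered", "civilians"),
])

# The path between the two event mentions passes through the temporal cue
# "After" -- exactly the kind of indicative token the attention should find.
print(nx.shortest_path(g, "assassination", "slaughtered"))
# ['assassination', 'After', 'slaughtered']
```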
SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning
Mirzaee, Roshanak, Faghihi, Hossein Rajaby, Ning, Qiang, Kordjamshidi, Parisa
This paper proposes a question-answering (QA) benchmark for spatial reasoning over natural language text that contains more realistic spatial phenomena not covered by prior work and is challenging for state-of-the-art language models (LMs). We propose a distant supervision method to improve on this task: specifically, we design grammar and reasoning rules to automatically generate spatial descriptions of visual scenes and corresponding QA pairs. Experiments show that further pretraining LMs on these automatically generated data significantly improves their capability for spatial understanding, which in turn helps to better solve two external datasets, bAbI and boolQ. We hope this work can foster investigations into more sophisticated models for spatial reasoning over text.
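A minimal sketch of the distant-supervision recipe, with a hypothetical grammar far simpler than the paper's: sample a scene, verbalize it with a template, and derive a QA pair from a reasoning rule (here, that every spatial relation implies its inverse).

```python
# Toy rule-based generator: a scene description plus a QA pair derived
# from the inverse-relation rule (illustrative grammar only).
import random

SHAPES = ["circle", "square", "triangle"]
RELS = {"left of": "right of", "above": "below"}  # relation -> inverse

def generate_example(rng: random.Random):
    a, b = rng.sample(SHAPES, 2)
    rel = rng.choice(list(RELS))
    description = f"A {a} is {rel} a {b}."
    # Reasoning rule: every relation implies its inverse, so the answer is Yes.
    question = f"Is the {b} {RELS[rel]} the {a}?"
    return description, question, "Yes"

rng = random.Random(0)
print(generate_example(rng))
```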
Learnability with Indirect Supervision Signals
Wang, Kaifu, Ning, Qiang, Roth, Dan
Learning from indirect supervision signals is important in real-world AI applications where gold labels are often missing or too costly. In this paper, we develop a unified theoretical framework for multi-class classification when the supervision is provided by a variable that contains nonzero mutual information with the gold label. The nature of this problem is determined by (i) the transition probability from the gold labels to the indirect supervision variables and (ii) the learner's prior knowledge about the transition. Our framework relaxes assumptions made in the literature and supports learning with unknown, non-invertible, and instance-dependent transitions. Our theory introduces a novel concept called "separation," which characterizes the learnability and generalization bounds. We also demonstrate the application of our framework via concrete novel results in a variety of learning scenarios, such as learning with superset annotations and joint supervision signals.
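A toy numerical illustration of item (i), not the paper's theory: a transition matrix T maps the distribution over gold labels to the distribution over an indirect supervision variable, and a non-invertible T is what makes the relaxed assumptions above nontrivial.

```python
# Toy transition from gold labels Y to an indirect signal S: the signal
# distribution is the gold distribution pushed through the matrix T.
import numpy as np

p_gold = np.array([0.6, 0.4])         # P(Y): two gold classes
T = np.array([                        # T[y, s] = P(S = s | Y = y)
    [0.8, 0.1, 0.1],                  # class 0 usually observed as signal 0
    [0.1, 0.7, 0.2],                  # class 1 usually observed as signal 1
])
p_signal = p_gold @ T                 # induced distribution over the signal
print(p_signal)                       # [0.52 0.34 0.14]

# If rows of T collided, some gold labels would be indistinguishable from
# the signal alone; "separation" characterizes when learning still works.
```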
Foreshadowing the Benefits of Incidental Supervision
He, Hangfeng, Zhang, Mingyuan, Ning, Qiang, Roth, Dan
Learning theory mostly addresses the standard learning paradigm, assuming the availability of complete and correct supervision signals for large amounts of data. In practice, however, machine learning researchers and practitioners acquire and make use of a range of incidental supervision signals that have only statistical associations with the gold supervision. This paper addresses the question: can one quantify models' performance when learning with such supervision signals, without going through an exhaustive experimentation process with various supervision signals and learning protocols? To quantify the benefits of various incidental supervision signals, we propose a unified PAC-Bayesian Informativeness measure (PABI), which characterizes the reduction in uncertainty that incidental supervision signals provide. We then demonstrate PABI's use in quantifying various types of incidental signals, such as partial labels, noisy labels, constraints, cross-domain signals, and some combinations of these. Experiments on named entity recognition and question answering show that PABI correlates well with learning performance, providing a promising way to determine, ahead of learning, which supervision signals would be beneficial.
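The uncertainty-reduction idea can be illustrated with a toy entropy calculation; this is not the paper's exact PABI definition, just the underlying quantity, computed for a noisy-label signal with an assumed 10% flip rate.

```python
# Toy illustration: how much a noisy-label signal S reduces uncertainty
# about the gold label Y, measured as H(Y) - H(Y|S) (mutual information).
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

p_y = np.array([0.5, 0.5])                       # prior over gold labels
p_s_given_y = np.array([[0.9, 0.1],              # noisy labels: 10% flip rate
                        [0.1, 0.9]])
p_joint = p_y[:, None] * p_s_given_y             # P(Y, S)
p_s = p_joint.sum(axis=0)                        # P(S)
h_y = entropy(p_y)
h_y_given_s = sum(p_s[s] * entropy(p_joint[:, s] / p_s[s]) for s in range(2))
print(f"H(Y) = {h_y:.3f} bits, H(Y|S) = {h_y_given_s:.3f} bits")
print(f"uncertainty reduction: {h_y - h_y_given_s:.3f} bits")
```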
Joint Event and Temporal Relation Extraction with Shared Representations and Structured Prediction
Han, Rujun, Ning, Qiang, Peng, Nanyun
This paper addresses joint event and temporal relation extraction, a task that can be modeled as building a graph for a given text, whose nodes represent events and whose edges are labeled with the corresponding temporal relations. For example, for a text whose candidate events are assassination, slaughtered, rampage, war, and Hutu, different types of edges specify the temporal relations between them: assassination is BEFORE rampage, rampage INCLUDES slaughtered, and the relation between slaughtered and war is VAGUE. Since "Hutu" is actually not an event, a system is expected to annotate the relations between "Hutu" and all other nodes in the graph as NONE (i.e., no relation). As far as we know, all existing systems treat this task as a pipeline of two separate subtasks, event extraction and temporal relation classification; this work instead proposes a joint framework with shared representations and structured prediction. [Figure 1 in the paper illustrates (a) the temporal relation graph, (b) the pipeline model, and (c) the structured joint model.]
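The example graph above can be encoded directly as a set of labeled edges; the following is an illustrative data structure, not the paper's model.

```python
# Toy encoding of the temporal relation graph from the example above.
events = {"assassination", "slaughtered", "rampage", "war"}
relations = {
    ("assassination", "rampage"): "BEFORE",
    ("rampage", "slaughtered"): "INCLUDES",
    ("slaughtered", "war"): "VAGUE",
}
# Non-events such as "Hutu" receive NONE edges to every event node:
relations.update({("Hutu", e): "NONE" for e in events})

for (head, tail), label in sorted(relations.items()):
    print(f"{head} --{label}--> {tail}")
```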
Incidental Supervision from Question-Answering Signals
He, Hangfeng, Ning, Qiang, Roth, Dan
Human annotations are costly for many natural language processing (NLP) tasks, especially those requiring NLP expertise. One promising solution is to use natural language to annotate natural language. However, how to obtain supervision signals or learn representations from natural language annotations remains an open problem. This paper studies the case where the annotations are in question-answering (QA) format and proposes an effective way to learn representations that are useful for other tasks. We also find that representations derived from question-answer meaning representation (QAMR) data improve performance almost universally across a wide range of tasks, suggesting that such natural language annotations indeed provide unique information on top of modern language models.
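A hedged sketch of the reuse pattern: encode text with a QA-adapted encoder and feed a pooled representation to a downstream model. The checkpoint name below is hypothetical and the mean-pooling choice is an assumption; only the general transformers API calls are standard.

```python
# Toy reuse of representations from a QA-trained encoder for another task.
import torch
from transformers import AutoModel, AutoTokenizer

name = "qamr-finetuned-encoder"  # hypothetical; substitute a real checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

inputs = tokenizer("The senator visited Ohio last week.", return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, dim)
features = hidden.mean(dim=1)                      # pooled sentence representation
print(features.shape)                              # feed this to a task classifier
```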