Gupta, Vivek
Evaluating Inter-Bilingual Semantic Parsing for Indian Languages
Aggarwal, Divyanshu, Gupta, Vivek, Kunchukuttan, Anoop
Despite significant progress in Natural Language Generation for Indian languages (IndicNLP), there is a lack of datasets for complex structured tasks such as semantic parsing. One reason for this gap is the complexity of the logical form, which makes English-to-multilingual translation difficult: the process requires aligning logical forms, intents, and slots with the translated unstructured utterance. To address this, we propose IE-SEMPARSE, an inter-bilingual seq2seq semantic parsing dataset for 11 distinct Indian languages. We highlight the proposed task's practicality and evaluate existing multilingual seq2seq models across several train-test strategies. Our experiments reveal a high correlation between performance on existing multilingual semantic parsing datasets (such as mTOP, Multilingual TOP, and multiATIS++) and on our proposed IE-SEMPARSE suite.
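As an illustration of the inter-bilingual setup (an Indic-language utterance paired with an English logical form), a minimal sketch of what an instance might look like follows; the field names and the Hindi utterance are invented for illustration, not drawn from the dataset.

    # Hypothetical IE-SEMPARSE-style instance: the input utterance is in an
    # Indian language while the target logical form remains in English.
    # (Field names and content are illustrative, not from the actual dataset.)
    example = {
        "utterance_hi": "कल मौसम कैसा रहेगा",  # "How will the weather be tomorrow?"
        "logical_form": "[IN:GET_WEATHER [SL:DATE_TIME tomorrow ] ]",
    }
    print(example["logical_form"])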
Leveraging Data Recasting to Enhance Tabular Reasoning
Jena, Aashna, Gupta, Vivek, Shrivastava, Manish, Eisenschlos, Julian Martin
Creating challenging tabular inference data is essential for learning complex reasoning. Prior work has mostly relied on two data generation strategies. The first is human annotation, which yields linguistically diverse data but is difficult to scale. The second is synthetic generation, which is scalable and cost-effective but lacks inventiveness. In this research, we present a framework for semi-automatically recasting existing tabular data, combining the benefits of both approaches. We use our framework to build tabular NLI instances from five datasets that were originally intended for tasks such as table2text generation, tabular Q/A, and semantic parsing. We demonstrate that the recast data can serve both as evaluation benchmarks and as augmentation data to improve performance on tabular NLI tasks. Furthermore, we investigate the effectiveness of models trained on recast data in the zero-shot scenario, and analyse performance trends across the different recast dataset types.
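The recasting idea can be sketched as rewriting a tabular Q/A pair into an NLI hypothesis with a known label; the template and helper below are our illustration, not the paper's actual recasting rules.

    # Minimal sketch of recasting a tabular Q/A pair into a tabular NLI
    # instance. The template and helper are illustrative assumptions; the
    # paper's actual recasting procedure is more involved.
    def recast_qa_to_nli(table, entity, attribute, answer):
        hypothesis = f"The {attribute} of {entity} is {answer}."
        return {"premise": table, "hypothesis": hypothesis, "label": "entail"}

    table = {"Name": "Taj Mahal", "Location": "Agra", "Built": "1653"}
    print(recast_qa_to_nli(table, "Taj Mahal", "location", "Agra"))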
SLATE: A Sequence Labeling Approach for Task Extraction from Free-form Inked Content
Gandhi, Apurva, Serrao, Ryan, Fang, Biyi, Antonius, Gilbert, Hong, Jenna, Nguyen, Tra My, Yi, Sheng, Nosakhare, Ehi, Shaffer, Irene, Srinivasan, Soundararajan, Gupta, Vivek
We present SLATE, a sequence labeling approach for extracting tasks from free-form content such as digitally handwritten (or "inked") notes on a virtual whiteboard. Our approach allows us to create a single, low-latency model that simultaneously performs sentence segmentation and classification of these sentences into task/non-task sentences. SLATE greatly outperforms a baseline two-model approach (a sentence segmentation model followed by a classification model), achieving a task F1 score of 84.4%, a sentence segmentation (boundary similarity) score of 88.4%, and three times lower latency than the baseline. Furthermore, we provide insights into tackling the challenges of performing NLP in the inking domain. We release both our code and dataset for this novel task.
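A minimal sketch of the joint labeling idea, assuming a tag set that encodes both a sentence-boundary decision and a task/non-task class in each token tag (the tag names are our assumption, not SLATE's):

    # Illustrative joint tag set: B = sentence-initial, I = sentence-internal,
    # combined with a TASK/NOTASK class per sentence.
    tokens = ["buy", "milk", "today", "great", "meeting", "everyone"]
    tags   = ["B-TASK", "I-TASK", "I-TASK", "B-NOTASK", "I-NOTASK", "I-NOTASK"]

    # Decoding recovers segmentation and per-sentence labels in one pass.
    sentences, current = [], None
    for tok, tag in zip(tokens, tags):
        boundary, label = tag.split("-")
        if boundary == "B":
            current = {"text": [], "is_task": label == "TASK"}
            sentences.append(current)
        current["text"].append(tok)
    print(sentences)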
Share the Tensor Tea: How Databases can Leverage the Machine Learning Ecosystem
Asada, Yuki, Fu, Victor, Gandhi, Apurva, Gemawat, Advitya, Zhang, Lihao, He, Dong, Gupta, Vivek, Nosakhare, Ehi, Banda, Dalitso, Sen, Rathijit, Interlandi, Matteo
We demonstrate Tensor Query Processor (TQP): a query processor that automatically compiles relational operators into tensor programs. By leveraging tensor runtimes such as PyTorch, TQP is able to: (1) integrate with ML tools (e.g., Pandas for data ingestion, Tensorboard for visualization); (2) target different hardware (e.g., CPU, GPU) and software (e.g., browser) backends; and (3) accelerate queries containing both relational and ML operators end-to-end. TQP is generic enough to support the TPC-H benchmark, and it provides performance that is comparable to, and often better than, that of specialized CPU and GPU query processors.
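The core compilation idea can be illustrated with a relational selection expressed as PyTorch tensor operations; this is our sketch of the concept, not the code TQP actually generates.

    import torch

    # Sketch of "SELECT price FROM t WHERE price > 10" as tensor ops:
    # the predicate becomes a boolean mask, the selection becomes indexing.
    price = torch.tensor([4.0, 12.5, 9.9, 30.0])
    mask = price > 10.0          # vectorized predicate evaluation
    result = price[mask]         # selection via boolean indexing
    print(result)                # tensor([12.5000, 30.0000])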
TabPert: An Effective Platform for Tabular Perturbation
Jain, Nupur, Gupta, Vivek, Rai, Anshul, Kumar, Gaurav
To truly grasp reasoning ability, a Natural Language Inference model should be evaluated on counterfactual data. TabPert facilitates this by assisting in the generation of such counterfactual data for assessing a model's tabular reasoning issues. TabPert allows a user to update a table, change its associated hypotheses and their labels, and highlight rows that are important for hypothesis classification. TabPert also captures information about the techniques used to automatically produce the table, as well as the strategies employed to generate the challenging hypotheses. These counterfactual tables and hypotheses, together with the metadata, can then be used to explore an existing model's shortcomings methodically and quantitatively.
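In the spirit of TabPert, a counterfactual perturbation can be sketched as editing a single premise cell so that a hypothesis label flips; the helper below is our illustration, not TabPert's interface.

    import copy

    # Illustrative counterfactual perturbation: editing one premise cell
    # flips the NLI label of an associated hypothesis.
    def perturb(table, key, new_value):
        new_table = copy.deepcopy(table)
        new_table[key] = new_value
        return new_table

    table = {"Name": "Mount Everest", "Elevation": "8,849 m"}
    hypothesis = "Mount Everest is over 8,000 metres tall."  # label: entail
    counterfactual = perturb(table, "Elevation", "7,849 m")  # label: contradict
    print(counterfactual)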
Is My Model Using The Right Evidence? Systematic Probes for Examining Evidence-Based Tabular Reasoning
Gupta, Vivek, Bhat, Riyaz A., Ghosal, Atreya, Shrivastava, Manish, Singh, Maneesh, Srikumar, Vivek
While neural models routinely report state-of-the-art performance across NLP tasks involving reasoning, their outputs are often observed to not properly use or reason over the evidence presented in their inputs. A model that reasons properly is expected to attend to the right parts of the input, be self-consistent in its predictions across examples, avoid spurious patterns in inputs, and ignore biases from its underlying pre-trained language model in a nuanced, context-sensitive fashion (e.g., handling counterfactuals). Do today's models do so? In this paper, we study this question using the problem of reasoning over tabular data. The tabular nature of the input is particularly suited for this study, as it admits systematic probes targeting the properties listed above. Our experiments demonstrate that a BERT-based model representative of today's state of the art fails to properly reason on the following counts: it often (a) misses the relevant evidence, (b) suffers from hypothesis and knowledge biases, and (c) relies on annotation artifacts and knowledge from pre-trained language models as primary evidence rather than reasoning over the premises in the tabular input.
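One family of such probes can be sketched as an evidence-deletion test: remove the premise row that supports a hypothesis and check whether the prediction changes. The `predict` interface below is a hypothetical stand-in, not the paper's exact probe.

    # Sketch of an evidence-deletion probe: if the model truly uses the
    # relevant row, removing it should change the prediction (e.g., from
    # "entail" to "neutral"). `predict` is a hypothetical model interface.
    def evidence_probe(predict, table, hypothesis, evidence_key):
        full = predict(table, hypothesis)
        ablated_table = {k: v for k, v in table.items() if k != evidence_key}
        ablated = predict(ablated_table, hypothesis)
        return full != ablated  # True: the model relied on the evidence row

    # Trivial stand-in model that keys on the evidence row, for demonstration.
    toy_predict = lambda t, h: "entail" if "Elevation" in t else "neutral"
    table = {"Name": "K2", "Elevation": "8,611 m"}
    print(evidence_probe(toy_predict, table, "K2 is above 8,000 m.", "Elevation"))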
Incorporating External Knowledge to Enhance Tabular Reasoning
Neeraja, J., Gupta, Vivek, Srikumar, Vivek
Reasoning about tabular information presents unique challenges to modern NLP approaches which largely rely on pre-trained contextualized embeddings of text. In this paper, we study these challenges through the problem of tabular natural language inference. We propose easy and effective modifications to how information is presented to a model for this task. We show via systematic experiments that these strategies substantially improve tabular inference performance.
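One such presentation change can be sketched as re-expressing table rows as simple sentences before encoding; the sentence template is our assumption, not the paper's exact scheme.

    # Sketch of re-presenting a premise table as natural-language sentences
    # before feeding it to a pre-trained model. The template is illustrative.
    def table_to_sentences(title, table):
        return [f"The {key} of {title} is {value}." for key, value in table.items()]

    table = {"Capital": "Paris", "Population": "67 million"}
    print(table_to_sentences("France", table))
    # ['The Capital of France is Paris.', 'The Population of France is 67 million.']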
On Long-Tailed Phenomena in Neural Machine Translation
Raunak, Vikas, Dalmia, Siddharth, Gupta, Vivek, Metze, Florian
State-of-the-art Neural Machine Translation (NMT) models struggle to generate low-frequency tokens, and addressing this remains a major challenge. The analysis of long-tailed phenomena in the context of structured prediction tasks is further hindered by the added complexities of search during inference. In this work, we quantitatively characterize such long-tailed phenomena at two levels of abstraction, namely token classification and sequence generation. We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation by incorporating the inductive biases of beam search into the training process. We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy across different language pairs, especially for the generation of low-frequency words. We have released the code to reproduce our results.
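For intuition, the standard focal loss is FL(p) = -(1 - p)^γ log p, which emphasizes low-confidence tokens; below is a sketch of it alongside one plausible "anti" weighting that reverses that emphasis. The Anti-Focal form shown is our assumption for illustration; consult the paper for the actual definition.

    import torch

    # Focal loss down-weights high-confidence tokens; the "anti" variant
    # below reverses that emphasis. Its exact form here, -(1 + p)**g * log(p),
    # is our assumption for illustration, not necessarily the paper's.
    def focal_loss(p, gamma=1.0):
        return -((1.0 - p) ** gamma) * torch.log(p)

    def anti_focal_loss(p, gamma=1.0):
        return -((1.0 + p) ** gamma) * torch.log(p)

    p = torch.tensor([0.1, 0.5, 0.9])  # model probability of the gold token
    print(focal_loss(p), anti_focal_loss(p))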
INFOTABS: Inference on Tables as Semi-structured Data
Gupta, Vivek, Mehta, Maitrey, Nokhiz, Pegah, Srikumar, Vivek
In this paper, we observe that semi-structured tabulated text is ubiquitous; understanding it requires not only comprehending the meaning of text fragments, but also the implicit relationships between them. We argue that such data can serve as a testing ground for understanding how we reason about information. To study this, we introduce a new dataset called INFOTABS, comprising human-written textual hypotheses based on premises that are tables extracted from Wikipedia info-boxes. Our analysis shows that the semi-structured, multi-domain and heterogeneous nature of the premises admits complex, multi-faceted reasoning. Experiments reveal that, while human annotators agree on the relationships between a table-hypothesis pair, several standard modeling strategies are unsuccessful at the task, suggesting that reasoning about tables can pose a difficult modeling challenge.
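An INFOTABS-style instance pairs an infobox premise with short hypotheses spanning the three NLI labels; the instance below is invented for illustration, not taken from the dataset.

    # Invented INFOTABS-style instance: a Wikipedia-infobox premise with
    # hypotheses covering all three inference labels.
    premise = {"Title": "Breakfast at Tiffany's", "Released": "1961",
               "Running time": "115 minutes"}
    hypotheses = [
        ("The film runs for under two hours.", "entail"),
        ("The film was released in the 1970s.", "contradict"),
        ("The film won an Academy Award.", "neutral"),
    ]
    print(premise)
    for h, label in hypotheses:
        print(label, "->", h)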
A Logic-Driven Framework for Consistency of Neural Models
Li, Tao, Gupta, Vivek, Mehta, Maitrey, Srikumar, Vivek
Neural models have delivered progressively improving performance on benchmarks such as GLUE (Wang et al., 2018). But are models really becoming better? We take the position that, while tracking performance on a leaderboard is necessary to characterize model quality, it is not sufficient. Reasoning about language requires that a system not only draw correct inferences about textual inputs, but also be consistent in its beliefs across various inputs. To illustrate this notion of consistency, consider the task of natural language inference (NLI), which seeks to identify whether a premise entails, contradicts or is unrelated to a hypothesis (Dagan et al., 2013).
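This notion of consistency can be made concrete with a simple check: an NLI model's predictions should respect logical relations across example pairs, for instance the symmetry of contradiction. The checker below is our sketch; `predict` is a hypothetical stand-in for a trained model.

    # Sketch of a consistency check over an NLI model: contradiction should
    # be symmetric across a (premise, hypothesis) pair. `predict` is a
    # hypothetical stand-in for a trained NLI model.
    def symmetry_consistent(predict, premise, hypothesis):
        forward = predict(premise, hypothesis)
        backward = predict(hypothesis, premise)
        if forward == "contradict":
            return backward == "contradict"
        return True  # the constraint only binds contradiction pairs

    toy_predict = lambda p, h: "contradict"  # trivially consistent stand-in
    print(symmetry_consistent(toy_predict, "A man is asleep.", "A man is awake."))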