Goto

Collaborating Authors

 Grammars & Parsing


Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Experimental Setups Matter

arXiv.org Artificial Intelligence

Cross-lingual transfer is a popular approach to increase the amount of training data for NLP tasks in a low-resource context. However, the best strategy to decide which cross-lingual data to include is unclear. Prior research often focuses on a small set of languages from a few language families and/or a single task. It is still an open question how these findings extend to a wider variety of languages and tasks. In this work, we analyze cross-lingual transfer for 266 languages from a wide variety of language families. Moreover, we include three popular NLP tasks: POS tagging, dependency parsing, and topic classification. Our findings indicate that the effect of linguistic similarity on transfer performance depends on a range of factors: the NLP task, the (mono- or multilingual) input representations, and the definition of linguistic similarity.


Review for NeurIPS paper: Compositional Generalization via Neural-Symbolic Stack Machines

Neural Information Processing Systems

The paper proposes a new method for compositional generalization in sequence-to-sequence tasks. The basic idea is to have a symbolic stack machine (capable of compositionally manipulating sequences) that is controlled by a neural network. The method gets perfect accuracy on an existing compositional generalization dataset, a small-scale English-French machine translation task, and a grammar parsing task. Pros: Novel architecture Attractive way of providing inductive bias without hardcoding too much knowledge The paper is well-written Strong experimental results in the domains considered Cons: The paper could do more by the way of providing insights about why the model works. The reviewers appreciated the clarifications provided in the author feedback.


Reviews: Learning to Exploit Stability for 3D Scene Parsing

Neural Information Processing Systems

The goal of this paper is to output a set of 3D bounding boxes and set of dominant planes for a scene depicted in a single image. The key insight is to incorporate stability constraints in the 3D layout, i.e., the reconstructed 3D boxes should not move too far under simulation (in Bullet) with physical forces (gravity, friction). Parameters for 3D boxes are regressed using a modified R-CNN training loss and dominant planes for the walls and floors are regressed via a RNN. A stability criterion is used to update the output 3D scene (via REINFORCE) where the predicted 3D layout is run through Bullet simulator and 3D displacements are checked. Results are shown on synthetic (SUNCG, SceneNet RGB-D) and real (SUN RGB-D) datasets, out-performing the factored 3D approach of [Tulsiani18].


BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing

Neural Information Processing Systems

Recent work has shown that generation from a prompted or fine-tuned language model can perform well at semantic parsing when the output is constrained to be a valid semantic representation. We introduce BenchCLAMP, a Benchmark to evaluate Constrained LAnguage Model Parsing, that includes context-free grammars for seven semantic parsing datasets and two syntactic parsing datasets with varied output meaning representations, as well as a constrained decoding interface to generate only valid outputs covered by these grammars. We provide low, medium, and high resource splits for each dataset, allowing accurate comparison of various language models under different data regimes. Our benchmark supports evaluation of language models using prompt-based learning as well as fine-tuning. Our experiments show that encoder-decoder pretrained language models can achieve similar performance or even surpass state-of-the-art methods for both syntactic and semantic parsing when the model output is constrained to be valid.


Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective

Neural Information Processing Systems

We focus on the weakly-supervised audio-visual video parsing task (AVVP), which aims to identify and locate all the events in audio/visual modalities. Previous works only concentrate on video-level overall label denoising across modalities, but overlook the segment-level label noise, where adjacent video segments (i.e., 1-second video clips) may contain different events. However, recognizing events on the segment is challenging because its label could be any combination of events that occur in the video. To address this issue, we consider tackling AVVP from the language perspective, since language could freely describe how various events appear in each segment beyond fixed labels. Specifically, we design language prompts to describe all cases of event appearance for each video.


Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing

Neural Information Processing Systems

The audio-visual video parsing task aims to parse a video into modality- and category-aware temporal segments. Previous work mainly focuses on weakly-supervised approaches, which learn from video-level event labels. During training, they do not know which modality perceives and meanwhile which temporal segment contains the video event. Since there is no explicit grouping in the existing frameworks, the modality and temporal uncertainties make these methods suffer from false predictions. For instance, segments in the same category could be predicted in different event classes.


QATCH: Benchmarking SQL-centric tasks with Table Representation Learning Models on Your Data

Neural Information Processing Systems

Table Representation Learning (TRL) models are commonly pre-trained on large open-domain datasets comprising millions of tables and then used to address downstream tasks. Choosing the right TRL model to use on proprietary data can be challenging, as the best results depend on the content domain, schema, and data quality. Our purpose is to support end-users in testing TRL models on proprietary data in two established SQL-centric tasks, i.e., Question Answering (QA) and Semantic Parsing (SP). We present QATCH (Query-Aided TRL Checklist), a toolbox to highlight TRL models' strengths and weaknesses on relational tables unseen at training time. For an input table, QATCH automatically generates a testing checklist tailored to QA and SP.


MOMA: Multi-Object Multi-Actor Activity Parsing

Neural Information Processing Systems

Complex activities often involve multiple humans utilizing different objects to complete actions (e.g., in healthcare settings, physicians, nurses, and patients interact with each other and various medical devices). Recognizing activities poses a challenge that requires a detailed understanding of actors' roles, objects' affordances, and their associated relationships. Furthermore, these purposeful activities are composed of multiple achievable steps, including sub-activities and atomic actions, which jointly define a hierarchy of action parts. This paper introduces Activity Parsing as the overarching task of temporal segmentation and classification of activities, sub-activities, atomic actions, along with an instance-level understanding of actors, objects, and their relationships in videos. Involving multiple entities (actors and objects), we argue that traditional pair-wise relationships, often used in scene or action graphs, do not appropriately represent the dynamics between them.


Natural Language Processing of Privacy Policies: A Survey

arXiv.org Artificial Intelligence

Natural Language Processing (NLP) is an essential subset of artificial intelligence. It has become effective in several domains, such as healthcare, finance, and media, to identify perceptions, opinions, and misuse, among others. Privacy is no exception, and initiatives have been taken to address the challenges of usable privacy notifications to users with the help of NLP. To this aid, we conduct a literature review by analyzing 109 papers at the intersection of NLP and privacy policies. First, we provide a brief introduction to privacy policies and discuss various facets of associated problems, which necessitate the application of NLP to elevate the current state of privacy notices and disclosures to users. Subsequently, we a) provide an overview of the implementation and effectiveness of NLP approaches for better privacy policy communication; b) identify the methodologies that can be further enhanced to provide robust privacy policies; and c) identify the gaps in the current state-of-the-art research. Our systematic analysis reveals that several research papers focus on annotating and classifying privacy texts for analysis but need to adequately dwell on other aspects of NLP applications, such as summarization. More specifically, ample research opportunities exist in this domain, covering aspects such as corpus generation, summarization vectors, contextualized word embedding, identification of privacy-relevant statement categories, fine-grained classification, and domain-specific model tuning.


BBPOS: BERT-based Part-of-Speech Tagging for Uzbek

arXiv.org Artificial Intelligence

This paper advances NLP research for the low-resource Uzbek language by evaluating two previously untested monolingual Uzbek BERT models on the part-of-speech (POS) tagging task and introducing the first publicly available UPOS-tagged benchmark dataset for Uzbek. Our fine-tuned models achieve 91% average accuracy, outperforming the baseline multi-lingual BERT as well as the rule-based tagger. Notably, these models capture intermediate POS changes through affixes and demonstrate context sensitivity, unlike existing rule-based taggers.