Goto

Collaborating Authors

 Grammars & Parsing


Reviews: Predicting Scene Parsing and Motion Dynamics in the Future

Neural Information Processing Systems

The paper proposes a deep-learning-based approach to joint prediction of future optical flow and semantic segmentation in videos. The authors evaluate the approach in a driving scenario and show that the two components - flow prediction and semantic segmentation prediction - benefit from each other. The paper is related to works of Jin et al. and Neverova et al. However, as far as I understand, both of these have not been officially published at the time of submission (and the work of Neverova et al. Detailed comment: Pros: 1) The idea seems sound: predicting segmentation and optical flow are both important tasks, and they should be mutually beneficial.


Parameter Choice and Neuro-Symbolic Approaches for Deep Domain-Invariant Learning

arXiv.org Artificial Intelligence

As artificial intelligence (AI) systems advance, we move towards broad AI: systems capable of performing well on diverse tasks, understanding context, and adapting rapidly to new scenarios. A central challenge for broad AI systems is to generalize over tasks in related domains and being robust to distribution shifts. Neuro-symbolic (NeSy) AI bridges the gap between symbolic and sub-symbolic paradigms to address these challenges, enabling adaptable, generalizable, and more interpretable systems. The development of broad AI requires advancements in domain adaptation (DA), enabling models trained on source domains to effectively generalize to unseen target domains. Traditional approaches often rely on parameter optimization and fine-tuning, which can be impractical due to high costs and risks of catastrophic forgetting. NeSy AI systems use multiple models and methods to generalize to unseen domains and maintain performance across varying conditions. We analyze common DA and NeSy approaches with a focus on deep domain-invariant learning, extending to real-world challenges such as adapting to continuously changing domains and handling large domain gaps. We showcase state-of-the-art model-selection methods for scenarios with limited samples and introduce domain-specific adaptations without gradient-based updates for cases where model tuning is infeasible. This work establishes a framework for scalable and generalizable broad AI systems applicable across various problem settings, demonstrating how symbolic reasoning and large language models can build universal computational graphs that generalize across domains and problems, contributing to more adaptable AI approaches for real-world applications.


Reviews: Learning Pipelines with Limited Data and Domain Knowledge: A Study in Parsing Physics Problems

Neural Information Processing Systems

The main idea is the use of PSL (probabilistic soft logic) as a framework to map partial estimates from multiple feedforward algorithms, along with domain specific logical rules, to parse visual diagrams from physics texts. Specifically, the pipelines use feature extractors for lines, arcs, corners, text elements, object elements (e.g.blocks in physics diagrams). These are combined along with human specified rules for groupings, high-level elements, text/figure labeling schemes along with the inference engine to produce the parse into a formal logical language. Experiments illustrate how the learned system: 1) is superior to state of the art diagram parsing scheme, 2) can utilize labelled as well as unlabelled data to achieve improved performance, 3) can handle various degrees of supervision in different parts of the pipeline and is robust, and 4) through integrative modeling of the stages in pipeline prevents error propagation. Quality, Clarity, originality, significance of the paper: The paper is well written and has extensive references to relevant literature, adequate experimentation.


Progressive distillation induces an implicit curriculum

arXiv.org Artificial Intelligence

Knowledge distillation leverages a teacher model to improve the training of a student model. A persistent challenge is that a better teacher does not always yield a better student, to which a common mitigation is to use additional supervision from several ``intermediate'' teachers. One empirically validated variant of this principle is progressive distillation, where the student learns from successive intermediate checkpoints of the teacher. Using sparse parity as a sandbox, we identify an implicit curriculum as one mechanism through which progressive distillation accelerates the student's learning. This curriculum is available only through the intermediate checkpoints but not the final converged one, and imparts both empirical acceleration and a provable sample complexity benefit to the student. We then extend our investigation to Transformers trained on probabilistic context-free grammars (PCFGs) and real-world pre-training datasets (Wikipedia and Books). Through probing the teacher model, we identify an analogous implicit curriculum where the model progressively learns features that capture longer context. Our theoretical and empirical findings on sparse parity, complemented by empirical observations on more complex tasks, highlight the benefit of progressive distillation via implicit curriculum across setups.


Leveraging Grammar Induction for Language Understanding and Generation

arXiv.org Artificial Intelligence

Grammar induction has made significant progress in recent years. However, it is not clear how the application of induced grammar could enhance practical performance in downstream tasks. In this work, we introduce an unsupervised grammar induction method for language understanding and generation. We construct a grammar parser to induce constituency structures and dependency relations, which is simultaneously trained on downstream tasks without additional syntax annotations. The induced grammar features are subsequently incorporated into Transformer as a syntactic mask to guide self-attention. We evaluate and apply our method to multiple machine translation tasks and natural language understanding tasks. Our method demonstrates superior performance compared to the original Transformer and other models enhanced with external parsers. Experimental results indicate that our method is effective in both from-scratch and pre-trained scenarios. Additionally, our research highlights the contribution of explicitly modeling the grammatical structure of texts to neural network models.


On Eliciting Syntax from Language Models via Hashing

arXiv.org Artificial Intelligence

Unsupervised parsing, also known as grammar induction, aims to infer syntactic structure from raw text. Recently, binary representation has exhibited remarkable information-preserving capabilities at both lexicon and syntax levels. In this paper, we explore the possibility of leveraging this capability to deduce parsing trees from raw text, relying solely on the implicitly induced grammars within models. To achieve this, we upgrade the bit-level CKY from zero-order to first-order to encode the lexicon and syntax in a unified binary representation space, switch training from supervised to unsupervised under the contrastive hashing framework, and introduce a novel loss function to impose stronger yet balanced alignment signals. Our model shows competitive performance on various datasets, therefore, we claim that our method is effective and efficient enough to acquire high-quality parsing trees from pre-trained language models at a low cost.


On Uncertainty In Natural Language Processing

arXiv.org Artificial Intelligence

The last decade in deep learning has brought on increasingly capable systems that are deployed on a wide variety of applications. In natural language processing, the field has been transformed by a number of breakthroughs including large language models, which are used in increasingly many user-facing applications. In order to reap the benefits of this technology and reduce potential harms, it is important to quantify the reliability of model predictions and the uncertainties that shroud their development. This thesis studies how uncertainty in natural language processing can be characterized from a linguistic, statistical and neural perspective, and how it can be reduced and quantified through the design of the experimental pipeline. We further explore uncertainty quantification in modeling by theoretically and empirically investigating the effect of inductive model biases in text classification tasks. The corresponding experiments include data for three different languages (Danish, English and Finnish) and tasks as well as a large set of different uncertainty quantification approaches. Additionally, we propose a method for calibrated sampling in natural language generation based on non-exchangeable conformal prediction, which provides tighter token sets with better coverage of the actual continuation. Lastly, we develop an approach to quantify confidence in large black-box language models using auxiliary predictors, where the confidence is predicted from the input to and generated output text of the target model alone.


Should Cross-Lingual AMR Parsing go Meta? An Empirical Assessment of Meta-Learning and Joint Learning AMR Parsing

arXiv.org Artificial Intelligence

Cross-lingual AMR parsing is the task of predicting AMR graphs in a target language when training data is available only in a source language. Due to the small size of AMR training data and evaluation data, cross-lingual AMR parsing has only been explored in a small set of languages such as English, Spanish, German, Chinese, and Italian. Taking inspiration from Langedijk et al. (2022), who apply meta-learning to tackle cross-lingual syntactic parsing, we investigate the use of meta-learning for cross-lingual AMR parsing. We evaluate our models in $k$-shot scenarios (including 0-shot) and assess their effectiveness in Croatian, Farsi, Korean, Chinese, and French. Notably, Korean and Croatian test sets are developed as part of our work, based on the existing The Little Prince English AMR corpus, and made publicly available. We empirically study our method by comparing it to classical joint learning. Our findings suggest that while the meta-learning model performs slightly better in 0-shot evaluation for certain languages, the performance gain is minimal or absent when $k$ is higher than 0.



IndicSentEval: How Effectively do Multilingual Transformer Models encode Linguistic Properties for Indic Languages?

arXiv.org Artificial Intelligence

Transformer-based models have revolutionized the field of natural language processing. To understand why they perform so well and to assess their reliability, several studies have focused on questions such as: Which linguistic properties are encoded by these models, and to what extent? How robust are these models in encoding linguistic properties when faced with perturbations in the input text? However, these studies have mainly focused on BERT and the English language. In this paper, we investigate similar questions regarding encoding capability and robustness for 8 linguistic properties across 13 different perturbations in 6 Indic languages, using 9 multilingual Transformer models (7 universal and 2 Indic-specific). To conduct this study, we introduce a novel multilingual benchmark dataset, IndicSentEval, containing approximately $\sim$47K sentences. Surprisingly, our probing analysis of surface, syntactic, and semantic properties reveals that while almost all multilingual models demonstrate consistent encoding performance for English, they show mixed results for Indic languages. As expected, Indic-specific multilingual models capture linguistic properties in Indic languages better than universal models. Intriguingly, universal models broadly exhibit better robustness compared to Indic-specific models, particularly under perturbations such as dropping both nouns and verbs, dropping only verbs, or keeping only nouns. Overall, this study provides valuable insights into probing and perturbation-specific strengths and weaknesses of popular multilingual Transformer-based models for different Indic languages. We make our code and dataset publicly available [https://tinyurl.com/IndicSentEval}].