Grammars & Parsing
Low-Rank Constraints for Fast Inference in Structured Models
Structured distributions, i.e. distributions over combinatorial spaces, are commonly used to learn latent probabilistic representations from observed data. However, scaling these models is bottlenecked by their high computational and memory complexity with respect to the size of the latent representations. Common models such as Hidden Markov Models (HMMs) and Probabilistic Context-Free Grammars (PCFGs) require time and space quadratic and cubic, respectively, in the number of hidden states. This work demonstrates a simple approach to reducing the computational and memory complexity of a large class of structured models. We show that by viewing the central inference step as a matrix-vector product and imposing a low-rank constraint, we can trade off model expressivity and speed via the rank. Experiments with neurally parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces while providing practical speedups.
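To make the matrix-vector view concrete, here is a minimal sketch of the HMM forward algorithm in NumPy, assuming a generic nonnegative factorization A = U V^T of the transition matrix. This is an illustration of why a rank-r constraint helps, not necessarily the parameterization used in the paper; the function names and the random data are hypothetical. Each forward step computes A^T alpha, which costs O(n^2) with a full matrix but only O(nr) when evaluated as V (U^T alpha).

```python
import numpy as np

def forward_full(A, pi, emit):
    """HMM forward algorithm; each step is an O(n^2) matrix-vector product."""
    alpha = pi * emit[0]                  # alpha_1(j) = pi(j) * p(x_1 | j)
    for e in emit[1:]:
        alpha = (A.T @ alpha) * e         # sum_i alpha(i) A[i, j], then emit
    return alpha.sum()                    # likelihood p(x_1..x_T)

def forward_low_rank(U, V, pi, emit):
    """Same recursion with A = U @ V.T (rank r); each step costs O(n r)."""
    alpha = pi * emit[0]
    for e in emit[1:]:
        # A.T @ alpha == V @ (U.T @ alpha): project to r dims, then expand back
        alpha = (V @ (U.T @ alpha)) * e
    return alpha.sum()

rng = np.random.default_rng(0)
n, r, T = 64, 8, 10
U = rng.random((n, r)); U /= U.sum(axis=1, keepdims=True)  # rows sum to 1
V = rng.random((n, r)); V /= V.sum(axis=0, keepdims=True)  # columns sum to 1
A = U @ V.T                       # row-stochastic transition matrix of rank <= r
pi = np.full(n, 1.0 / n)          # uniform initial distribution
emit = rng.random((T, n))         # random stand-in emission likelihoods p(x_t | state)

p_full = forward_full(A, pi, emit)
p_low = forward_low_rank(U, V, pi, emit)
```

Both functions return the same likelihood; the low-rank version simply never materializes the n-by-n product. Normalizing the rows of U and the columns of V is one easy way to keep A a valid row-stochastic matrix in this sketch.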
Compositional Generalization via Neural-Symbolic Stack Machines
Despite their tremendous success, existing deep learning models show clear limitations in compositional generalization, the capability to learn compositional rules and apply them systematically to unseen cases. To tackle this issue, we propose the Neural-Symbolic Stack Machine (NeSS). It contains a neural network that generates traces, which are then executed by a symbolic stack machine enhanced with sequence manipulation operations. NeSS combines the expressive power of neural sequence models with the recursion supported by the symbolic stack machine. Without training supervision on execution traces, NeSS achieves 100% generalization performance in four domains: the SCAN benchmark of language-driven navigation tasks, the task of few-shot learning of compositional instructions, the compositional machine translation benchmark, and context-free grammar parsing tasks.
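The abstract does not spell out NeSS's instruction set, so the following is only a toy illustration of the division of labor it describes: a neural model would emit a trace, and a symbolic stack machine executes that trace deterministically. The instructions PUSH, DUP, and CONCAT and the function run_trace are hypothetical and are not NeSS's actual operations; here the trace is written by hand instead of being generated.

```python
def run_trace(trace):
    """Execute a trace of (op, *args) instructions on a stack of token sequences.

    Hypothetical instruction set for illustration only.
    """
    stack = []
    for op, *args in trace:
        if op == "PUSH":              # push a literal output sequence
            stack.append(list(args[0]))
        elif op == "DUP":             # copy the top sequence (enables repetition)
            stack.append(list(stack[-1]))
        elif op == "CONCAT":          # pop two sequences, push left + right
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown instruction: {op}")
    if len(stack) != 1:
        raise ValueError("a complete trace leaves exactly one sequence")
    return stack[0]

# SCAN-style command "jump twice and walk" -> JUMP JUMP WALK
trace = [("PUSH", ["JUMP"]), ("DUP",), ("CONCAT",), ("PUSH", ["WALK"]), ("CONCAT",)]
print(run_trace(trace))  # ['JUMP', 'JUMP', 'WALK']
```

Because the machine, not the network, performs the composition, the same few instructions apply unchanged to longer or novel combinations, which is the intuition behind pairing a neural trace generator with a symbolic executor.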
Systematic Generalization with Edge Transformers
Recent research suggests that systematic generalization in natural language understanding remains a challenge for state-of-the-art neural models such as Transformers and Graph Neural Networks. To tackle this challenge, we propose Edge Transformer, a new model that draws inspiration from both Transformers and rule-based symbolic AI. The first key idea in Edge Transformers is to associate vector states with every edge, that is, with every pair of input nodes, as opposed to just every node as in the Transformer model. The second major innovation is a triangular attention mechanism that updates edge representations in a way inspired by unification in logic programming. We evaluate Edge Transformer on compositional generalization benchmarks in relational reasoning, semantic parsing, and dependency parsing.
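A minimal single-head sketch of triangular attention in NumPy may help picture the mechanism. It assumes one plausible form: edge (i, j) attends over every intermediate node l, with a score computed from edges (i, l) and (l, j) and a value formed as the elementwise product of their projected states, much like chaining two facts that share a variable. The weight names Wq, Wk, W1, W2 are hypothetical, and the paper's full model (multiple heads, normalization, feed-forward layers) may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def triangular_attention(X, Wq, Wk, W1, W2):
    """Single-head triangular attention over edge states.

    X has shape (n, n, d): X[i, j] is the state of edge (i, j).
    Edge (i, j) attends over every intermediate node l, scoring the
    pair of edges (i, l) and (l, j).
    """
    d = X.shape[-1]
    Q, K = X @ Wq, X @ Wk
    scores = np.einsum("ild,ljd->ijl", Q, K) / np.sqrt(d)   # (n, n, n)
    weights = softmax(scores, axis=-1)                       # normalize over l
    # value for the path i -> l -> j: elementwise product of the two edge values
    values = np.einsum("ild,ljd->ijld", X @ W1, X @ W2)     # (n, n, n, d)
    return np.einsum("ijl,ijld->ijd", weights, values)

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, n, d))
W = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4)]
out = triangular_attention(X, *W)
print(out.shape)  # (5, 5, 8)
```

Note the cost: every edge looks at n intermediate nodes, so one layer is O(n^3) in time, compared with O(n^2) for node-level self-attention.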
Reviews: Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing
This paper describes a reinforcement learning algorithm adapted to settings with sparse rewards and weak supervision, and applies it to program synthesis, achieving state-of-the-art results and even outperforming baselines trained with full supervision. The first two sections explain the motivation of this work very clearly, presenting the current limitations of reinforcement learning for tasks like contextual program synthesis. It is nicely written and pleasant to read. Section 3 presents the reinforcement learning framework that is the basis of the proposal, where the goal is to find a good approximation of the expected return objective. Section 4 presents the MAPO algorithm and its three key points: "(1) distributed sampling from inside and outside memory with an actor-learner architecture; (2) a marginal likelihood constraint over the memory to accelerate training; (3) systematic exploration to discover new high reward trajectories" (I did not find a better phrasing to summarize than the one in the abstract and the conclusion).
CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages
Pretam Ray, Jivnesh Sandhan, Amrith Krishna, Pawan Goyal
Neural dependency parsing has achieved remarkable performance for low-resource, morphologically rich languages. It is also well established that morphologically rich languages exhibit relatively free word order. This prompts a fundamental question: can we improve dependency parsing by making the model robust to word order variations, exploiting the relatively free word order of morphologically rich languages? In this work, we examine the robustness of graph-based parsing architectures on 7 relatively free word order languages. We scrutinize the modifications needed to adapt these architectures accordingly, such as data augmentation and the removal of position encoding. To this end, we propose a contrastive self-supervised learning method that makes the model robust to word order variations. Our proposed modification yields a substantial average gain of 3.03 UAS and 2.95 LAS points across the 7 relatively free word order languages over the best-performing baseline.
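As an illustration of the two ingredients named above, the sketch below pairs a word-order permutation augmentation with a standard InfoNCE contrastive loss, where each sentence's permuted view is its positive and the other sentences in the batch are negatives. This is a generic contrastive setup assumed for illustration, not necessarily the exact CSSL objective. The function names and the temperature value are hypothetical, and the parser encoder that would produce the sentence embeddings is left out; random vectors stand in for it.

```python
import numpy as np

def permute_words(tokens, rng):
    """Augmentation: the same words in a random order."""
    idx = rng.permutation(len(tokens))
    return [tokens[i] for i in idx]

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss: row i of z_a should match row i of z_b (its permuted view)
    rather than any other sentence in the batch."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature              # (B, B) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # positives sit on the diagonal

rng = np.random.default_rng(0)
print(permute_words(["the", "dog", "chased", "the", "cat"], rng))

# Stand-ins for encoder outputs of original and permuted sentences (hypothetical)
z = rng.normal(size=(4, 16))
loss = info_nce(z, z + 0.01 * rng.normal(size=(4, 16)))
```

Minimizing this loss pushes the encoder to give a sentence and its reordered variant nearly the same representation, which is one way to encode robustness to word order.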
Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing
We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimates. MAPO is applicable to deterministic environments with discrete actions, such as structured prediction and combinatorial optimization tasks. We express the expected return objective as a weighted sum of two terms: an expectation over the high-reward trajectories inside the memory buffer, and a separate expectation over trajectories outside the buffer. To make MAPO efficient, we propose: (1) memory weight clipping to accelerate and stabilize training; (2) systematic exploration to discover high-reward trajectories; (3) distributed sampling from inside and outside of the memory buffer to scale up training. MAPO improves the sample efficiency and robustness of policy gradient, especially on tasks with sparse rewards.
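The two-term decomposition is the law of total expectation applied to the buffer B: E_{a~π}[R(a)] = π_B · E_{a~π+}[R(a)] + (1 − π_B) · E_{a~π−}[R(a)], where π_B is the total probability the policy assigns to trajectories in B, and π+ and π− are the policy renormalized inside and outside B. The sketch below checks this identity on a toy, fully enumerated trajectory space. The clipping function reflects our reading of memory weight clipping as a lower bound α on the buffer weight; the names and the value α = 0.1 are assumptions.

```python
import numpy as np

def mapo_terms(probs, rewards, in_memory):
    """Split E_{a~pi}[R(a)] into the buffer and non-buffer expectations.

    Assumes 0 < pi_B < 1 so both conditional expectations are defined.
    """
    pi_B = probs[in_memory].sum()     # probability mass of buffered trajectories
    inside = probs[in_memory] @ rewards[in_memory] / pi_B
    outside = probs[~in_memory] @ rewards[~in_memory] / (1.0 - pi_B)
    return pi_B, inside, outside

def clipped_memory_weight(pi_B, alpha=0.1):
    """Assumed form of memory weight clipping: keep the buffer weight >= alpha."""
    return max(pi_B, alpha)

# Toy space of four trajectories; the two high-reward ones are in the buffer
probs = np.array([0.05, 0.15, 0.5, 0.3])
rewards = np.array([1.0, 1.0, 0.0, 0.0])
in_memory = np.array([True, True, False, False])

pi_B, inside, outside = mapo_terms(probs, rewards, in_memory)
objective = pi_B * inside + (1 - pi_B) * outside   # equals probs @ rewards
```

The point of the split is that the inside term can be computed exactly by enumerating the buffer, so only the outside term needs sampling; clipping keeps the buffer's contribution from vanishing early in training when π_B is tiny.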
Submodular Field Grammars: Representation, Inference, and Application to Image Parsing
Natural scenes contain many layers of part-subpart structure, and distributions over them are thus naturally represented by stochastic image grammars, with one production per decomposition of a part. Unfortunately, in contrast to language grammars, where the number of possible split points for a production A → BC is linear in the length of A, an image offers an exponential number of ways to split a region into subregions. This makes parsing intractable and requires image grammars to be severely restricted in practice, for example by allowing only rectangular regions. In this paper, we address this problem by associating with each production a submodular Markov random field whose labels are the subparts and whose labeling segments the current object into these subparts. We call the result a submodular field grammar (SFG).
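The contrast between linear and exponential split counts can be checked directly. The sketch below counts the binary splits of a 9-token span for a production A → BC (the first child takes the first k tokens) and the ways to label the 9 pixels of a 3x3 region as B or C with both parts non-empty. The function names are hypothetical, and the region count ignores connectivity, so it illustrates the growth rate rather than the exact set of legal subregions.

```python
from itertools import product

def span_split_points(n):
    """Language grammar: A -> B C over n tokens; B takes the first k tokens."""
    return list(range(1, n))

def region_splits(pixels):
    """Image grammar: every assignment of pixels to B or C with both parts non-empty."""
    splits = []
    for labels in product("BC", repeat=len(pixels)):
        if "B" in labels and "C" in labels:
            splits.append(dict(zip(pixels, labels)))
    return splits

pixels = [(r, c) for r in range(3) for c in range(3)]   # a 3x3 region
print(len(span_split_points(9)))    # 8
print(len(region_splits(pixels)))   # 510 = 2**9 - 2
```

Nine units give 8 splits for a sequence but 510 for a region, and the gap doubles with every added pixel, which is why enumerating splits is hopeless for images and why the SFG instead finds a good labeling with a submodular MRF.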
Learning Pipelines with Limited Data and Domain Knowledge: A Study in Parsing Physics Problems
As machine learning becomes more widely used in practice, we need new methods to build complex intelligent systems that integrate learning with existing software and with domain knowledge encoded as rules. As a case study, we present such a system that learns to parse Newtonian physics problems in textbooks. This system, Nuts&Bolts, learns a pipeline process that incorporates existing code, pre-learned machine learning models, and human-engineered rules. It jointly trains the entire pipeline to prevent the propagation of errors, using a combination of labelled and unlabelled data. Our approach achieves good performance on the parsing task, outperforming the simple pipeline and its variants.
Reviews: Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base
This paper proposes a semantic parsing method for dialog-based QA over a large-scale knowledge base. The method significantly outperforms the existing state of the art on CSQA, a recently-released conversational QA dataset. One of the major novelties of this paper is breaking apart the logical forms in the dialog history into smaller subsequences, any of which can be copied over into the logical form for the current question. While I do have some concerns with the method and the writing (detailed below), overall I liked this paper and I think that some of the ideas within it could be useful more broadly for QA researchers. Detailed comments: - I found many parts of the paper to be confusing, requiring multiple reads to fully understand.
Reviews: Submodular Field Grammars: Representation, Inference, and Application to Image Parsing
The key problem is that splitting the image into *arbitrarily-shaped* pixel regions to associate with the production rules is computationally difficult in general. This paper proposes to associate formal grammar production rules with submodular Markov random fields (MRF). The submodular structure of the associated MRF allows for fast inference for a single rule into arbitrarily-shaped subregions and a dynamic-programming-like algorithm for parsing the entire image structure. The experimental results show that the method is indeed much faster than previous methods. Pros: 1) Well-written and easy to read even though some of the details are fairly technical.