Modeling various aspects of language--syntax, semantics, pragmatics, and discourse, among others--by the use of constrained formal-computational systems, just adequate for such modeling, has proved to be an effective research strategy, leading to deep understanding of these aspects, with implications for both machine processing and human processing. This approach enables one to distinguish between the universal and stipulative constraints.
I review current statistical work on syntactic parsing and then consider part-of-speech tagging, which was the first syntactic problem to be successfully attacked by statistical techniques and also serves as a good warm-up for the main topic--statistical parsing. Here, I consider both the simplified case in which the input string is viewed as a string of parts of speech and the more interesting case in which the parser is guided by statistical information about the particular words in the sentence. Finally, I anticipate future research directions. Throughout, I adopt the standard abbreviations: s for sentence, np for noun phrase, vp for verb phrase, and det for determiner. It is generally accepted that finding the sort of structure shown in figure 1 is useful in determining the meaning of a sentence.
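The sort of structure the abstract refers to can be illustrated with a small sketch (not from the article itself): a toy parse tree for the sentence "the dog barked", encoded as nested tuples and rendered in the labeled-bracket notation standard in the parsing literature, using the abbreviations s, np, vp, and det introduced above. The sentence and helper names are illustrative assumptions.

```python
# Illustrative sketch: a parse tree for "the dog barked", using the
# abbreviations from the text (s, np, vp, det). The tree is a nested
# tuple whose first element is the constituent label.
tree = ("s",
        ("np", ("det", "the"), ("noun", "dog")),
        ("vp", ("verb", "barked")))

def leaves(node):
    """Collect the words at the leaves of a nested-tuple parse tree."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in leaves(child)]

def bracketed(node):
    """Render the tree in labeled-bracket notation."""
    if isinstance(node, str):
        return node
    return "(" + node[0] + " " + " ".join(bracketed(c) for c in node[1:]) + ")"

print(leaves(tree))      # ['the', 'dog', 'barked']
print(bracketed(tree))   # (s (np (det the) (noun dog)) (vp (verb barked)))
```

A statistical parser's job, in these terms, is to choose the most probable such tree for a given word string.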
This report is a review of the First National Conference on Knowledge Representation and Inference in Sanskrit, Bangalore, India, 20 through 22 December 1986. The conference was inspired by an article that appeared in the Spring 1985 issue of AI Magazine--"Knowledge Representation in Sanskrit and Artificial Intelligence." A working group has been created to pursue the goals of the conference and to possibly arrange another conference for 1987 and 1988. This conference is analogous to the consultation of philosophers and cognitive psychologists by computer scientists in the beginnings of AI. Western psychology and philosophy are quite different from the Indo-Aryan tradition: the former has its basis in Aristotelian logic and the scientific method, whereas the latter is also based on introspection and internal experience. Nevertheless, both these schools have converged in the analysis of natural language and the extraction of the semantic message from a text. The purpose of AI in this context is to derive a "method" for natural language understanding; the purpose for the Sanskrit scholars was to understand the nature of language and thought in and of itself. Hence, for the Sanskrit scholars, the actual methodology was implicit; it was not the focus. The purpose of the conference was to extract this hidden "algorithm" of automatic semantic parsing from the Sanskrit pandits.
In the past twenty years, much time, effort, and money have been expended on designing an unambiguous representation of natural languages to make them accessible to computer processing. These efforts have centered around creating schemata designed to parallel logical relations with relations expressed by the syntax and semantics of natural languages, which are clearly cumbersome and ambiguous in their function as vehicles for the transmission of logical data. Understandably, there is a widespread belief that natural languages are unsuitable for the transmission of many ideas that artificial languages can render with great precision and mathematical rigor. But this dichotomy, which has served as a premise underlying much work in the areas of linguistics and artificial intelligence, is a false one. There is at least one language, Sanskrit, which for the duration of almost 1000 years was a living spoken language with a considerable literature of its own. Besides works of literary value, there was a long philosophical and grammatical tradition that has continued to exist with undiminished vigor until the present century. Among the accomplishments of the grammarians can be reckoned a method for paraphrasing Sanskrit in a manner that is identical not only in essence but in form with current work in Artificial Intelligence. This article demonstrates that a natural language can serve as an artificial language also, and that much work in AI has been reinventing a wheel millennia old. First, a typical Knowledge Representation Scheme (using Semantic Nets) will be laid out, followed by an outline of the method used by the ancient Indian Grammarians to analyze sentences unambiguously. Finally, the clear parallelism between the two will be demonstrated, and the theoretical implications of this equivalence will be given.
The other articles in the NL chapter of the Handbook include a historical sketch of machine translation from one language to another, which was the subject of the very earliest ideas about processing language with computers; technical articles on some of the grammars and parsing techniques that AI researchers have used in their programs; and an article on text generation, the creation of sentences by the program. Finally, there are several articles describing the NL programs themselves: the early systems of the 1960s and the major research projects of the last decade, including Wilks's machine translation system, Winograd's SHRDLU, Woods's LUNAR, Schank's MARGIE, SAM, and PAM, and Hendrix's LIFER. Two other chapters of the Handbook are especially relevant to NL research. Speech understanding research attempts to build computer interfaces that understand spoken language. In the 1970s, speech and natural language understanding research were often closely linked.
An important issue in achieving acceptance of computer systems used by the nonprogramming community is the ability to communicate with these systems in natural language. Often, a great deal of time in the design of any such system is devoted to the natural language front end. An obvious way to simplify this task is to provide a portable natural language front-end tool or facility that is sophisticated enough to allow for a reasonable variety of input; allows modification; and yet is easy to use. This paper describes such a tool that is based on augmented transition networks (ATNs). It allows for user input to be in sentence or nonsentence form or both, provides a detailed parse tree that the user can access, and also provides the facility to generate responses and save information.
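To make the ATN idea concrete, here is a minimal, hypothetical sketch of the recursive transition network at the core of an ATN (omitting the registers and arbitrary tests that make it "augmented"): each network is a list of arcs, a CAT arc consumes a word of the named category, and a PUSH arc recursively invokes a subnetwork such as NP. The networks, lexicon, and state names are illustrative assumptions, not the paper's actual tool.

```python
# Hypothetical sketch of the transition-network idea behind an ATN front
# end. Each network is a list of (state, arc_kind, label, next_state);
# a PUSH arc recursively invokes a subnetwork (here, NP).
NETWORKS = {
    "S":  [("q0", "PUSH", "NP",   "q1"),
           ("q1", "CAT",  "verb", "q2")],   # q2 is the final state
    "NP": [("q0", "CAT",  "det",  "q1"),
           ("q1", "CAT",  "noun", "q2")],
}
LEXICON = {"the": "det", "a": "det", "dog": "noun", "ball": "noun",
           "barked": "verb", "bounced": "verb"}
FINAL = "q2"

def parse(net, words, state="q0"):
    """Return leftover words if `net` accepts a prefix of `words`, else None."""
    if state == FINAL:
        return words
    for (src, kind, label, dst) in NETWORKS[net]:
        if src != state:
            continue
        if kind == "CAT" and words and LEXICON.get(words[0]) == label:
            rest = parse(net, words[1:], dst)
            if rest is not None:
                return rest
        elif kind == "PUSH":
            rest = parse(label, words)          # descend into subnetwork
            if rest is not None:
                rest2 = parse(net, rest, dst)   # resume in the outer network
                if rest2 is not None:
                    return rest2
    return None

print(parse("S", ["the", "dog", "barked"]) == [])   # True: sentence accepted
```

A real ATN would additionally set registers on each arc (subject, object, and so on), which is how the detailed parse tree mentioned above is built up during traversal.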
The Workshop on Future Directions in NLP was held at Bolt Beranek and Newman, Inc. (BBN), in Cambridge, Massachusetts, from 29 November to 1 December 1989. The workshop was organized and hosted by Madeleine Bates and Ralph Weischedel of the BBN Speech and Natural Language Department and sponsored by BBN's Science Development Program. Thirty-six leading researchers and government representatives gathered to discuss the direction of the field of natural language processing (NLP) over the next 5 to 10 years. The intent of the symposium was "to make the conference and resulting volume an intellectual landmark for the field of NLP." This brief article summarizes the invited papers and strategic planning discussions of the workshop.
This article surveys the use of empirical, machine-learning methods for a particular natural language-understanding task--information extraction. The author presents a generic architecture for information-extraction systems and then surveys the learning algorithms that have been developed to address the problems of accuracy, portability, and knowledge acquisition for each component of the architecture. Author Eugene Charniak and coauthors Ng Hwee Tou and John Zelle, for example, describe techniques for part-of-speech tagging, parsing, and word-sense disambiguation. These techniques were created with no specific domain or high-level language-processing task in mind. In contrast, my article surveys the use of empirical methods for a particular natural language-understanding task that is inherently domain specific.
In recent years, there has been a flurry of research into empirical, corpus-based learning approaches to natural language processing (NLP). Most empirical NLP work to date has focused on relatively low-level language processing such as part-of-speech tagging, text segmentation, and syntactic parsing. The success of these approaches has stimulated research in using empirical learning techniques in other facets of NLP, including semantic analysis--uncovering the meaning of an utterance. This article is an introduction to some of the emerging research in the application of corpus-based learning techniques to problems in semantic interpretation. In particular, we focus on two important problems in semantic interpretation, namely, word-sense disambiguation and semantic parsing.
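The flavor of corpus-based word-sense disambiguation can be sketched in a few lines (an illustrative toy, not any system from the article): choose the sense of an ambiguous word, such as "bank", whose labeled training contexts best match the test context, here with a bare-bones add-one-smoothed naive Bayes model over context words. The toy corpus and sense labels are invented for illustration.

```python
# Hypothetical sketch of corpus-based word-sense disambiguation for "bank":
# add-one-smoothed naive Bayes over context words, trained on a toy corpus.
import math
from collections import Counter

train = [  # toy labeled corpus: (sense, context words)
    ("money", ["deposit", "cash", "loan"]),
    ("money", ["loan", "interest", "account"]),
    ("river", ["water", "shore", "fishing"]),
    ("river", ["muddy", "water", "flood"]),
]

def disambiguate(context):
    """Pick the sense with the highest smoothed log-probability of `context`."""
    counts = {sense: Counter() for sense, _ in train}
    priors = Counter(sense for sense, _ in train)
    for sense, words in train:
        counts[sense].update(words)
    vocab = {w for _, words in train for w in words}
    best, best_lp = None, -math.inf
    for sense in counts:
        total = sum(counts[sense].values())
        lp = math.log(priors[sense] / len(train))       # log prior
        for w in context:                               # log likelihoods
            lp += math.log((counts[sense][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

print(disambiguate(["cash", "loan"]))    # money
print(disambiguate(["muddy", "shore"]))  # river
```

Real systems of the kind the article surveys differ mainly in scale and features (larger sense-tagged corpora, richer context representations), not in this basic learn-from-labeled-examples design.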
Founded early in 1983, the Center for the Study of Language and Information (CSLI) at Stanford University grew out of a longstanding collaboration between scientists at research laboratories in the Palo Alto area and the faculty and students of several Stanford University departments and out of a need for an institutional focus for this work on natural and computer languages. At present, CSLI has 17 senior members and about as many associate members, from SRI International, Xerox PARC, Fairchild, and the Departments of Computer Science, Linguistics, and Philosophy at Stanford. Since the Center's research will overlap with the work of other researchers around the world, an important goal of CSLI is to initiate a major outreach, whereby members of CSLI both inform themselves of work done elsewhere and share their own results with others. This collection of projects aims at developing scientific theories of natural-language use consonant with our basic perspective on language users as finite information processors. Questions about CSLI or Program SL should be addressed to Elizabeth Macken, Assistant Director, CSLI, Ventura Hall, Stanford University, Stanford, CA 94305.