smolensky
A polar coordinate system represents syntax in large language models
Diego-Simón, Pablo, D'Ascoli, Stéphane, Chemla, Emmanuel, Lakretz, Yair, King, Jean-Rémi
Originally formalized with symbolic representations, syntactic trees may also be effectively represented in the activations of large language models (LLMs). Indeed, a 'Structural Probe' can find a subspace of neural activations, where syntactically related words are relatively close to one-another. However, this syntactic code remains incomplete: the distance between the Structural Probe word embeddings can represent the existence but not the type and direction of syntactic relations. Here, we hypothesize that syntactic relations are, in fact, coded by the relative direction between nearby embeddings. To test this hypothesis, we introduce a 'Polar Probe' trained to read syntactic relations from both the distance and the direction between word embeddings. Our approach reveals three main findings. First, our Polar Probe successfully recovers the type and direction of syntactic relations, and substantially outperforms the Structural Probe by nearly two folds. Second, we confirm that this polar coordinate system exists in a low-dimensional subspace of the intermediate layers of many LLMs and becomes increasingly precise in the latest frontier models. Third, we demonstrate with a new benchmark that similar syntactic relations are coded similarly across the nested levels of syntactic trees. Overall, this work shows that LLMs spontaneously learn a geometry of neural activations that explicitly represents the main symbolic structures of linguistic theory.
Harmony Networks Do Not Work
Harmony networks have been proposed as a means by which con(cid:173) nectionist models can perform symbolic computation. Indeed, pro(cid:173) ponents claim that a harmony network can be built that constructs parse trees for strings in a context free language. This paper shows that harmony networks do not work in the following sense: they construct many outputs that are not valid parse trees. In order to show that the notion of systematicity is compatible with connectionism, Paul Smolensky, Geraldine Legendre and Yoshiro Miyata (Smolensky, Legendre, and Miyata 1992; Smolen sky 1993; Smolen sky, Legendre, and Miyata 1994) pro(cid:173) posed a mechanism, "Harmony Theory," by which connectionist models purportedly perform structure sensitive operations without implementing classical algorithms. Harmony theory describes a "harmony network" which, in the course of reaching a stable equilibrium, apparently computes parse trees that are valid according to the rules of a particular context-free grammar.
Does CLIP Bind Concepts? Probing Compositionality in Large Image Models
Lewis, Martha, Nayak, Nihal V., Yu, Peilin, Yu, Qinan, Merullo, Jack, Bach, Stephen H., Pavlick, Ellie
Large-scale neural network models combining text and images have made incredible progress in recent years. However, it remains an open question to what extent such models encode compositional representations of the concepts over which they operate, such as correctly identifying ''red cube'' by reasoning over the constituents ''red'' and ''cube''. In this work, we focus on the ability of a large pretrained vision and language model (CLIP) to encode compositional concepts and to bind variables in a structure-sensitive way (e.g., differentiating ''cube behind sphere'' from ''sphere behind cube''). In order to inspect the performance of CLIP, we compare several architectures from research on compositional distributional semantics models (CDSMs), a line of research that attempts to implement traditional compositional linguistic structures within embedding spaces. We find that CLIP can compose concepts in a single-object setting, but in situations where concept binding is needed, performance drops dramatically. At the same time, CDSMs also perform poorly, with best performance at chance level.
Should Semantic Vector Composition be Explicit? Can it be Linear?
Widdows, Dominic, Howell, Kristen, Cohen, Trevor
Vector representations have become a central element in semantic language modelling, leading to mathematical overlaps with many fields including quantum theory. Compositionality is a core goal for such representations: given representations for 'wet' and 'fish', how should the concept 'wet fish' be represented? This position paper surveys this question from two points of view. The first considers the question of whether an explicit mathematical representation can be successful using only tools from within linear algebra, or whether other mathematical tools are needed. The second considers whether semantic vector composition should be explicitly described mathematically, or whether it can be a model-internal side-effect of training a neural network. A third and newer question is whether a compositional model can be implemented on a quantum computer. Given the fundamentally linear nature of quantum mechanics, we propose that these questions are related, and that this survey may help to highlight candidate operations for future quantum implementation.
Zyfra leveraging AI for bucket tooth, fragmentation detection and analysis - International Mining
Zyfra says it has developed an automated system using artificial intelligence (AI) to monitor the condition of excavator bucket teeth based on its machine vision BucketControl system. The system is designed to detect the presence or absence of excavator bucket crowns quickly and features functions to alert the excavator operator if a crown is lost or ceases to work. The application, developed jointly by the AI and Mining divisions of Zyfra, uses an on-board controller to acquire images from the camera, process and analyse them using internal software and sends a signal to the operator if a crown is lost or ceases to work. The wear of the tooth is also assessed, and when a critical value is reached, a notification is sent to the dispatcher, according to the company. This data is transmitted to the server in real time, Zyfra added.
Boosting Generative Models by Leveraging Cascaded Meta-Models
Deep generative models are effective methods of modeling data. However, it is not easy for a single generative model to faithfully capture the distributions of complex data such as images. In this paper, we propose an approach for boosting generative models, which cascades meta-models together to produce a stronger model. Any hidden variable meta-model (e.g., RBM and VAE) which supports likelihood evaluation can be leveraged. We derive a decomposable variational lower bound of the boosted model, which allows each meta-model to be trained separately and greedily. Besides, our framework can be extended to semi-supervised boosting, where the boosted model learns a joint distribution of data and labels. Finally, we combine our boosting framework with the multiplicative boosting framework, which further improves the learning power of generative models.
Learning to Reason with Third Order Tensor Products
Schlag, Imanol, Schmidhuber, Jürgen
We combine Recurrent Neural Networks with Tensor Product Representations to learn combinatorial representations of sequential data. This improves symbolic interpretation and systematic generalisation. Our architecture is trained end-to-end through gradient descent on a variety of simple natural language reasoning tasks, significantly outperforming the latest state-of-the-art models in single-task and all-tasks settings. We also augment a subset of the data such that training and test data exhibit large systematic differences and show that our approach generalises better than the previous state-of-the-art.