
Collaborating Authors

 Sartran, Laurent


SynJax: Structured Probability Distributions for JAX

arXiv.org Artificial Intelligence

The development of deep learning software libraries enabled significant progress in the field by allowing users to focus on modeling, while letting the library take care of the tedious and time-consuming task of optimizing execution for modern hardware accelerators. However, this has benefited only particular types of deep learning models, such as Transformers, whose primitives map easily to vectorized computation. Models that explicitly account for structured objects, such as trees and segmentations, have not benefited equally, because they require custom algorithms that are difficult to implement in a vectorized form. SynJax directly addresses this problem by providing an efficient vectorized implementation of inference algorithms for structured distributions covering alignment, tagging, segmentation, constituency trees and spanning trees. This is done by exploiting the connection between algorithms for automatic differentiation and probabilistic inference. With SynJax we can build large-scale differentiable models that explicitly model structure in the data. The code is available at https://github.com/google-deepmind/synjax.
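
A minimal sketch of the autodiff/inference connection the abstract refers to, written in plain JAX rather than the SynJax API: for a first-order linear-chain tagging model, differentiating the log-partition function with respect to the log-potentials yields the edge marginals. The model sizes below are arbitrary assumptions for illustration.

import jax
import jax.numpy as jnp

def log_partition(log_potentials):
    # log_potentials: [T-1, N, N] scores for transitions between adjacent tags.
    def step(alpha, scores):
        # alpha[j]: log-sum of scores of all prefixes ending in tag j.
        return jax.nn.logsumexp(alpha[:, None] + scores, axis=0), None
    init = jnp.zeros(log_potentials.shape[1])
    alpha, _ = jax.lax.scan(step, init, log_potentials)
    return jax.nn.logsumexp(alpha)

# Differentiating log Z gives the probability of each transition appearing
# in a sampled tag sequence, i.e. the edge marginals.
marginals_fn = jax.grad(log_partition)

potentials = jax.random.normal(jax.random.PRNGKey(0), (5, 3, 3))  # 6 positions, 3 tags
marginals = marginals_fn(potentials)  # [5, 3, 3]; each [3, 3] slice sums to 1

The same pattern extends in principle to other structures, with the forward recursion replaced by the corresponding dynamic program.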


Measuring Progress in Fine-grained Vision-and-Language Understanding

arXiv.org Artificial Intelligence

Fine-grained multimodal skills (e.g., understanding relationships and recognising verbs) require identifying and relating various entities across both image and text modalities. Vision-and-language models (VLMs) need such skills to robustly perform well on real-world vision-and-language (V&L) applications; e.g., a coarse-grained model tested on image retrieval to "find an image where something is on a sofa" might incorrectly return an image of a cat sitting below the sofa. As another example, in captioning, a model might incorrectly describe an image where "someone is selling a sweater" as "someone is buying a sweater," if it does not have a precise understanding of the two verbs. First we consider: Which models perform well on fine-grained tasks? To answer this, we evaluate models from four different model families trained with different amounts of pretraining data, as well as recent architectures that leverage frozen large language models (LLMs). We observe that modelling innovations have more impact than simply scaling image captions from the Web. Furthermore, explicitly modelling localisation can improve performance, but it is crucial how it is done, and simply using localisation data is not enough. Our observations motivate our next question: How do data and losses impact fine-grained understanding? We focus our study on the best performing ...


Continuous diffusion for categorical data

arXiv.org Artificial Intelligence

Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous nature of diffusion models conveys many benefits, and in this work we endeavour to preserve it. We propose CDCD, a framework for modelling categorical data with diffusion models that are continuous both in time and input space. We demonstrate its efficacy on several language modelling tasks.
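
A minimal sketch of the general idea, not the authors' formulation (the embedding table, the noise schedule sigma(t) = t, and the linear readout are assumptions here): discrete tokens are embedded into continuous space, diffused with Gaussian noise at a continuous time t, and the model is trained to recover the original tokens with a cross-entropy loss.

import jax
import jax.numpy as jnp

VOCAB, DIM = 100, 16

def diffuse(embeddings, tokens, t, key):
    # Embed discrete tokens and add Gaussian noise whose scale grows with t in [0, 1].
    x0 = embeddings[tokens]                           # [L, DIM] clean token embeddings
    return x0 + t * jax.random.normal(key, x0.shape)  # noisy continuous input x_t

def loss(params, tokens, t, key):
    embeddings, readout = params                      # readout: [DIM, VOCAB]
    x_t = diffuse(embeddings, tokens, t, key)
    logits = x_t @ readout                            # stand-in for a denoising network
    logp = jax.nn.log_softmax(logits)
    return -jnp.mean(logp[jnp.arange(tokens.shape[0]), tokens])

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = (jax.random.normal(k1, (VOCAB, DIM)), jax.random.normal(k2, (DIM, VOCAB)))
print(loss(params, jnp.array([3, 17, 42]), t=0.5, key=k3))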


Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale

arXiv.org Artificial Intelligence

We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are implemented through a special attention mask and deterministic transformation of the linearized tree. We find that TGs outperform various strong baselines on sentence-level language modeling perplexity, as well as on multiple syntax-sensitive language modeling evaluation metrics. Additionally, we find that the recursive syntactic composition bottleneck which represents each sentence as a single vector harms perplexity on document-level language modeling, providing evidence that a different kind of memory mechanism -- one that is independent of composed syntactic representations -- plays an important role in current successful models of long text.
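
A rough sketch of the kind of stack-based attention masking such a model could use over a linearized tree; the paper's actual transformation and masking scheme differ, so this is an illustrative assumption: at a closing bracket, attention is restricted to the span of the subtree being composed, while other positions use ordinary causal attention.

import jax.numpy as jnp

def composition_mask(tokens):
    # tokens: a linearized tree, e.g. "(S (NP the dog ) (VP barks ) )".split()
    n = len(tokens)
    mask = jnp.zeros((n, n), dtype=bool)
    stack = []  # positions of currently open constituents
    for i, tok in enumerate(tokens):
        if tok == ")":
            start = stack.pop()
            # Closing token: attend only to the subtree being composed.
            mask = mask.at[i, start:i + 1].set(True)
        else:
            # Other tokens: plain causal attention over the prefix.
            mask = mask.at[i, :i + 1].set(True)
            if tok.startswith("("):
                stack.append(i)
    return mask

print(composition_mask("(S (NP the dog ) (VP barks ) )".split()).astype(int))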


Rapid Task-Solving in Novel Environments

arXiv.org Artificial Intelligence

When thrust into an unfamiliar environment and charged with solving a series of tasks, an effective agent should (1) leverage prior knowledge to solve its current task while (2) efficiently exploring to gather knowledge for use in future tasks, and then (3) plan using that knowledge when faced with new tasks in that same environment. We introduce two domains for conducting research on this challenge, and find that state-of-the-art deep reinforcement learning (RL) agents fail to plan in novel environments. We develop a recursive implicit planning module that operates over episodic memories, and show that the resulting deep-RL agent is able to explore and plan in novel environments, outperforming the nearest baseline by factors of 2-3 across the two domains. We find evidence that our module (1) learned to execute a sensible information-propagating algorithm and (2) generalizes to situations beyond its training experience.
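
A hedged illustration of what recursive planning over episodic memories could look like (the memory layout, attention form, and step count are assumptions, not the paper's architecture): the same self-attention update is applied repeatedly over stored state embeddings, propagating information across memories before a policy reads out an action.

import jax
import jax.numpy as jnp

def planning_step(memory, w_q, w_k, w_v):
    # memory: [M, D] episodic memory slots; one round of self-attention between them.
    q, k, v = memory @ w_q, memory @ w_k, memory @ w_v
    attn = jax.nn.softmax(q @ k.T / jnp.sqrt(k.shape[-1]), axis=-1)
    return memory + attn @ v  # residual update propagates information across memories

def plan(memory, params, num_steps=3):
    # Re-applying the same step is the "recursive"/iterated part of the planning.
    for _ in range(num_steps):
        memory = planning_step(memory, *params)
    return memory

keys = jax.random.split(jax.random.PRNGKey(0), 4)
memory = jax.random.normal(keys[0], (8, 16))                      # 8 memories, 16 dims
params = tuple(jax.random.normal(k, (16, 16)) for k in keys[1:])  # W_q, W_k, W_v
print(plan(memory, params).shape)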