From Next Token Prediction to (STRIPS) World Models -- Preliminary Results

Núñez-Molina, Carlos, Gómez, Vicenç, Geffner, Hector

arXiv.org Artificial Intelligence

We consider the problem of learning propositional STRIPS world models from action traces alone, using a deep learning architecture (transformers) and gradient descent. The task is cast as a supervised next token prediction problem where the tokens are the actions, and an action $a$ may follow an action sequence if the hidden effects of the previous actions do not make an action precondition of $a$ false. We show that a suitable transformer architecture can faithfully represent propositional STRIPS world models, and that the models can be learned from sets of random valid (positive) and invalid (negative) action sequences alone. A number of experiments are reported.
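The validity condition the abstract describes, that an action may follow a sequence only if the hidden effects of the earlier actions leave all of its preconditions true, can be sketched directly from STRIPS semantics. The following is an illustrative Python implementation of that check, not the paper's transformer architecture; the propositions and actions in the example are hypothetical.

```python
# Minimal sketch of STRIPS action-sequence validity (illustrative only):
# a sequence is valid iff, tracking add/delete effects from the initial
# state, every action's preconditions hold at the moment it is applied.
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # propositions that must be true to apply the action
    add: frozenset     # propositions the action makes true
    delete: frozenset  # propositions the action makes false


def sequence_is_valid(init: set, actions: list) -> bool:
    """Return True iff the action sequence is applicable from `init`."""
    state = set(init)
    for a in actions:
        if not a.pre <= state:  # some precondition of `a` is false
            return False
        state = (state - a.delete) | a.add
    return True


# Toy example with hypothetical propositions and actions:
pick = Action("pick", frozenset({"hand_empty", "on_table"}),
              frozenset({"holding"}), frozenset({"hand_empty", "on_table"}))
drop = Action("drop", frozenset({"holding"}),
              frozenset({"hand_empty", "on_table"}), frozenset({"holding"}))

init = {"hand_empty", "on_table"}
print(sequence_is_valid(init, [pick, drop, pick]))  # True  (valid sequence)
print(sequence_is_valid(init, [pick, pick]))        # False (hand not empty)
```

In the paper's setting, a learner sees only such valid (positive) and invalid (negative) action sequences, never the states, and must recover the hidden preconditions and effects.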



Masked Hard-Attention Transformers and Boolean RASP Recognize Exactly the Star-Free Languages

Angluin, Dana, Chiang, David, Yang, Andy

arXiv.org Artificial Intelligence

We consider transformer encoders with hard attention (in which all attention is focused on exactly one position) and strict future masking (in which each position only attends to positions strictly to its left), and prove that the class of languages recognized by these networks is exactly the star-free languages. Adding position embeddings increases the class of recognized languages to other well-studied classes. A key technique in these proofs is Boolean RASP, a variant of RASP that is restricted to Boolean values. Via the star-free languages, we relate transformers to first-order logic, temporal logic, and algebraic automata theory.
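The two restrictions the abstract names, hard attention (all weight on exactly one position) and strict future masking (each position attends only to positions strictly to its left), can be illustrated with a small NumPy sketch. This is a hedged illustration of the attention pattern, not the paper's formal model; the score matrix is an arbitrary example.

```python
# Sketch of masked hard attention: for each position i, all attention
# goes to the single position j < i with the highest score. Position 0
# has nothing strictly to its left, so it attends nowhere (index -1).
import numpy as np


def masked_hard_attention(scores: np.ndarray) -> np.ndarray:
    """Given an (n, n) attention-score matrix, return for each position i
    the unique attended index j < i (or -1 when i == 0)."""
    n = scores.shape[0]
    attended = np.full(n, -1)
    for i in range(1, n):
        # Strict future mask: only columns 0..i-1 are visible;
        # hard attention: argmax picks exactly one of them.
        attended[i] = int(np.argmax(scores[i, :i]))
    return attended


scores = np.array([[0., 0., 0.],
                   [2., 0., 0.],
                   [1., 3., 0.]])
print(masked_hard_attention(scores))  # [-1  0  1]
```

Because each layer's output at a position depends on only one earlier position, the information flow is Boolean-trackable, which is what the Boolean RASP variant exploits in the equivalence proof with star-free languages.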