From Next Token Prediction to (STRIPS) World Models -- Preliminary Results

Núñez-Molina, Carlos, Gómez, Vicenç, Geffner, Hector

arXiv.org Artificial Intelligence

We consider the problem of learning propositional STRIPS world models from action traces alone, using a deep learning architecture (transformers) and gradient descent. The task is cast as a supervised next token prediction problem where the tokens are the actions, and an action $a$ may follow an action sequence if the hidden effects of the previous actions do not make an action precondition of $a$ false. We show that a suitable transformer architecture can faithfully represent propositional STRIPS world models, and that the models can be learned from sets of random valid (positive) and invalid (negative) action sequences alone. A number of experiments are reported.
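The validity condition the abstract describes, that an action may follow a sequence only if the hidden effects of the earlier actions leave all of its preconditions true, can be sketched directly from STRIPS semantics. The following is an illustrative Python implementation of that check, not the paper's transformer architecture; the propositions and actions in the example are hypothetical.

```python
# Minimal sketch of STRIPS action-sequence validity (illustrative only):
# a sequence is valid iff, tracking add/delete effects from the initial
# state, every action's preconditions hold at the moment it is applied.
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # propositions that must be true to apply the action
    add: frozenset     # propositions the action makes true
    delete: frozenset  # propositions the action makes false


def sequence_is_valid(init: set, actions: list) -> bool:
    """Return True iff the action sequence is applicable from `init`."""
    state = set(init)
    for a in actions:
        if not a.pre <= state:  # some precondition of `a` is false
            return False
        state = (state - a.delete) | a.add
    return True


# Toy example with hypothetical propositions and actions:
pick = Action("pick", frozenset({"hand_empty", "on_table"}),
              frozenset({"holding"}), frozenset({"hand_empty", "on_table"}))
drop = Action("drop", frozenset({"holding"}),
              frozenset({"hand_empty", "on_table"}), frozenset({"holding"}))

init = {"hand_empty", "on_table"}
print(sequence_is_valid(init, [pick, drop, pick]))  # True  (valid sequence)
print(sequence_is_valid(init, [pick, pick]))        # False (hand not empty)
```

In the paper's setting, a learner sees only such valid (positive) and invalid (negative) action sequences, never the states, and must recover the hidden preconditions and effects.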



Masked Hard-Attention Transformers and Boolean RASP Recognize Exactly the Star-Free Languages

Angluin, Dana, Chiang, David, Yang, Andy

arXiv.org Artificial Intelligence

We consider transformer encoders with hard attention (in which all attention is focused on exactly one position) and strict future masking (in which each position only attends to positions strictly to its left), and prove that the class of languages recognized by these networks is exactly the star-free languages. Adding position embeddings increases the class of recognized languages to other well-studied classes. A key technique in these proofs is Boolean RASP, a variant of RASP that is restricted to Boolean values. Via the star-free languages, we relate transformers to first-order logic, temporal logic, and algebraic automata theory.
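The two restrictions the abstract names, hard attention (all weight on exactly one position) and strict future masking (each position attends only to positions strictly to its left), can be illustrated with a small NumPy sketch. This is a hedged illustration of the attention pattern, not the paper's formal model; the score matrix is an arbitrary example.

```python
# Sketch of masked hard attention: for each position i, all attention
# goes to the single position j < i with the highest score. Position 0
# has nothing strictly to its left, so it attends nowhere (index -1).
import numpy as np


def masked_hard_attention(scores: np.ndarray) -> np.ndarray:
    """Given an (n, n) attention-score matrix, return for each position i
    the unique attended index j < i (or -1 when i == 0)."""
    n = scores.shape[0]
    attended = np.full(n, -1)
    for i in range(1, n):
        # Strict future mask: only columns 0..i-1 are visible;
        # hard attention: argmax picks exactly one of them.
        attended[i] = int(np.argmax(scores[i, :i]))
    return attended


scores = np.array([[0., 0., 0.],
                   [2., 0., 0.],
                   [1., 3., 0.]])
print(masked_hard_attention(scores))  # [-1  0  1]
```

Because each layer's output at a position depends on only one earlier position, the information flow is Boolean-trackable, which is what the Boolean RASP variant exploits in the equivalence proof with star-free languages.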