Recursive Transformer Boosting Reasoning Ability with State Stack

Jun-19-2026, 06:13:17 GMT–Neural Information Processing Systems

The Transformer architecture has emerged as a landmark advancement within the broad field of artificial intelligence, effectively catalyzing the advent of large language models (LLMs). However, despite its remarkable capabilities and the substantial progress it has facilitated, the Transformer architecture still has some limitations. One such intrinsic limitation is its inability to effectively recognize regular expressions or deterministic context-free grammars. Standard Transformers lack an explicit mechanism for recursion and structured state transitions, which can hinder systematic generalization on nested and hierarchical patterns. Drawing inspiration from pushdown automata, which efficiently resolve deterministic context-free grammars using stacks, we equip layers with a differentiable stack and propose STACKTRANS with recursion to address the aforementioned issue within LLMs. Unlike previous approaches that modify the attention computation, STACKTRANS explicitly incorporates hidden state stacks between Transformer layers. This design maintains compatibility with existing frameworks like flash-attention. Specifically, our design features stack operations - such as pushing and popping hidden states - that are differentiable and can be learned in an end-to-end manner.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Jun-19-2026, 06:13:17 GMT

Conferences PDF

Add feedback

Country:
- Asia > China (0.28)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found