ReSSFormer: A Recursive Sparse Structured Transformer for Scalable and Long-Context Reasoning
While Transformer architectures have demonstrated impressive scalability across domains, they continue to face challenges in long-context reasoning, computational efficiency, and structural generalization, largely due to rigid layer stacking, dense attention, and reliance on positional encodings. We present ReSSFormer, a Recursive Sparse Structured Transformer that integrates three complementary innovations: a Recurrent Reasoning & Memory Unit (R2MU) for iterative reasoning with bounded depth, an Adaptive Sparse Attention Module (ASAM) for efficient and focused context selection, and a Self-Organizing Encoder Structure (SOES) for position-free structure induction. ReSSFormer replaces conventional depth stacking with recurrent inference, substitutes full attention with token- and expert-level sparsity, and models latent token topology directly from content. Across language modeling, multi-hop QA, and structure-sensitive tasks, ReSSFormer consistently outperforms strong baselines under comparable FLOPs and parameter budgets, highlighting its scalability, efficiency, and structural flexibility.
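Two of the ideas named in the abstract, depth recurrence in place of layer stacking and top-k sparsity in place of full attention, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class names `TopKSparseAttention` and `RecurrentBlock` and the parameters `top_k` and `n_steps` are hypothetical, and ASAM's expert-level sparsity and SOES's content-based topology induction are not modeled here.

```python
# Minimal sketch (not the paper's code) of depth-recurrent inference with a
# weight-tied block and top-k sparse attention; no positional encodings added.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKSparseAttention(nn.Module):
    """Single-head attention that keeps only the top-k scores per query."""
    def __init__(self, dim: int, top_k: int = 8):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.top_k = top_k
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) * self.scale         # (B, T, T)
        k_eff = min(self.top_k, scores.size(-1))
        thresh = scores.topk(k_eff, dim=-1).values[..., -1:]  # k-th largest per query
        scores = scores.masked_fill(scores < thresh, float("-inf"))
        return self.proj(F.softmax(scores, dim=-1) @ v)

class RecurrentBlock(nn.Module):
    """One weight-tied block applied n_steps times instead of stacking
    n_steps distinct layers, so depth is bounded by the recurrence count."""
    def __init__(self, dim: int, top_k: int = 8):
        super().__init__()
        self.attn = TopKSparseAttention(dim, top_k)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, n_steps: int = 6) -> torch.Tensor:
        for _ in range(n_steps):               # recurrence replaces depth stacking
            x = x + self.attn(self.norm1(x))
            x = x + self.ffn(self.norm2(x))
        return x

if __name__ == "__main__":
    x = torch.randn(2, 128, 64)                # (batch, tokens, dim)
    print(RecurrentBlock(64)(x).shape)         # torch.Size([2, 128, 64])
```

Weight tying is what keeps the parameter budget fixed in such a design: the same block parameters are reused at every recurrence step, so effective depth grows with `n_steps` while the parameter count stays that of a single layer.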
arXiv.org Artificial Intelligence
Oct 3, 2025