Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Hongyin Luo, Nathaniel Morgan, Tina Li, Derek Zhao, Ai Vy Ngo, Philip Schroeder, Lijie Yang, Assaf Ben-Kish, Jack O'Brien, James Glass
arXiv.org Artificial Intelligence
Performance is achieved by modeling natural language as reasoning trees, measured by both length and depth, instead of linear sequences. The reasoning trees consist of tasks with thoughts, recursive sub-tasks, and conclusions, based on the concept we proposed in Schroeder et al. (2025). During generation, we maintain a working memory that retains only the key/value (KV) states of the most relevant context tokens, selected by a rule-based subtask-pruning mechanism, which enables reuse of positional embeddings and GPU memory pages throughout reasoning. Experimental results show that our system sustains high inference throughput, even when manipulating up to 90% of the KV cache in GPU memory. It also delivers accurate reasoning on mathematical tasks and handles information retrieval challenges that require long-horizon reasoning and multi-hop tool use.

Large language models (LLMs) have emerged as versatile foundations for a wide range of AI applications, especially agents that handle complicated tasks including multi-hop reasoning and tool use. Their ability to generalize across various tasks with minimal fine-tuning has driven rapid innovation and broad adoption (Brown et al., 2020). However, the fundamental objective of language modeling, to generate unstructured token sequences (Bengio et al., 2003), imposes strict context window limits and makes fine-grained control over internal state difficult. These inherent constraints pose significant challenges for all state-of-the-art LLMs, notably an inability to maintain long-horizon reasoning trajectories and to coordinate complex workflows, which hinders the development of robust, memory-intensive applications.

Neural networks generate natural language as a linear sequence. Recurrent neural networks (RNNs) (Mikolov et al., 2010; Luong et al., 2015; Gu & Dao, 2023) and Transformers (Vaswani et al., 2017; Yang et al., 2023) are constrained by token limits, hidden state sizes, and GPU memory capacities.
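
The reasoning-tree and subtask-pruning idea described above can be illustrated with a small sketch. It is not the authors' implementation: it operates on text segments rather than KV states, and the `Task` class, the `working_memory` and `full_trace` functions, and the specific pruning rule (a finished sub-task collapses to its conclusion) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """One node of a reasoning tree: a thought, optional sub-tasks, a conclusion."""
    thought: str
    subtasks: List["Task"] = field(default_factory=list)
    conclusion: Optional[str] = None


def full_trace(task: Task) -> List[str]:
    """Every segment in depth-first order, as a linear sequence would store it."""
    out = [task.thought]
    for sub in task.subtasks:
        out.extend(full_trace(sub))
    if task.conclusion is not None:
        out.append(task.conclusion)
    return out


def working_memory(task: Task) -> List[str]:
    """Segments kept under an assumed pruning rule.

    Assumption: a finished sub-task (one with a conclusion) is pruned down to
    that conclusion; an unfinished sub-task is kept and expanded recursively.
    """
    kept = [task.thought]
    for sub in task.subtasks:
        if sub.conclusion is not None:
            kept.append(sub.conclusion)       # drop the sub-task's internals
        else:
            kept.extend(working_memory(sub))  # still in progress: keep it
    if task.conclusion is not None:
        kept.append(task.conclusion)
    return kept


# Hypothetical example tree: one finished sub-task, one still in progress.
root = Task(
    "Compute (3+4)*(5+6)",
    subtasks=[
        Task(
            "Add 3 and 4",
            subtasks=[Task("Count up from 3", conclusion="3+4=7")],
            conclusion="left factor is 7",
        ),
        Task("Add 5 and 6"),
    ],
)

print(len(full_trace(root)), len(working_memory(root)))
```

In this toy tree the linear trace holds six segments while the pruned working memory holds three, which is the kind of saving that, in the paper's setting, would let the freed positions and memory pages be reused.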
Jul-23-2025
- Country:
- North America > United States (0.28)
- Asia (0.28)
- Genre:
- Research Report > New Finding (1.00)
- Technology: