Benefits and Limitations of Communication in Multi-Agent Reasoning
Rizvi-Martel, Michael, Bhattamishra, Satwik, Rathi, Neil, Rabusseau, Guillaume, Hahn, Michael
Chain-of-thought prompting has popularized step-by-step reasoning in large language models, yet model performance still degrades as problem complexity and context length grow. By decomposing difficult tasks with long contexts into shorter, manageable ones, recent multi-agent paradigms offer a promising near-term solution to this problem. However, the fundamental capacities of such systems are poorly understood. In this work, we propose a theoretical framework to analyze the expressivity of multi-agent systems. We apply our framework to three algorithmic families: state tracking, recall, and $k$-hop reasoning. We derive bounds on (i) the number of agents required to solve the task exactly, (ii) the quantity and structure of inter-agent communication, and (iii) the achievable speedups as problem size and context scale. Our results identify regimes where communication is provably beneficial, delineate tradeoffs between agent count and bandwidth, and expose intrinsic limitations when either resource is constrained. We complement our theoretical analysis with a set of experiments on pretrained LLMs using controlled synthetic benchmarks. Empirical outcomes confirm the tradeoffs between key quantities predicted by our theory. Collectively, our analysis offers principled guidance for designing scalable multi-agent reasoning systems.
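The three algorithmic families named above can be made concrete with a toy $k$-hop task. The sketch below is a minimal construction of my own (not the paper's benchmark): it builds a random successor map and shows how two agents can split a $k$-hop query while communicating only a single symbol.

```python
import random

def make_khop_instance(n_symbols, seed=0):
    """Build a random successor map and a start symbol.

    A k-hop query asks: which symbol do we reach after following
    the map k times from `start`?
    """
    rng = random.Random(seed)
    symbols = list(range(n_symbols))
    succ = {s: rng.choice(symbols) for s in symbols}
    return succ, rng.choice(symbols)

def solve_khop(succ, start, k):
    """Single-agent baseline: follow the chain for k sequential steps."""
    cur = start
    for _ in range(k):
        cur = succ[cur]
    return cur

def solve_khop_two_agents(succ, start, k):
    """Two-agent decomposition: agent A resolves the first half of the
    hops and communicates only one symbol (its endpoint); agent B
    finishes from there. Each agent faces a shorter sub-problem."""
    mid = solve_khop(succ, start, k // 2)      # agent A's single message
    return solve_khop(succ, mid, k - k // 2)   # agent B continues
```

Generalizing to $a$ agents, each performs roughly $k/a$ sequential hops at the cost of $a-1$ inter-agent messages, which is the kind of agent-count versus bandwidth tradeoff the paper analyzes.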
Revisiting associative recall in modern recurrent models
Okpekpe, Destiny, Orvieto, Antonio
Modern recurrent deep learning models, such as state-space models (SSMs), have emerged as a promising computationally efficient alternative to Transformers for sequence modeling. However, how their practical differences in learnability and optimization impact core capabilities remains underexplored. In this paper, we thoroughly compare SSM and Transformer learning dynamics on two fundamental benchmarks highly correlated with language modeling performance: associative recall and copying. We find that, while Transformers are robust to optimization hyperparameters, the performance of modern recurrent models suffers from critical instabilities: success is confined to an extremely narrow window of learning rates, outside of which accuracy drastically drops. This issue can confound performance evaluations and expressivity conclusions, revealing a fundamental mismatch in the loss landscape of modern recurrent models compared to Transformers. We demonstrate that this brittle optimization has a direct impact on scaling, causing SSMs to favor width over depth. Indeed, we also find that, while the 1-layer Transformer's performance on recall does not exceed random guessing, well-tuned Mamba and other SSMs can learn to recall with one layer, yet with dynamics that do not resemble the formation of induction heads. Taken together, our findings suggest that a crucial differentiator between these architectures lies not just in their expressivity but in their fundamental learnability properties, pointing to optimization stability as a key challenge for the future of SSMs.

Since early developments (Rumelhart et al., 1986; Elman, 1990), RNNs have driven progress in machine learning techniques for sequential data, with milestones such as Echo-State Networks (Jaeger, 2001), LSTM (Hochreiter & Schmidhuber, 1997), and GRU (Cho et al., 2014). However, two problems severely limit the application of RNNs today: first, GPU architectures struggle with sequential processing.
Second, it is widely known that RNNs are hard to train due to vanishing and exploding gradient issues (Bengio et al., 1994; Hochreiter et al., 2001; Pascanu et al., 2013). These challenges have led to the introduction of a different paradigm: the attention mechanism, implemented in the Transformer architecture (Vaswani et al., 2017). Instead of processing inputs sequentially while building up internal memory, as RNNs do, attention computes pairwise interactions between data points, allowing direct links between elements in a sequence to be modeled and thus mitigating vanishing gradients.
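The associative recall benchmark discussed above can be generated synthetically. This is a minimal sketch under my own conventions (integer tokens, keys and values interleaved, query appended at the end), not the exact format used in the paper:

```python
import random

def make_recall_sequence(n_pairs, vocab, seed=0):
    """Generate an associative-recall prompt: a list of key-value pairs
    followed by a query key; the target is that key's value."""
    rng = random.Random(seed)
    keys = rng.sample(range(vocab), n_pairs)      # distinct keys
    values = [rng.randrange(vocab) for _ in keys]
    query = rng.choice(keys)
    target = values[keys.index(query)]
    # Flatten into a token sequence: k1 v1 k2 v2 ... q
    tokens = [t for kv in zip(keys, values) for t in kv] + [query]
    return tokens, target
```

A model solves the task if, given `tokens`, it predicts `target`; Transformers typically do this via induction heads, whereas the paper finds one-layer SSMs solve it with different dynamics.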
Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
Zuo, Yifei, Yin, Yutong, Zeng, Zhichen, Li, Ang, Zhu, Banghua, Wang, Zhaoran
Transformer architectures have achieved remarkable success in various domains. While efficient alternatives to Softmax Attention have been widely studied, the search for more expressive mechanisms grounded in theoretical insight, even at greater computational cost, has been relatively underexplored. In this work, we bridge this gap by proposing Local Linear Attention (LLA), a novel attention mechanism derived from nonparametric statistics through the lens of test-time regression. First, we show that LLA offers theoretical advantages over Linear and Softmax Attention for associative memory via a bias-variance trade-off analysis. Next, we address its computational challenges and propose two memory-efficient primitives to tackle the $\Theta(n^2 d)$ and $\Theta(n d^2)$ complexity. We then introduce FlashLLA, a hardware-efficient, blockwise algorithm that enables scalable and parallel computation on modern accelerators. In addition, we implement and profile a customized inference kernel that significantly reduces memory overheads. Finally, we empirically validate the advantages and limitations of LLA on test-time regression, in-context regression, associative recall, and state tracking tasks. Experimental results demonstrate that LLA effectively adapts to non-stationarity, outperforming strong baselines in test-time training and in-context learning, and exhibiting promising evidence of its scalability and applicability in large-scale models. Code is available at https://github.com/Yifei-Zuo/Flash-LLA.
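The two endpoints that LLA interpolates can be written down directly in the test-time-regression view: softmax attention acts as Nadaraya-Watson kernel regression over the (key, value) pairs, while linear attention compresses them into a single matrix state. The plain-Python sketch below illustrates only these two endpoints (function names are mine), not the paper's LLA or FlashLLA:

```python
import math

def softmax_attention(q, keys, values):
    """Softmax attention as kernel regression: the query's output is a
    similarity-weighted average of the stored values."""
    scores = [math.exp(sum(qi * ki for qi, ki in zip(q, k))) for k in keys]
    z = sum(scores)
    d = len(values[0])
    return [sum(s * v[j] for s, v in zip(scores, values)) / z for j in range(d)]

def linear_attention(q, keys, values):
    """Linear attention keeps only the running sum S = sum_i v_i k_i^T,
    so the output S q is a global linear estimate: cheaper per query,
    but unable to adapt locally the way the softmax kernel can."""
    dk, dv = len(keys[0]), len(values[0])
    S = [[0.0] * dk for _ in range(dv)]
    for k, v in zip(keys, values):          # accumulate outer products
        for a in range(dv):
            for b in range(dk):
                S[a][b] += v[a] * k[b]
    return [sum(S[a][b] * q[b] for b in range(dk)) for a in range(dv)]
```

The linear variant needs only the $d \times d$ state `S` regardless of sequence length, which is where the bias-variance trade-off against the softmax kernel arises.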
The Media Bias Detector: A Framework for Annotating and Analyzing the News at Scale
Haider, Samar, Tohidi, Amir, Wang, Jenny S., Dörr, Timothy, Rothschild, David M., Callison-Burch, Chris, Watts, Duncan J.
Mainstream news organizations shape public perception not only directly through the articles they publish but also through the choices they make about which topics to cover (or ignore) and how to frame the issues they do decide to cover. However, measuring these subtle forms of media bias at scale remains a challenge. Here, we introduce a large, ongoing (from January 1, 2024 to present), near-real-time dataset and computational framework developed to enable systematic study of selection and framing bias in news coverage. Our pipeline integrates large language models (LLMs) with scalable, near-real-time news scraping to extract structured annotations -- including political lean, tone, topics, article type, and major events -- across hundreds of articles per day. We quantify these dimensions of coverage at multiple levels -- the sentence level, the article level, and the publisher level -- expanding the ways in which researchers can analyze media bias in the modern news landscape. In addition to a curated dataset, we also release an interactive web platform for convenient exploration of these data. Together, these contributions establish a reusable methodology for studying media bias at scale, providing empirical resources for future research. Leveraging the breadth of the corpus over time and across publishers, we also present some examples (focused on the 150,000+ articles examined in 2024) that illustrate how this novel dataset can reveal insightful patterns in news coverage and bias, supporting academic research and real-world efforts to improve media accountability.
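The multi-level aggregation described above (article-level annotations rolled up to the publisher level) can be sketched with a simple record type. The field names below are illustrative stand-ins, not the released dataset's actual schema:

```python
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class ArticleAnnotation:
    """One article's structured annotations (illustrative fields only)."""
    publisher: str
    political_lean: str   # e.g. "left", "center", "right"
    tone: str             # e.g. "negative", "neutral", "positive"
    topics: list = field(default_factory=list)
    article_type: str = "news"

def publisher_lean_counts(annotations):
    """Aggregate article-level lean labels to the publisher level,
    giving one per-publisher distribution over lean labels."""
    counts = {}
    for a in annotations:
        counts.setdefault(a.publisher, Counter())[a.political_lean] += 1
    return counts
```

Selection bias then shows up as differences between publishers' label distributions, while framing bias shows up within articles (tone and topic choices).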
High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations
Li, Ziwei, Duan, Yuhan, Xiong, Tianyu, Chen, Yi-Tang, Chao, Wei-Lun, Shen, Han-Wei
Effective surrogate models are critical for accelerating scientific simulations. Implicit neural representations (INRs) offer a compact and continuous framework for modeling spatially structured data, but they often struggle with complex scientific fields exhibiting localized, high-frequency variations. Recent approaches address this by introducing additional features along rigid geometric structures (e.g., grids), but at the cost of flexibility and increased model size. In this paper, we propose a simple yet effective alternative: Feature-Adaptive INR (FA-INR). FA-INR leverages cross-attention to an augmented memory bank to learn flexible feature representations, enabling adaptive allocation of model capacity based on data characteristics, rather than rigid structural assumptions. To further improve scalability, we introduce a coordinate-guided mixture of experts (MoE) that enhances the specialization and efficiency of feature representations. Experiments on three large-scale ensemble simulation datasets show that FA-INR achieves state-of-the-art fidelity while significantly reducing model size, establishing a new trade-off frontier between accuracy and compactness for INR-based surrogates.
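The two ingredients named above, cross-attention to a memory bank and coordinate-guided expert routing, can be sketched in a few lines. Everything here (gating by nearest expert center, a tiny per-expert (key, value) bank) is an illustrative stand-in of mine, not the paper's architecture:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def fa_inr_forward(coord, mem_banks, gate_centers):
    """Sketch of the FA-INR idea: route a coordinate to one expert's
    memory bank (coordinate-guided MoE), then attend over that bank's
    (key, value) slots to produce an adaptive feature vector."""
    # Coordinate-guided gating: pick the expert whose center is closest.
    dists = [sum((c - g) ** 2 for c, g in zip(coord, center))
             for center in gate_centers]
    expert = min(range(len(dists)), key=dists.__getitem__)
    keys, values = mem_banks[expert]
    # Cross-attention of the coordinate over the chosen memory bank.
    weights = softmax([sum(c * k for c, k in zip(coord, key)) for key in keys])
    d = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]
```

Because the memory slots are free parameters rather than grid cells, capacity can concentrate where the field has high-frequency structure, which is the adaptivity the abstract contrasts with rigid geometric feature grids.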
Unable to Forget: Proactive Interference Reveals Working Memory Limits in LLMs Beyond Context Length
Wang, Chupei, Sun, Jiaqiu Vince
Information retrieval in Large Language Models (LLMs) is increasingly recognized as intertwined with generation capabilities rather than mere lookup. While longer contexts are often assumed to improve retrieval, the effects of intra-context interference remain understudied. To address this, we adapt the proactive interference (PI) paradigm from cognitive science, where earlier information disrupts recall of newer updates. In humans, susceptibility to such interference is inversely linked to working memory capacity. We introduce PI-LLM, an evaluation that sequentially streams semantically related key-value updates and queries only the final values. Although these final values are clearly positioned just before the query, LLM retrieval accuracy declines log-linearly toward zero as interference accumulates; errors arise from retrieving previously overwritten values. Attempts to mitigate interference via prompt engineering (e.g., instructing models to ignore earlier input) yield limited success. These findings reveal a fundamental constraint on LLMs' ability to disentangle interference and flexibly manipulate information, suggesting a working memory bottleneck beyond mere context access. This calls for approaches that strengthen models' ability to suppress irrelevant content during retrieval.
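The PI-LLM setup can be mimicked with a small generator: each key is updated many times over the stream, and only the final value per key is queried. The format below is my own simplification of the described protocol, not the benchmark's actual prompt format:

```python
import random

def make_pi_stream(n_keys, n_updates, seed=0):
    """Build a proactive-interference stream: every key receives
    n_updates values in sequence; only the LAST value per key is the
    correct answer at query time."""
    rng = random.Random(seed)
    keys = [f"key{i}" for i in range(n_keys)]
    stream, final = [], {}
    for _ in range(n_updates):
        rng.shuffle(keys)                 # interleave keys across rounds
        for k in keys:
            v = rng.randrange(1000)
            stream.append((k, v))
            final[k] = v                  # later updates overwrite earlier ones
    return stream, final
```

Scoring a model against `final` while increasing `n_updates` reproduces the interference axis of the evaluation: the earlier, overwritten values are exactly the distractors the paper finds models fail to suppress.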