Goto

Collaborating Authors

 Problem Solving


Knowledge Updating? No More Model Editing! Just Selective Contextual Reasoning

arXiv.org Artificial Intelligence

As real-world knowledge evolves, the information embedded within large language models (LLMs) can become outdated, inadequate, or erroneous. Model editing has emerged as a prominent approach for updating LLMs' knowledge with minimal computational costs and parameter changes. This approach typically identifies and adjusts specific model parameters associated with newly acquired knowledge. However, existing methods often underestimate the adverse effects that parameter modifications can have on broadly distributed knowledge. More critically, post-edit LLMs frequently struggle with multi-hop reasoning and continuous knowledge updates. Although various studies have discussed these shortcomings, there is a lack of comprehensive evaluation. In this paper, we provide an evaluation of ten model editing methods along four dimensions: reliability, generalization, locality, and portability. Results confirm that all ten popular model editing methods show significant shortcomings across multiple dimensions, suggesting model editing is less promising. We then propose a straightforward method called Selective Contextual Reasoning (SCR), for knowledge updating. SCR does not modify model parameters but harnesses LLM's inherent contextual reasoning capabilities utilizing the updated knowledge pieces. Under SCR, an LLM first assesses whether an incoming query falls within the scope of an external knowledge base. If it does, the relevant external knowledge texts are contextualized to enhance reasoning; otherwise, the query is answered directly. We evaluate SCR against the ten model editing methods on two counterfactual datasets with three backbone LLMs. Empirical results confirm the effectiveness and efficiency of contextual reasoning for knowledge updating.


Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences

arXiv.org Artificial Intelligence

The rise of generative artificial intelligence (AI) as a novel frontier that uniquely merges advanced levels of intelligence with revolutionary user experiences is redefining the AI landscape for future cellular networks. In particular, the transition towards 6G systems has introduced a myriad of challenges inherent to their AI-native network design, requiring innovative solutions to enable real-time network orchestration, intelligent decision-making, and adaptive dynamic configurations. Meanwhile, the envisioned user experiences for 6G are growing increasingly complex, exceeding the capabilities offered by vintage wireless technologies and conventional AI solutions to satisfy their advanced demands. With its disruptive impact evident across diverse fields, generative AI possesses immense potential to tackle these challenges, leveraging its exceptional capabilities to manage complex tasks, operate autonomously, and adapt seamlessly to scenarios beyond its training domain. Remarkably, generative AI provides a transformative opportunity for telecom and cellular networks to bridge this defined gap in 6G systems, thereby shifting towards a new era with cutting-edge AI innovations across the different system and user levels.


An Empirical Study on Eliciting and Improving R1-like Reasoning Models

arXiv.org Artificial Intelligence

In this report, we present the third technical report on the development of slow-thinking models as part of the STILL project. As the technical pathway becomes clearer, scaling RL training has become a central technique for implementing such reasoning models. We systematically experiment with and document the effects of various factors influencing RL training, conducting experiments on both base models and fine-tuned models. Specifically, we demonstrate that our RL training approach consistently improves the Qwen2.5-32B base models, enhancing both response length and test accuracy. Furthermore, we show that even when a model like DeepSeek-R1-Distill-Qwen-1.5B has already achieved a high performance level, it can be further refined through RL training, reaching an accuracy of 39.33% on AIME 2024. Beyond RL training, we also explore the use of tool manipulation, finding that it significantly boosts the reasoning performance of large reasoning models. This approach achieves a remarkable accuracy of 86.67% with greedy search on AIME 2024, underscoring its effectiveness in enhancing model capabilities. We release our resources at the STILL project website: https://github.com/RUCAIBox/Slow_Thinking_with_LLMs.


Learning Transformer-based World Models with Contrastive Predictive Coding

arXiv.org Artificial Intelligence

The DreamerV3 algorithm recently obtained remarkable performance across diverse environment domains by learning an accurate world model based on Recurrent Neural Networks (RNNs). Following the success of model-based reinforcement learning algorithms and the rapid adoption of the Transformer architecture for its superior training efficiency and favorable scaling properties, recent works such as STORM have proposed replacing RNN-based world models with Transformer-based world models using masked self-attention. However, despite the improved training efficiency of these methods, their impact on performance remains limited compared to the Dreamer algorithm, struggling to learn competitive Transformer-based world models. In this work, we show that the next state prediction objective adopted in previous approaches is insufficient to fully exploit the representation capabilities of Transformers. We propose to extend world model predictions to longer time horizons by introducing TWISTER (Transformer-based World model wIth contraSTivE Representations), a world model using actionconditioned Contrastive Predictive Coding to learn high-level temporal feature representations and improve the agent performance. TWISTER achieves a humannormalized mean score of 162% on the Atari 100k benchmark, setting a new record among state-of-the-art methods that do not employ look-ahead search. We release our code at https://github.com/burchim/TWISTER. TWISTER outperforms Following the success of neural networks in solving reinforcement other model-based approaches. TWM, learning problems, model-based approaches IRIS, STORM and -IRIS employ a learning world models using gradient backpropagation Transformer-based world model while were proposed to reduce the amount of necessary interaction DreamerV3 uses a RNN-based model. World models (Sutton, 1991; Ha & Schmidhuber, 2018) summarize an agent's experience into a predictive model that can be used in place of the real environment to learn complex behaviors. Having access to a model of the environment enables the agent to simulate multiple plausible trajectories in parallel, improving generalization, sample efficiency and decision-making via planning.


Knowledge Augmentation in Federation: Rethinking What Collaborative Learning Can Bring Back to Decentralized Data

arXiv.org Artificial Intelligence

Data, as an observable form of knowledge, has become one of the most important factors of production for the development of Artificial Intelligence (AI). Meanwhile, increasing legislation and regulations on private and proprietary information results in scattered data sources also known as the "data islands". Although some collaborative learning paradigms such as Federated Learning (FL) can enable privacy-preserving training over decentralized data, they have inherent deficiencies in fairness, costs and reproducibility because of being learning-centric, which greatly limits the way how participants cooperate with each other. In light of this, we present a knowledge-centric paradigm termed Knowledge Augmentation in Federation (KAF), with focus on how to enhance local knowledge through collaborative effort. We provide the suggested system architecture, formulate the prototypical optimization objective, and review emerging studies that employ methodologies suitable for KAF. On our roadmap, with a three-way categorization we describe the methods for knowledge expansion, knowledge filtering, and label and feature space correction in the federation. Further, we highlight several challenges and open questions that deserve more attention from the community. With our investigation, we intend to offer new insights for what collaborative learning can bring back to decentralized data.


Social Genome: Grounded Social Reasoning Abilities of Multimodal Models

arXiv.org Artificial Intelligence

Social reasoning abilities are crucial for AI systems to effectively interpret and respond to multimodal human communication and interaction within social contexts. We introduce Social Genome, the first benchmark for fine-grained, grounded social reasoning abilities of multimodal models. Social Genome contains 272 videos of interactions and 1,486 human-annotated reasoning traces related to inferences about these interactions. These traces contain 5,777 reasoning steps that reference evidence from visual cues, verbal cues, vocal cues, and external knowledge (contextual knowledge external to videos). Social Genome is also the first modeling challenge to study external knowledge in social reasoning. Social Genome computes metrics to holistically evaluate semantic and structural qualities of model-generated social reasoning traces. We demonstrate the utility of Social Genome through experiments with state-of-the-art models, identifying performance gaps and opportunities for future research to improve the grounded social reasoning abilities of multimodal models.


Beyond Disorder: Unveiling Cooperativeness in Multidirectional Associative Memories

arXiv.org Machine Learning

By leveraging tools from the statistical mechanics of complex systems, in these short notes we extend the architecture of a neural network for hetero-associative memory (called three-directional associative memories, TAM) to explore supervised and unsupervised learning protocols. In particular, by providing entropic-heterogeneous datasets to its various layers, we predict and quantify a new emergent phenomenon -- that we term {\em layer's cooperativeness} -- where the interplay of dataset entropies across network's layers enhances their retrieval capabilities Beyond those they would have without reciprocal influence. Naively we would expect layers trained with less informative datasets to develop smaller retrieval regions compared to those pertaining to layers that experienced more information: this does not happen and all the retrieval regions settle to the same amplitude, allowing for optimal retrieval performance globally. This cooperative dynamics marks a significant advancement in understanding emergent computational capabilities within disordered systems.


A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

arXiv.org Artificial Intelligence

Recent theoretical results show transformers cannot express sequential reasoning problems over long input lengths, intuitively because their computational depth is bounded. However, prior work treats the depth as a constant, leaving it unclear to what degree bounded depth may suffice for solving problems over short inputs, or how increasing the transformer's depth affects its expressive power. We address these questions by analyzing the expressive power of transformers whose depth can grow minimally with context length $n$. We show even highly uniform transformers with depth $\Theta(\log n)$ can express two important problems: recognizing regular languages, which captures state tracking abilities, and graph connectivity, which underlies multi-step reasoning. Notably, both of these problems cannot be expressed by fixed-depth transformers under standard complexity conjectures, demonstrating the expressivity benefit of growing depth. Moreover, our theory quantitatively predicts how depth must grow with input length to express these problems, showing that depth scaling is more efficient than scaling width or chain-of-thought steps. Empirically, we find our theoretical depth requirements for regular language recognition match the practical depth requirements of transformers remarkably well. Thus, our results clarify precisely how depth affects transformers' reasoning capabilities, providing potential practical insights for designing models that are better at sequential reasoning.


SoftMatcha: A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches

arXiv.org Artificial Intelligence

Researchers and practitioners in natural language processing and computational linguistics frequently observe and analyze the real language usage in large-scale corpora. For that purpose, they often employ off-the-shelf pattern-matching tools, such as grep, and keyword-in-context concordancers, which is widely used in corpus linguistics for gathering examples. Nonetheless, these existing techniques rely on surface-level string matching, and thus they suffer from the major limitation of not being able to handle orthographic variations and paraphrasing -- notable and common phenomena in any natural language. In addition, existing continuous approaches such as dense vector search tend to be overly coarse, often retrieving texts that are unrelated but share similar topics. Given these challenges, we propose a novel algorithm that achieves \emph{soft} (or semantic) yet efficient pattern matching by relaxing a surface-level matching with word embeddings. Our algorithm is highly scalable with respect to the size of the corpus text utilizing inverted indexes. We have prepared an efficient implementation, and we provide an accessible web tool. Our experiments demonstrate that the proposed method (i) can execute searches on billion-scale corpora in less than a second, which is comparable in speed to surface-level string matching and dense vector search; (ii) can extract harmful instances that semantically match queries from a large set of English and Japanese Wikipedia articles; and (iii) can be effectively applied to corpus-linguistic analyses of Latin, a language with highly diverse inflections.


L2R: Learning to Reduce Search Space for Generalizable Neural Routing Solver

arXiv.org Artificial Intelligence

Constructive neural combinatorial optimization (NCO) has attracted growing research attention due to its ability to solve complex routing problems without relying on handcrafted rules. However, existing NCO methods face significant challenges in generalizing to large-scale problems due to high computational complexity and inefficient capture of structural patterns. To address this issue, we propose a novel learning-based search space reduction method that adaptively selects a small set of promising candidate nodes at each step of the constructive NCO process. Unlike traditional methods that rely on fixed heuristics, our selection model dynamically prioritizes nodes based on learned patterns, significantly reducing the search space while maintaining solution quality. Experimental results demonstrate that our method, trained solely on 100-node instances from uniform distribution, generalizes remarkably well to large-scale Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) instances with up to 1 million nodes from the uniform distribution and over 80K nodes from other distributions.