InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs (e.g., LLM-driven agents). However, existing LLMs, pre-trained on sequences with a restricted maximum length, cannot process longer sequences due to out-of-domain and distraction issues. Common solutions often involve continual pre-training on longer sequences, which introduces expensive computational overhead and uncontrollable changes in model capabilities. In this paper, we unveil the intrinsic capacity of LLMs to understand extremely long sequences without any fine-tuning. To this end, we introduce a training-free memory-based method, InfLLM. Specifically, InfLLM stores distant contexts in additional memory units and employs an efficient mechanism to look up token-relevant units for attention computation.
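The lookup mechanism the abstract describes can be sketched as a toy, assumption-laden illustration: distant key/value states are grouped into fixed-size memory units, each unit is scored by a few representative keys, and only the top-k units are loaded for attention alongside the local window. The unit size, representative-key scoring, and top-k selection below are illustrative choices, not the paper's exact design.

```python
# Toy sketch (not the authors' code) of InfLLM-style block memory lookup.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def block_memory_attention(q, mem_k, mem_v, local_k, local_v,
                           unit_size=4, top_k=2, n_repr=2):
    """q: (d,); mem_k/mem_v: (T, d) distant context; local_k/v: (L, d)."""
    d = q.shape[0]
    n_units = mem_k.shape[0] // unit_size
    units_k = mem_k[:n_units * unit_size].reshape(n_units, unit_size, d)
    units_v = mem_v[:n_units * unit_size].reshape(n_units, unit_size, d)
    # Score each unit by its best-matching representative keys.
    repr_scores = (units_k[:, :n_repr] @ q).max(axis=1)   # (n_units,)
    chosen = np.argsort(repr_scores)[-top_k:]             # top-k relevant units
    # Attend over the selected distant units plus the local window.
    k_sel = np.concatenate([units_k[chosen].reshape(-1, d), local_k])
    v_sel = np.concatenate([units_v[chosen].reshape(-1, d), local_v])
    attn = softmax(k_sel @ q / np.sqrt(d))
    return attn @ v_sel

rng = np.random.default_rng(0)
d, T, L = 8, 32, 4
out = block_memory_attention(rng.normal(size=d),
                             rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                             rng.normal(size=(L, d)), rng.normal(size=(L, d)))
print(out.shape)  # (8,)
```

The point of the unit-level score is that only a handful of representative keys per unit are touched during lookup, so attention cost grows with the number of selected units rather than the full context length.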
Nonstationary Dual Averaging and Online Fair Allocation
We consider the problem of fairly allocating sequentially arriving items to a set of individuals. For this problem, the recently-introduced PACE algorithm leverages the dual averaging algorithm to approximate competitive equilibria and thus generate online fair allocations. PACE is simple, distributed, and parameter-free, making it appealing for practical use in large-scale systems. However, current performance guarantees for PACE require i.i.d.
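The dual averaging building block the abstract leans on can be sketched generically (this is vanilla entropic dual averaging on the simplex, not the PACE algorithm itself; PACE applies a dual averaging variant to approximate competitive equilibria):

```python
# Minimal sketch of entropic dual averaging over the probability simplex.
import numpy as np

def dual_averaging_simplex(grad_fn, dim, steps, eta=1.0):
    """x_{t+1} is proportional to exp(-eta * sum_of_gradients / sqrt(t))."""
    x = np.ones(dim) / dim
    g_sum = np.zeros(dim)
    for t in range(1, steps + 1):
        g_sum += grad_fn(x)
        z = -eta * g_sum / np.sqrt(t)
        e = np.exp(z - z.max())   # entropic mirror (softmax) step
        x = e / e.sum()
    return x

# Example: minimize f(x) = ||x - c||^2 over the simplex; the optimum is c.
c = np.array([0.7, 0.2, 0.1])
x = dual_averaging_simplex(lambda x: 2 * (x - c), dim=3, steps=2000)
print(x)
```

The iterate drifts toward c at the usual O(1/sqrt(t)) dual-averaging rate; the appeal in the allocation setting is that each update touches only running gradient sums, which is what makes a distributed, parameter-free implementation possible.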
Learning from Hallucinating Critical Points for Navigation in Dynamic Environments
Ghani, Saad Abdul, Lee, Kameron, Xiao, Xuesu
Generating large and diverse obstacle datasets to learn motion planning in environments with dynamic obstacles is challenging due to the vast space of possible obstacle trajectories. Inspired by hallucination-based data synthesis approaches, we propose Learning from Hallucinating Critical Points (LfH-CP), a self-supervised framework for creating rich dynamic obstacle datasets based on existing optimal motion plans, without requiring expensive expert demonstrations or trial-and-error exploration. LfH-CP factorizes hallucination into two stages: first identifying when and where obstacles must appear in order to result in an optimal motion plan, i.e., the critical points, and then procedurally generating diverse trajectories that pass through these points while avoiding collisions. This factorization avoids generative failures such as mode collapse and ensures coverage of diverse dynamic behaviors. We further introduce a diversity metric to quantify dataset richness and show that LfH-CP produces substantially more varied training data than existing baselines. Experiments in simulation demonstrate that planners trained on LfH-CP datasets achieve higher success rates than a prior hallucination method.
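The two-stage factorization can be illustrated with a toy sketch. Both stages here are hypothetical stand-ins, not the paper's method: stage 1 flags high-curvature waypoints of an optimal plan as critical points (places where an obstacle must have been to explain the detour), and stage 2 samples random constant-velocity obstacle trajectories constrained to pass through each critical point.

```python
# Toy illustration (hypothetical heuristics) of LfH-CP's two-stage split.
import numpy as np

def critical_points(path, dt=1.0, k=1):
    """Stage 1: flag the k highest-curvature waypoints as critical points."""
    v = np.gradient(path, dt, axis=0)
    a = np.gradient(v, dt, axis=0)
    curvature = np.abs(np.cross(v, a)) / (np.linalg.norm(v, axis=1) ** 3 + 1e-9)
    idx = np.argsort(curvature)[-k:]
    return [(i * dt, path[i]) for i in idx]

def sample_trajectory(t_c, p_c, rng=None):
    """Stage 2: a random constant-velocity trajectory through (t_c, p_c)."""
    if rng is None:
        rng = np.random.default_rng()
    vel = rng.uniform(-1.0, 1.0, size=2)
    return lambda t: p_c + (t - t_c) * vel  # meets p_c exactly at t = t_c

# An optimal plan that detours around something near the middle:
path = np.array([[x, 4.0 - (x - 5) ** 2 * 0.16 if 2.5 < x < 7.5 else 0.0]
                 for x in np.linspace(0, 10, 21)])
(t_c, p_c), = critical_points(path)
traj = sample_trajectory(t_c, p_c, rng=np.random.default_rng(0))
print(t_c, p_c, traj(t_c))
```

Because stage 2 only samples trajectories through already-fixed points, diversity comes from cheap procedural randomization rather than a learned generator, which is how the factorization sidesteps mode collapse.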
GACL: Grounded Adaptive Curriculum Learning with Active Task and Performance Monitoring
Wang, Linji, Xu, Zifan, Stone, Peter, Xiao, Xuesu
Curriculum learning has emerged as a promising approach for training complex robotics tasks, yet current applications predominantly rely on manually designed curricula, which demand significant engineering effort and can suffer from subjective and suboptimal human design choices. While automated curriculum learning has shown success in simple domains like grid worlds and games, where task distributions can be easily specified, robotics tasks present unique challenges: they require handling complex task spaces while maintaining relevance to target domain distributions that are only partially known through limited samples. We validate GACL on wheeled navigation in constrained environments and quadruped locomotion in challenging 3D confined spaces, achieving 6.8% and 6.1% higher success rates, respectively, than state-of-the-art methods in each domain. Curriculum learning has shown promise in training robots for complex tasks such as navigating through highly constrained environments or maintaining quadruped locomotion across challenging terrain [1], [2]. However, current applications of curriculum learning in robotics face a fundamental challenge: they predominantly rely on manually designed curricula, which demand significant engineering effort and can suffer from subjective, suboptimal design choices. For example, in quadruped locomotion tasks [2], roboticists must carefully design progressive stages, from basic jumping skills to complex obstacle traversal, and manually define success metrics and progression conditions at each stage.
Awesome-OL: An Extensible Toolkit for Online Learning
Liu, Zeyi, Hu, Songqiao, Han, Pengyu, Liu, Jiaming, He, Xiao
In recent years, online learning has attracted increasing attention due to its adaptive capability to process streaming and non-stationary data. To facilitate algorithm development and practical deployment in this area, we introduce Awesome-OL, an extensible Python toolkit tailored for online learning research. Awesome-OL integrates state-of-the-art algorithms and provides a unified framework for reproducible comparisons, curated benchmark datasets, and multi-modal visualization. Built upon the scikit-multiflow open-source infrastructure, Awesome-OL emphasizes user-friendly interaction without compromising research flexibility or extensibility.
Narrate2Nav: Real-Time Visual Navigation with Implicit Language Reasoning in Human-Centric Environments
Payandeh, Amirreza, Pokhrel, Anuj, Song, Daeun, Zampieri, Marcos, Xiao, Xuesu
Large Vision-Language Models (VLMs) have demonstrated potential in enhancing mobile robot navigation in human-centric environments by understanding contextual cues, human intentions, and social dynamics while exhibiting reasoning capabilities. However, their computational complexity and limited sensitivity to continuous numerical data impede real-time performance and precise motion control. To this end, we propose Narrate2Nav, a novel real-time vision-action model that leverages a self-supervised learning framework based on the Barlow Twins redundancy reduction loss to embed implicit natural language reasoning, social cues, and human intentions within a visual encoder, enabling reasoning in the model's latent space rather than token space. The model combines RGB inputs, motion commands, and textual signals of scene context during training to bridge from robot observations to low-level motion commands for short-horizon point-goal navigation during deployment. Extensive evaluation of Narrate2Nav across various challenging scenarios, on both an offline unseen dataset and in real-world experiments, demonstrates an overall improvement of 52.94 percent and 41.67 percent, respectively, over the next best baseline. Additionally, qualitative comparative analysis of Narrate2Nav's visual encoder attention map against four other baselines demonstrates enhanced attention to navigation-critical scene elements, underscoring its effectiveness in human-centric navigation tasks.
AnalyticKWS: Towards Exemplar-Free Analytic Class Incremental Learning for Small-footprint Keyword Spotting
Xiao, Yang, Peng, Tianyi, Das, Rohan Kumar, Hu, Yuchen, Zhuang, Huiping
Keyword spotting (KWS) offers a vital mechanism to identify spoken commands in voice-enabled systems, where user demands often shift, requiring models to learn new keywords continually over time. However, a major problem is catastrophic forgetting, where models lose their ability to recognize earlier keywords. Although several continual learning methods have proven their usefulness for reducing forgetting, most existing approaches depend on storing and revisiting old data to combat catastrophic forgetting. Though effective, these methods face two practical challenges: 1) privacy risks from keeping user data and 2) large memory and time consumption that limit deployment on small devices. To address these issues, we propose an exemplar-free Analytic Continual Learning (AnalyticKWS) method that updates model parameters without revisiting earlier data. Inspired by efficient learning principles, AnalyticKWS computes a closed-form analytical solution for model updates and requires only a single epoch of adaptation for incoming keywords. AnalyticKWS demands fewer computational resources by avoiding gradient-based updates and does not store old data. By eliminating the need for back-propagation during incremental learning, the model remains lightweight and efficient. As a result, AnalyticKWS meets the challenges mentioned earlier and suits resource-limited settings well. Extensive experiments on various datasets and settings show that AnalyticKWS consistently outperforms existing continual learning methods.
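The closed-form, back-propagation-free update described above is in the family of recursive least-squares/analytic classifiers (as in ACIL-style methods). A minimal sketch under that assumption follows; class names, the feature dimension, and the two-phase example are illustrative, not the paper's code.

```python
# Hedged sketch of an analytic (recursive least-squares) class-incremental
# classifier on frozen features: each increment is a closed-form update that
# never revisits earlier data -- the property AnalyticKWS builds on.
import numpy as np

class AnalyticClassifier:
    """Ridge classifier on frozen features, updated without old exemplars."""
    def __init__(self, dim, gamma=1.0):
        self.R = np.eye(dim) / gamma   # inverse regularized autocorrelation
        self.W = np.zeros((dim, 0))    # weight columns grow with new keywords

    def fit_increment(self, X, Y):
        """X: (n, dim) features; Y: (n, C_total) one-hot labels."""
        if Y.shape[1] > self.W.shape[1]:   # new keywords: widen weight matrix
            pad = np.zeros((self.W.shape[0], Y.shape[1] - self.W.shape[1]))
            self.W = np.hstack([self.W, pad])
        # Woodbury identity: fold X^T X into R without storing old data.
        K = np.linalg.inv(np.eye(len(X)) + X @ self.R @ X.T)
        self.R -= self.R @ X.T @ K @ X @ self.R
        self.W += self.R @ X.T @ (Y - X @ self.W)

    def predict(self, X):
        return (X @ self.W).argmax(axis=1)

rng = np.random.default_rng(0)
clf = AnalyticClassifier(dim=2, gamma=0.1)
# Phase 1: two initial keywords.
X1 = np.vstack([rng.normal([5.0, 0.0], 0.1, (20, 2)),
                rng.normal([0.0, 5.0], 0.1, (20, 2))])
Y1 = np.eye(2)[[0] * 20 + [1] * 20]
clf.fit_increment(X1, Y1)
# Phase 2: one new keyword, learned without revisiting phase-1 data.
X2 = rng.normal([-5.0, -5.0], 0.1, (20, 2))
Y2 = np.eye(3)[[2] * 20]
clf.fit_increment(X2, Y2)
print(clf.predict(X2)[:5])  # -> [2 2 2 2 2]
```

The recursive update reproduces the joint ridge solution over all phases exactly, which is why this style of method can claim no forgetting by construction while needing only one pass over each increment.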
'You Can't Lick a Badger Twice': Google Failures Highlight a Fundamental AI Flaw
Here's a nice little distraction from your workday: Head to Google, type in any made-up phrase, add the word "meaning," and search. Google's AI Overviews will not only confirm that your gibberish is a real saying, it will also tell you what it means and how it was derived. This is genuinely fun, and you can find lots of examples on social media. In the world of AI Overviews, "a loose dog won't surf" is "a playful way of saying that something is not likely to happen or that something is not going to work out." The invented phrase "wired is as wired does" is an idiom that means "someone's behavior or characteristics are a direct result of their inherent nature or 'wiring,' much like a computer's function is determined by its physical connections."