Tech giant Meta buys Chinese-founded AI firm Manus

Al Jazeera

Tech giant Meta has announced it will buy artificial intelligence startup Manus in a rare crossover of US and Chinese technology amid Washington and Beijing's heated tech rivalry. Meta said the acquisition would see it take over the operation of Manus's self-directing AI agent and integrate the technology into its own products. Meta, the parent company of Facebook and Instagram, said the deal would bring one of the "leading autonomous general-purpose agents" to billions of people worldwide. "Manus's exceptional talent will join Meta's team to deliver general-purpose agents across our consumer and business products, including in Meta AI," the California-based firm said in a statement on Monday. "We're excited to welcome the Manus team and help improve the lives of billions of people and millions of businesses with their technology."


InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

Neural Information Processing Systems

Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs (e.g., LLM-driven agents). However, existing LLMs, pre-trained on sequences with a restricted maximum length, cannot process longer sequences due to out-of-domain and distraction issues. Common solutions often involve continual pre-training on longer sequences, which introduces expensive computational overhead and uncontrollable changes in model capabilities. In this paper, we unveil the intrinsic capacity of LLMs for understanding extremely long sequences without any fine-tuning. To this end, we introduce a training-free memory-based method, InfLLM. Specifically, InfLLM stores distant contexts into additional memory units and employs an efficient mechanism to look up token-relevant units for attention computation.
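The lookup the abstract describes can be pictured with a minimal sketch: distant-context keys are grouped into fixed-size memory units, each unit is scored against the current query, and only the top-scoring units join the attention computation. This is an illustrative simplification (unit scoring by mean key, hypothetical `select_memory_units` helper), not InfLLM's actual implementation:

```python
import numpy as np

def select_memory_units(keys, query, unit_size=4, top_k=2):
    """Split distant-context keys into fixed-size units and pick the
    top_k units most relevant to the current query (illustrative only)."""
    n_units = len(keys) // unit_size
    units = keys[:n_units * unit_size].reshape(n_units, unit_size, -1)
    # Score each unit by the dot product of its mean key with the query.
    scores = units.mean(axis=1) @ query
    chosen = np.argsort(scores)[-top_k:]
    return np.sort(chosen)  # selected unit indices, in original order

rng = np.random.default_rng(0)
keys = rng.normal(size=(16, 8))   # 16 distant-context key vectors
query = rng.normal(size=8)
print(select_memory_units(keys, query))
```

Attention would then run over the selected units plus the local window, keeping cost roughly constant as the stream grows.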


Learning from Hallucinating Critical Points for Navigation in Dynamic Environments

Ghani, Saad Abdul, Lee, Kameron, Xiao, Xuesu

arXiv.org Artificial Intelligence

Generating large and diverse obstacle datasets to learn motion planning in environments with dynamic obstacles is challenging due to the vast space of possible obstacle trajectories. Inspired by hallucination-based data synthesis approaches, we propose Learning from Hallucinating Critical Points (LfH-CP), a self-supervised framework for creating rich dynamic obstacle datasets based on existing optimal motion plans without requiring expensive expert demonstrations or trial-and-error exploration. LfH-CP factorizes hallucination into two stages: first identifying when and where obstacles must appear in order to result in an optimal motion plan, i.e., the critical points, and then procedurally generating diverse trajectories that pass through these points while avoiding collisions. This factorization avoids generative failures such as mode collapse and ensures coverage of diverse dynamic behaviors. We further introduce a diversity metric to quantify dataset richness and show that LfH-CP produces substantially more varied training data than existing baselines. Experiments in simulation demonstrate that planners trained on LfH-CP datasets achieve higher success rates than a prior hallucination method.
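The second stage of the factorization, procedurally generating trajectories through a given critical point, might be sketched as below. The helper name, the piecewise-linear trajectory shape, and the omission of collision checking are all simplifying assumptions for illustration, not the paper's method:

```python
import numpy as np

def hallucinate_trajectory(critical_point, t_crit, horizon=10,
                           scale=0.5, rng=None):
    """Generate one random 2D obstacle trajectory constrained to pass
    through critical_point at timestep t_crit (illustrative sketch;
    collision checking against the robot's plan is omitted)."""
    if rng is None:
        rng = np.random.default_rng()
    # Randomize the endpoints around the critical point for diversity.
    start = critical_point + rng.normal(scale=scale, size=2)
    end = critical_point + rng.normal(scale=scale, size=2)
    traj = np.empty((horizon, 2))
    for t in range(horizon):
        if t <= t_crit:
            a = t / max(t_crit, 1)
            traj[t] = (1 - a) * start + a * critical_point
        else:
            a = (t - t_crit) / (horizon - 1 - t_crit)
            traj[t] = (1 - a) * critical_point + a * end
    return traj
```

Sampling many such trajectories per critical point is what gives the dataset its coverage of diverse dynamic behaviors.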


Nonstationary Dual Averaging and Online Fair Allocation

Neural Information Processing Systems

We consider the problem of fairly allocating sequentially arriving items to a set of individuals. For this problem, the recently introduced PACE algorithm leverages the dual averaging algorithm to approximate competitive equilibria and thus generate online fair allocations. PACE is simple, distributed, and parameter-free, making it appealing for practical use in large-scale systems. However, current performance guarantees for PACE require i.i.d.
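The dual-averaging idea can be illustrated with a toy allocator: each arriving item goes to the bidder with the highest paced bid, and the pacing signal is a running average of per-round utilities. This is a hypothetical simplification for intuition, not the PACE algorithm itself:

```python
import numpy as np

def pace_like_allocate(values, budgets, eps=1e-9):
    """Toy paced allocation: each item goes to the agent with the highest
    paced bid; the pacing signal is a dual-averaging-style running average
    of per-round utilities (illustrative, not the actual PACE updates)."""
    n_agents = values.shape[1]
    avg_util = np.zeros(n_agents)   # running average utility per agent
    alloc = np.zeros(n_agents)      # cumulative utility received
    for t, v in enumerate(values, start=1):
        # Paced bid: value weighted by budget over current average utility,
        # so under-served agents bid more aggressively.
        bids = v * budgets / np.maximum(avg_util, eps)
        winner = int(np.argmax(bids))
        util = np.zeros(n_agents)
        util[winner] = v[winner]
        alloc[winner] += v[winner]
        # Dual-averaging step: incremental running average of utilities.
        avg_util += (util - avg_util) / t
    return alloc
```

With two agents and items valued `[1,0]`, `[0,1]`, `[1,1]`, the first two items go to the agents who value them, and the contested third item is split by the pacing state.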


GACL: Grounded Adaptive Curriculum Learning with Active Task and Performance Monitoring

Wang, Linji, Xu, Zifan, Stone, Peter, Xiao, Xuesu

arXiv.org Artificial Intelligence

Curriculum learning has emerged as a promising approach for training complex robotics tasks, yet current applications predominantly rely on manually designed curricula, which demand significant engineering effort and can suffer from subjective and suboptimal human design choices. While automated curriculum learning has shown success in simple domains like grid worlds and games where task distributions can be easily specified, robotics tasks present unique challenges: they require handling complex task spaces while maintaining relevance to target domain distributions that are only partially known through limited samples. We validate GACL on wheeled navigation in constrained environments and quadruped locomotion in challenging 3D confined spaces, achieving 6.8% and 6.1% higher success rates, respectively, than state-of-the-art methods in each domain. Curriculum learning has shown promise in training robots for complex tasks such as navigating through highly constrained environments or maintaining quadruped locomotion across challenging terrain [1], [2]. However, current applications of curriculum learning in robotics face a fundamental challenge: they predominantly rely on manually designed curricula, which demand significant engineering effort and can suffer from subjective, suboptimal design choices. For example, in quadruped locomotion tasks [2], roboticists must carefully design progressive stages from basic jumping skills to complex obstacle traversal and manually define success metrics and progression conditions at each stage.
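The contrast with manual curricula can be made concrete with a toy performance-monitored controller that adjusts task difficulty from the observed success rate. This is a hypothetical sketch, not GACL's mechanism, which additionally grounds the curriculum in samples from the target task distribution:

```python
def update_difficulty(difficulty, success_rate, target=0.7,
                      step=0.05, lo=0.0, hi=1.0):
    """Toy performance-monitored curriculum: raise task difficulty when
    the agent succeeds more often than the target rate, lower it when
    it succeeds less often, and clamp to [lo, hi] (illustrative only)."""
    if success_rate > target:
        difficulty += step
    elif success_rate < target:
        difficulty -= step
    return min(max(difficulty, lo), hi)
```

A manually designed curriculum hard-codes the progression stages instead; the appeal of automated approaches is replacing those hand-tuned thresholds with monitored performance.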


Awesome-OL: An Extensible Toolkit for Online Learning

Liu, Zeyi, Hu, Songqiao, Han, Pengyu, Liu, Jiaming, He, Xiao

arXiv.org Artificial Intelligence

In recent years, online learning has attracted increasing attention due to its adaptive capability to process streaming and non-stationary data. To facilitate algorithm development and practical deployment in this area, we introduce Awesome-OL, an extensible Python toolkit tailored for online learning research. Awesome-OL integrates state-of-the-art algorithms and provides a unified framework for reproducible comparisons, curated benchmark datasets, and multi-modal visualization. Built upon the scikit-multiflow open-source infrastructure, Awesome-OL emphasizes user-friendly interactions without compromising research flexibility or extensibility.
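The evaluation pattern common to toolkits in this space (including scikit-multiflow, which Awesome-OL builds on) is prequential, i.e., test-then-train: each sample is first used to test the current model, then to update it. A minimal sketch with a trivial learner, not Awesome-OL's API:

```python
class MajorityClassLearner:
    """Trivial online learner: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn_one(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

def prequential_accuracy(stream, learner):
    """Test-then-train evaluation: predict each sample before learning it,
    so accuracy reflects performance on data the model has never seen."""
    correct = total = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        learner.learn_one(x, y)
        total += 1
    return correct / total
```

Because every prediction precedes the corresponding update, prequential accuracy tracks adaptation on non-stationary streams without a held-out test set.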


Narrate2Nav: Real-Time Visual Navigation with Implicit Language Reasoning in Human-Centric Environments

Payandeh, Amirreza, Pokhrel, Anuj, Song, Daeun, Zampieri, Marcos, Xiao, Xuesu

arXiv.org Artificial Intelligence

Large Vision-Language Models (VLMs) have demonstrated potential in enhancing mobile robot navigation in human-centric environments by understanding contextual cues, human intentions, and social dynamics while exhibiting reasoning capabilities. However, their computational complexity and limited sensitivity to continuous numerical data impede real-time performance and precise motion control. To this end, we propose Narrate2Nav, a novel real-time vision-action model that leverages a self-supervised learning framework based on the Barlow Twins redundancy reduction loss to embed implicit natural language reasoning, social cues, and human intentions within a visual encoder, enabling reasoning in the model's latent space rather than in token space. The model combines RGB inputs, motion commands, and textual signals of scene context during training to bridge robot observations to low-level motion commands for short-horizon point-goal navigation during deployment. Extensive evaluation of Narrate2Nav across various challenging scenarios, on both an unseen offline dataset and in real-world experiments, demonstrates an overall improvement of 52.94 percent and 41.67 percent, respectively, over the next best baseline. Additionally, qualitative comparative analysis of Narrate2Nav's visual encoder attention map against four other baselines demonstrates enhanced attention to navigation-critical scene elements, underscoring its effectiveness in human-centric navigation tasks.
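The Barlow Twins redundancy-reduction loss the abstract names pushes the cross-correlation matrix of two embedding views toward the identity: an invariance term aligns matching dimensions, and an off-diagonal term decorrelates the rest. A minimal NumPy sketch of that objective, not Narrate2Nav's training code:

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins redundancy-reduction loss (NumPy sketch): drive the
    cross-correlation matrix of two embedding views toward identity."""
    n = z_a.shape[0]
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-9)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-9)
    c = z_a.T @ z_b / n                       # cross-correlation matrix
    on_diag = ((1 - np.diag(c)) ** 2).sum()   # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag
```

Identical views yield a near-zero invariance term, while independent views leave the diagonal near zero and the loss large, which is what makes the objective useful for aligning visual and textual views of the same scene.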

