AITopics

Learning diverse manipulation skills for real-world robots is severely bottlenecked by the reliance on costly and hard-to-scale teleoperated demonstrations. While human videos offer a scalable alternative, effectively transferring manipulation knowledge is fundamentally hindered by the significant morphological gap between human and robotic embodiments. To address this challenge and facilitate skill transfer from human to robot, we introduce Traj2Action,a novel framework that bridges this embodiment gap by using the 3D trajectory of the operational endpoint as a unified intermediate representation, and then transfers the manipulation knowledge embedded in this trajectory to the robot's actions. Our policy first learns to generate a coarse trajectory, which forms an high-level motion plan by leveraging both human and robot data. This plan then conditions the synthesis of precise, robot-specific actions (e.g., orientation and gripper state) within a co-denoising framework. Extensive real-world experiments on a Franka robot demonstrate that Traj2Action boosts the performance by up to 27% and 22.25% over $π_0$ baseline on short- and long-horizon real-world tasks, and achieves significant gains as human data scales in robot policy learning. Our project website, featuring code and video demonstrations, is available at https://anonymous.4open.science/w/Traj2Action-4A45/.

artificial intelligence, arxiv preprint arxiv, demonstration, (16 more...)

2510.00491

Genre: Research Report > New Finding (0.46)

Industry: Education (0.60)

Technology: Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.40)

Rehearsal-free and Task-free Online Continual Learning With Contrastive Prompt

Wang, Aopeng, Deng, Ke, Ren, Yongli, Luo, Jun

The main challenge of continual learning is \textit{catastrophic forgetting}. Because of processing data in one pass, online continual learning (OCL) is one of the most difficult continual learning scenarios. To address catastrophic forgetting in OCL, some existing studies use a rehearsal buffer to store samples and replay them in the later learning process, other studies do not store samples but assume a sequence of learning tasks so that the task identities can be explored. However, storing samples may raise data security or privacy concerns and it is not always possible to identify the boundaries between learning tasks in one pass of data processing. It motivates us to investigate rehearsal-free and task-free OCL (F2OCL). By integrating prompt learning with an NCM classifier, this study has effectively tackled catastrophic forgetting without storing samples and without usage of task boundaries or identities. The extensive experimental results on two benchmarks have demonstrated the effectiveness of the proposed method.

artificial intelligence, continual learning, machine learning, (15 more...)

2510.00467

Genre:

Research Report (1.00)
Instructional Material > Online (0.37)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Setting (0.94)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Integrating Offline Pre-Training with Online Fine-Tuning: A Reinforcement Learning Approach for Robot Social Navigation

Su, Run, Fu, Hao, Zhou, Shuai, Fu, Yingao

Offline reinforcement learning (RL) has emerged as a promising framework for addressing robot social navigation challenges. However, inherent uncertainties in pedestrian behavior and limited environmental interaction during training often lead to suboptimal exploration and distributional shifts between offline training and online deployment. To overcome these limitations, this paper proposes a novel offline-to-online fine-tuning RL algorithm for robot social navigation by integrating Return-to-Go (RTG) prediction into a causal Transformer architecture. Our algorithm features a spatiotem-poral fusion model designed to precisely estimate RTG values in real-time by jointly encoding temporal pedestrian motion patterns and spatial crowd dynamics. This RTG prediction framework mitigates distribution shift by aligning offline policy training with online environmental interactions. Furthermore, a hybrid offline-online experience sampling mechanism is built to stabilize policy updates during fine-tuning, ensuring balanced integration of pre-trained knowledge and real-time adaptation. Extensive experiments in simulated social navigation environments demonstrate that our method achieves a higher success rate and lower collision rate compared to state-of-the-art baselines. These results underscore the efficacy of our algorithm in enhancing navigation policy robustness and adaptability. This work paves the way for more reliable and adaptive robotic navigation systems in real-world applications.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2510.00466

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Education (0.69)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

On-the-Fly Data Augmentation via Gradient-Guided and Sample-Aware Influence Estimation

Yang, Suorong, Zong, Jie, Wang, Lihang, Qin, Ziheng, Gan, Hai, Zhou, Pengfei, Wang, Kai, You, Yang, Shen, Furao

Data augmentation has been widely employed to improve the generalization of deep neural networks. Most existing methods apply fixed or random transformations. However, we find that sample difficulty evolves along with the model's generalization capabilities in dynamic training environments. As a result, applying uniform or stochastic augmentations, without accounting for such dynamics, can lead to a mismatch between augmented data and the model's evolving training needs, ultimately degrading training effectiveness. To address this, we introduce SADA, a Sample-Aware Dynamic Augmentation that performs on-the-fly adjustment of augmentation strengths based on each sample's evolving influence on model optimization. Specifically, we estimate each sample's influence by projecting its gradient onto the accumulated model update direction and computing the temporal variance within a local training window. Samples with low variance, indicating stable and consistent influence, are augmented more strongly to emphasize diversity, while unstable samples receive milder transformations to preserve semantic fidelity and stabilize learning. Our method is lightweight, which does not require auxiliary models or policy tuning. It can be seamlessly integrated into existing training pipelines as a plug-and-play module. Experiments across various benchmark datasets and model architectures show consistent improvements of SADA, including +7.3\% on fine-grained tasks and +4.3\% on long-tailed datasets, highlighting the method's effectiveness and practicality.

artificial intelligence, deep learning, machine learning, (16 more...)

2510.00434

Genre: Research Report (0.65)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs

Zhang, Kairun, Li, Haoyu, Zhao, Yanjun, Sun, Yifan, Zhang, Huan

Zeroth-order optimizers have recently emerged as a practical approach for fine-tuning large language models (LLMs), significantly reducing GPU memory consumption compared to traditional first-order methods. Yet, existing zeroth-order methods rely on hand-crafted, static sampling strategies that are not adaptable to model-specific structures. To address this, we propose ZO Fine-tuner, a learning-based zeroth-order optimizer for LLMs that automatically learns efficient perturbation strategies through a compact and memory-efficient design. Crucially, our approach is motivated by the observation that only a small number of foundation models and their derivatives are widely adopted in practice. Therefore, learning the optimizer once for a given LLM and reusing it across diverse downstream tasks is both feasible and highly desirable. Accordingly, ZO Fine-tuner is designed to scale learning to learn (L2L) to the foundation-model era by supporting one-time training per LLM with minimal overhead. Experiments on 4 LLMs and 7 datasets show that ZO Fine-tuner outperforms prior zeroth-order baselines in 82.1\% of task-model combinations, thereby demonstrating strong performance and scalability for efficient LLM fine-tuning. Our code is available at https://github.com/ASTRAL-Group/ZO_Fine_tuner.git.

large language model, machine learning, zo fine-tuner, (19 more...)

2510.00419

Country: North America > United States (0.46)

Genre: Research Report (0.51)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Graph2Region: Efficient Graph Similarity Learning with Structure and Scale Restoration

Liu, Zhouyang, Chen, Yixin, Liu, Ning, He, Jiezhong, Li, Dongsheng

Graph similarity is critical in graph-related tasks such as graph retrieval, where metrics like maximum common subgraph (MCS) and graph edit distance (GED) are commonly used. However, exact computations of these metrics are known to be NP-Hard. Recent neural network-based approaches approximate the similarity score in embedding spaces to alleviate the computational burden, but they either involve expensive pairwise node comparisons or fail to effectively utilize structural and scale information of graphs. To tackle these issues, we propose a novel geometric-based graph embedding method called Graph2Region (G2R). G2R represents nodes as closed regions and recovers their adjacency patterns within graphs in the embedding space. By incorporating the node features and adjacency patterns of graphs, G2R summarizes graph regions, i.e., graph embeddings, where the shape captures the underlying graph structures and the volume reflects the graph size. Consequently, the overlap between graph regions can serve as an approximation of MCS, signifying similar node regions and adjacency patterns. We further analyze the relationship between MCS and GED and propose using disjoint parts as a proxy for GED similarity. This analysis enables concurrent computation of MCS and GED, incorporating local and global structural information. Experimental evaluation highlights G2R's competitive performance in graph similarity computation. It achieves up to a 60.0\% relative accuracy improvement over state-of-the-art methods in MCS similarity learning, while maintaining efficiency in both training and inference. Moreover, G2R showcases remarkable capability in predicting both MCS and GED similarities simultaneously, providing a holistic assessment of graph similarity. Code available at https://github.com/liuzhouyang/Graph2Region.

artificial intelligence, machine learning, similarity, (16 more...)

2510.00394

Country: Asia > China (0.68)

Genre: Research Report > Promising Solution (0.34)

Industry: Education (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing SystemsOct-1-2025, 23:56:38 GMT

Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization Chaobing Song Yong Jinag Yi Ma

This work was conducted during Chaobing Song's visit to Professor Yi Ma's group at UC Berkeley.

artificial intelligence, machine learning, vrada, (12 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre: Research Report (0.47)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Neural Information Processing SystemsOct-1-2025, 23:43:03 GMT

Differentially private subspace clustering

Yining Wang, Yu-Xiang Wang, Aarti Singh

Subspace clustering is an unsupervised learning problem that aims at grouping data points into multiple "clusters" so that data points in a single cluster lie approximately on a low-dimensional linear subspace. It is originally motivated by 3D motion segmentation in computer vision, but has recently been generically applied to a wide range of statistical machine learning problems, which often involves sensitive datasets about human subjects. This raises a dire concern for data privacy. In this work, we build on the framework of differential privacy and present two provably private subspace clustering algorithms. We demonstrate via both theory and experiments that one of the presented methods enjoys formal privacy and utility guarantees; the other one asymptotically preserves differential privacy while having good performance in practice. Along the course of the proof, we also obtain two new provable guarantees for the agnostic subspace clustering and the graph connectivity problem which might be of independent interests.

algorithm, gibbs sampler, subspace, (13 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Montana (0.04)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Focused Education > Special Education (0.44)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)