manipulation policy
Improving Robotic Manipulation Robustness via NICE Scene Surgery
Pakdamansavoji, Sajjad, Pourkeshavarz, Mozhgan, Sigal, Adam, Li, Zhiyuan, Yang, Rui Heng, Rasouli, Amir
Learning robust visuomotor policies for robotic manipulation remains a challenge in real-world settings, where visual distractors can significantly degrade performance and safety. In this work, we propose an effective and scalable framework, Naturalistic Inpainting for Context Enhancement (NICE). Our method minimizes the out-of-distribution (OOD) gap in imitation learning by increasing visual diversity, constructing new experiences from existing demonstrations. Using image generative frameworks and large language models, NICE performs three editing operations: object replacement, restyling, and removal of distracting (non-target) objects. These edits preserve spatial relationships, avoid obstructing target objects, and maintain action-label consistency. Unlike previous approaches, NICE requires no additional robot data collection, simulator access, or custom model training, making it readily applicable to existing robotic datasets. Using real-world scenes, we showcase the capability of our framework to produce photo-realistic scene enhancements. For downstream tasks, we use NICE data to finetune a vision-language model (VLM) for spatial affordance prediction and a vision-language-action (VLA) policy for object manipulation. Our evaluations show that NICE successfully minimizes OOD gaps, yielding over 20% improvement in accuracy for affordance prediction in highly cluttered scenes. For manipulation tasks, the success rate increases on average by 11% when testing in environments populated with varying quantities of distractors. Furthermore, we show that our method improves visual robustness, lowering target confusion by 6%, and enhances safety by reducing the collision rate by 7%.
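A minimal sketch of the kind of augmentation loop the abstract describes: existing demonstration frames are edited (distractor replacement, restyling, or removal) while the action labels are kept unchanged. The `inpaint` callable is a placeholder for any off-the-shelf inpainting model, and the masks are assumed to cover only non-target objects; this is not the authors' implementation.

```python
# Illustrative NICE-style augmentation sketch (assumptions, not the released code).
from dataclasses import dataclass
from typing import Callable, List
import numpy as np

@dataclass
class Frame:
    rgb: np.ndarray      # HxWx3 observation
    action: np.ndarray   # action label, left untouched by the visual edits

def augment_demo(frames: List[Frame],
                 distractor_masks: List[np.ndarray],  # HxW bool masks over distractors only
                 inpaint: Callable[[np.ndarray, np.ndarray, str], np.ndarray],
                 operation: str = "remove") -> List[Frame]:
    """Return a visually edited copy of a demonstration with identical actions."""
    prompts = {
        "remove":  "empty table surface, seamless background",
        "replace": "a different everyday object on the table",
        "restyle": "the same object with a new color and texture",
    }
    edited = []
    for frame, mask in zip(frames, distractor_masks):
        new_rgb = inpaint(frame.rgb, mask, prompts[operation])  # generative edit of distractor region
        edited.append(Frame(rgb=new_rgb, action=frame.action.copy()))
    return edited
```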
Distracted Robot: How Visual Clutter Undermines Robotic Manipulation
Rasouli, Amir, Alban, Montgomery, Pakdamansavoji, Sajjad, Li, Zhiyuan, Zhang, Zhanguang, Wu, Aaron, Zhao, Xuan
In this work, we propose an evaluation protocol for examining the performance of robotic manipulation policies in cluttered scenes. Contrary to prior works, we approach evaluation from a psychophysical perspective and therefore use a unified measure of clutter that accounts for environmental factors as well as the quantity, characteristics, and arrangement of distractors. Using this measure, we systematically construct evaluation scenarios in both hyper-realistic simulation and the real world and conduct extensive experimentation on manipulation policies, in particular vision-language-action (VLA) models. Our experiments highlight the significant impact of scene clutter, which lowers the performance of the policies by as much as 34%, and show that despite achieving similar average performance across the tasks, different VLA policies have unique vulnerabilities and relatively low agreement on success scenarios. We further show that our clutter measure is an effective indicator of performance degradation and analyze the impact of distractors in terms of their quantity and occluding influence. Finally, we show that finetuning on enhanced data, although effective, does not equally remedy all negative impacts of clutter on performance.
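The abstract does not give the clutter measure itself, so the following is only a hypothetical illustration of a unified score that combines distractor quantity, image coverage, and crowding around the target; the weights and terms are assumptions, not the paper's psychophysics-based definition.

```python
# Hypothetical clutter score, purely to illustrate a "unified measure" of clutter.
import numpy as np

def clutter_score(bboxes: np.ndarray,      # Nx4 distractor boxes (x1, y1, x2, y2), pixels
                  target_box: np.ndarray,  # (4,) target object box
                  image_area: float,
                  w_count: float = 0.4, w_cover: float = 0.4, w_prox: float = 0.2) -> float:
    """Weighted combination of distractor count, image coverage, and proximity to the target."""
    if len(bboxes) == 0:
        return 0.0
    areas = (bboxes[:, 2] - bboxes[:, 0]) * (bboxes[:, 3] - bboxes[:, 1])
    coverage = float(areas.sum() / image_area)                  # fraction of image covered by distractors
    count_term = 1.0 - np.exp(-0.2 * len(bboxes))               # saturating term in the number of distractors
    centers = (bboxes[:, :2] + bboxes[:, 2:]) / 2.0
    t_center = (target_box[:2] + target_box[2:]) / 2.0
    dists = np.linalg.norm(centers - t_center, axis=1)
    proximity = float(np.exp(-dists / np.sqrt(image_area)).mean())  # crowding near the target
    return w_count * count_term + w_cover * min(coverage, 1.0) + w_prox * proximity
```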
Gentle Manipulation Policy Learning via Demonstrations from VLM Planned Atomic Skills
Zhou, Jiayu, Wu, Qiwei, Li, Jian, Chen, Zhe, Xiong, Xiaogang, Xu, Renjing
Autonomous execution of long-horizon, contact-rich manipulation tasks traditionally requires extensive real-world data and expert engineering, posing significant cost and scalability challenges. This paper proposes a novel framework integrating hierarchical semantic decomposition, reinforcement learning (RL), vision-language models (VLMs), and knowledge distillation to overcome these limitations. Complex tasks are decomposed into atomic skills, with a policy for each primitive trained via RL exclusively in simulation. Crucially, our RL formulation incorporates explicit force constraints to prevent object damage during delicate interactions. VLMs perform high-level task decomposition and skill planning, generating diverse expert demonstrations. These are distilled into a unified policy via a Visual-Tactile Diffusion Policy for end-to-end execution. We conduct comprehensive ablation studies exploring different VLM-based task planners to identify optimal demonstration generation pipelines, and systematically compare imitation learning algorithms for skill distillation. Extensive simulation experiments and physical deployment validate that our approach achieves policy learning for long-horizon manipulation without costly human demonstrations, while the VLM-guided atomic skill framework enables scalable generalization to diverse tasks.
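A compact sketch of the demonstration-generation idea described above, with the VLM planner and per-skill RL policies abstracted as callables; function and variable names are assumptions for illustration, not the authors' pipeline.

```python
# Sketch: VLM-planned atomic skills executed by per-skill RL policies to collect demonstrations.
from typing import Callable, Dict, List, Sequence, Tuple

def collect_demonstrations(task: str,
                           plan: Callable[[str], Sequence[str]],                 # VLM task planner: task -> skill names
                           skill_policies: Dict[str, Callable[[dict], Tuple[list, dict]]],  # skill -> (trajectory, next_state)
                           env_reset: Callable[[], dict]) -> List:
    """Run the planned skill sequence in simulation and return the stitched rollout."""
    demos, state = [], env_reset()
    for skill in plan(task):
        if skill not in skill_policies:
            raise KeyError(f"no RL policy trained for atomic skill '{skill}'")
        trajectory, state = skill_policies[skill](state)  # each rollout can log contact forces for the constraints
        demos.extend(trajectory)
    return demos  # later distilled into a single end-to-end policy
```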
GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation
Jiang, Guangqi, Chang, Haoran, Qiu, Ri-Zhao, Liang, Yutong, Ji, Mazeyu, Zhu, Jiyue, Dong, Zhao, Zou, Xueyan, Wang, Xiaolong
This paper presents GSWorld, a robust, photo-realistic simulator for robotic manipulation that combines 3D Gaussian Splatting with physics engines. Our framework advocates "closing the loop" of developing manipulation policies with reproducible evaluation of policies learned from real-robot data and sim2real policy training without using real robots. To enable photo-realistic rendering of diverse scenes, we propose a new asset format, which we term GSDF (Gaussian Scene Description File), that combines a Gaussian-on-Mesh representation with the robot URDF and other objects. With a streamlined reconstruction pipeline, we curate a database of GSDF that contains three robot embodiments for single-arm and bimanual manipulation, as well as more than 40 objects. Combining GSDF with physics engines, we demonstrate several immediate interesting applications: (1) learning zero-shot sim2real pixel-to-action manipulation policy with photo-realistic rendering, (2) automated high-quality DAgger data collection for adapting policies to deployment environments, (3) reproducible benchmarking of real-robot manipulation policies in simulation, (4) simulation data collection by virtual teleoperation, and (5) zero-shot sim2real visual reinforcement learning. Website: https://3dgsworld.github.io/.
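To make the asset-format idea concrete, here is an illustrative sketch of what a GSDF-style description might bundle, based only on the abstract (Gaussian-on-Mesh appearance plus URDF kinematics). The field names and the `add_urdf` / `add_splats` hooks are hypothetical, not the actual GSDF schema or API.

```python
# Hypothetical GSDF-like asset description combining physics and splat rendering.
from dataclasses import dataclass, field
from typing import List

@dataclass
class GSDFAsset:
    name: str
    urdf_path: str             # kinematics and collision geometry for the physics engine
    gaussian_splat_path: str   # Gaussian-on-Mesh appearance for photo-realistic rendering
    scale: float = 1.0

@dataclass
class GSDFScene:
    robot: GSDFAsset
    objects: List[GSDFAsset] = field(default_factory=list)

    def load_into(self, physics_engine, renderer) -> None:
        """Register every asset with both the physics engine and the splat renderer."""
        for asset in [self.robot, *self.objects]:
            physics_engine.add_urdf(asset.urdf_path, scale=asset.scale)       # hypothetical engine hook
            renderer.add_splats(asset.gaussian_splat_path, scale=asset.scale)  # hypothetical renderer hook
```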
Multi-Modal Decentralized Reinforcement Learning for Modular Reconfigurable Lunar Robots
Mishra, Ashutosh, Santra, Shreya, Neppel, Elian, Lombardi, Edoardo M. Rossi, Karimov, Shamistan, Uno, Kentaro, Yoshida, Kazuya
Modular reconfigurable robots suit task-specific space operations, but the combinatorial growth of morphologies hinders unified control. We propose a decentralized reinforcement learning (Dec-RL) scheme where each module learns its own policy: wheel modules use Soft Actor-Critic (SAC) for locomotion and 7-DoF limbs use Proximal Policy Optimization (PPO) for steering and manipulation, enabling zero-shot generalization to unseen configurations. In simulation, the steering policy achieved a mean absolute error of 3.63° between desired and induced angles; the manipulation policy plateaued at 84.6 % success on a target-offset criterion; and the wheel policy cut average motor torque by 95.4 % relative to baseline while maintaining 99.6 % success. Lunar-analogue field tests validated zero-shot integration for autonomous locomotion, steering, and preliminary alignment for reconfiguration. The system transitioned smoothly among synchronous, parallel, and sequential modes for Policy Execution, without idle states or control conflicts, indicating a scalable, reusable, and robust approach for modular lunar robots.
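A minimal sketch of the decentralized control idea from the abstract: each attached module runs its own policy (SAC-trained for wheels, PPO-trained for limbs) on its local observation, so unseen configurations are handled by instantiating one policy per module. The policy objects and naming convention are placeholders, not the authors' code.

```python
# Sketch: one action per module, computed independently from local observations.
from typing import Callable, Dict, List
import numpy as np

def decentralized_step(modules: List[str],
                       policies: Dict[str, Callable[[np.ndarray], np.ndarray]],  # {"wheel": ..., "limb": ...}
                       local_obs: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    """Each module selects its action from its own observation; no central controller."""
    actions = {}
    for module in modules:
        kind = "wheel" if module.startswith("wheel") else "limb"  # assumed naming convention
        actions[module] = policies[kind](local_obs[module])       # SAC for wheels, PPO for limbs
    return actions
```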
DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation
Fu, Yuhui, Xie, Feiyang, Xu, Chaoyi, Xiong, Jing, Yuan, Haoqi, Lu, Zongqing
Loco-manipulation is a fundamental challenge for humanoid robots to achieve versatile interactions in human environments. Although recent studies have made significant progress in humanoid whole-body control, loco-manipulation remains underexplored and often relies on hard-coded task definitions or costly real-world data collection, which limits autonomy and generalization. We present DemoHLM, a framework that enables generalizable loco-manipulation on a real humanoid robot from a single demonstration in simulation. DemoHLM adopts a hierarchy that integrates a low-level universal whole-body controller with high-level manipulation policies for multiple tasks. The whole-body controller maps whole-body motion commands to joint torques and provides omnidirectional mobility for the humanoid robot. The manipulation policies, learned in simulation via our data generation and imitation learning pipeline, command the whole-body controller with closed-loop visual feedback to execute challenging loco-manipulation tasks. Experiments show a positive correlation between the amount of synthetic data and policy performance, underscoring the effectiveness of our data generation pipeline and the data efficiency of our approach. Real-world experiments on a Unitree G1 robot equipped with an RGB-D camera validate the sim-to-real transferability of DemoHLM, demonstrating robust performance under spatial variations across ten loco-manipulation tasks.
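A rough sketch of the two-level hierarchy described above, with the high-level manipulation policy, the universal whole-body controller, and the robot interface all abstracted as callables; this only illustrates the control flow, not the released system.

```python
# Sketch: high-level policy emits whole-body commands, low-level controller emits joint torques.
from typing import Callable
import numpy as np

def loco_manipulation_loop(get_obs: Callable[[], dict],                               # RGB-D + proprioception
                           manipulation_policy: Callable[[dict], np.ndarray],         # obs -> whole-body motion command
                           whole_body_controller: Callable[[np.ndarray, dict], np.ndarray],  # (command, obs) -> joint torques
                           apply_torques: Callable[[np.ndarray], None],
                           steps: int = 1000) -> None:
    for _ in range(steps):
        obs = get_obs()                                # closed-loop visual feedback
        command = manipulation_policy(obs)             # high level: what motion to perform
        torques = whole_body_controller(command, obs)  # low level: how to realize it on the humanoid
        apply_torques(torques)
```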
mindmap: Spatial Memory in Deep Feature Maps for 3D Action Policies
Steiner, Remo, Millane, Alexander, Tingdahl, David, Volk, Clemens, Ramasamy, Vikram, Yao, Xinjie, Du, Peter, Pouya, Soha, Sheng, Shiwei
End-to-end learning of robot control policies, structured as neural networks, has emerged as a promising approach to robotic manipulation. Completing many common tasks requires relevant objects to pass in and out of the robot's field of view. In these settings, spatial memory - the ability to remember the spatial composition of the scene - is an important competency. However, building such mechanisms into robot learning systems remains an open research problem. We introduce mindmap (Spatial Memory in Deep Feature Maps for 3D Action Policies), a 3D diffusion policy that generates robot trajectories based on a semantic 3D reconstruction of the environment. We show in simulation experiments that our approach is effective at solving tasks where state-of-the-art approaches without memory mechanisms struggle. We release our reconstruction system, training code, and evaluation tasks to spur research in this direction.
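One way to picture the spatial-memory mechanism is a persistent map of back-projected deep features that remains queryable after objects leave the field of view. The sketch below is an assumption about that mechanism for illustration only, not the released reconstruction system.

```python
# Sketch: persistent world-frame feature map as spatial memory.
import numpy as np

class FeaturePointMap:
    def __init__(self) -> None:
        self.points = np.empty((0, 3))     # world-frame xyz of back-projected pixels
        self.features = np.empty((0, 0))   # per-point deep features

    def integrate(self, xyz_world: np.ndarray, feats: np.ndarray) -> None:
        """Append back-projected pixel features from the current frame."""
        if self.features.size == 0:
            self.points, self.features = xyz_world, feats
        else:
            self.points = np.vstack([self.points, xyz_world])
            self.features = np.vstack([self.features, feats])

    def query(self, center: np.ndarray, radius: float) -> np.ndarray:
        """Return remembered features near a 3D location, even if it is currently out of view."""
        mask = np.linalg.norm(self.points - center, axis=1) < radius
        return self.features[mask]
```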
Viewpoint-Agnostic Manipulation Policies with Strategic Vantage Selection
Vasudevan, Sreevishakh, Sagar, Som, Senanayake, Ransalu
Since vision-based manipulation policies are typically trained on data gathered from a single viewpoint, their performance drops when the view changes during deployment. Naively aggregating demonstrations from numerous random views is not only costly but also known to destabilize learning, as excessive visual diversity acts as noise. We present Vantage, a viewpoint selection framework that fine-tunes any pre-trained policy on a small, strategically chosen set of camera poses to induce viewpoint-agnostic behavior. Instead of relying on costly brute-force search over viewpoints, Vantage formulates camera placement as an information gain optimization problem in a continuous space. This approach balances exploration of novel poses with exploitation of promising ones, while also providing theoretical guarantees about convergence and robustness. Across manipulation tasks and policy families, Vantage consistently improves success under viewpoint shifts compared to fixed, grid, or random data selection strategies with only a handful of fine-tuning steps. Experiments conducted on simulated and real-world setups show that Vantage increases the task success rate by 25% for diffusion policies and yields robust gains in dynamic-camera settings.
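A rough sketch of selecting a handful of camera poses by trading off exploration and exploitation with a Gaussian-process surrogate and a UCB acquisition. The actual Vantage objective (information gain) and pose parameterization are not given here; `evaluate_pose` and the pose bounds are assumptions used only to illustrate the "select strategic views, then fine-tune" loop.

```python
# Sketch: exploration/exploitation over continuous camera poses with a GP surrogate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def select_vantage_poses(evaluate_pose,         # pose (D,) -> estimated policy success (assumed proxy objective)
                         pose_bounds,           # (D, 2) numpy array of per-dimension [low, high]
                         n_init=5, n_select=10, n_candidates=256, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    low, high = pose_bounds[:, 0], pose_bounds[:, 1]
    X = rng.uniform(low, high, size=(n_init, len(low)))          # initial random camera poses
    y = np.array([evaluate_pose(x) for x in X])
    gp = GaussianProcessRegressor(normalize_y=True)
    for _ in range(n_select - n_init):
        gp.fit(X, y)
        cand = rng.uniform(low, high, size=(n_candidates, len(low)))
        mu, std = gp.predict(cand, return_std=True)
        best = cand[np.argmax(mu + beta * std)]                   # UCB: high mean (exploit) + high variance (explore)
        X = np.vstack([X, best])
        y = np.append(y, evaluate_pose(best))
    return X                                                      # camera poses used to collect fine-tuning data
```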
Seeing the Bigger Picture: 3D Latent Mapping for Mobile Manipulation Policy Learning
Kim, Sunghwan, Chung, Woojeh, Dai, Zhirui, Bhatt, Dwait, Shukla, Arth, Su, Hao, Tian, Yulun, Atanasov, Nikolay
In this paper, we demonstrate that mobile manipulation policies utilizing a 3D latent map achieve stronger spatial and temporal reasoning than policies relying solely on images. We introduce Seeing the Bigger Picture (SBP), an end-to-end policy learning approach that operates directly on a 3D map of latent features. In SBP, the map extends perception beyond the robot's current field of view and aggregates observations over long horizons. Our mapping approach incrementally fuses multiview observations into a grid of scene-specific latent features. A pre-trained, scene-agnostic decoder reconstructs target embeddings from these features and enables online optimization of the map features during task execution. A policy, trainable with behavior cloning or reinforcement learning, treats the latent map as a state variable and uses global context from the map obtained via a 3D feature aggregator. We evaluate SBP on scene-level mobile manipulation and sequential tabletop manipulation tasks. Our experiments demonstrate that SBP (i) reasons globally over the scene, (ii) leverages the map as long-horizon memory, and (iii) outperforms image-based policies in both in-distribution and novel scenes, e.g., improving the success rate by 25% for the sequential manipulation task.
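The sketch below illustrates the incremental latent-map idea suggested by the abstract: multiview feature observations are fused into a fixed-resolution grid of scene-specific latents by running averaging. The grid layout and fusion rule are assumptions, not the SBP implementation.

```python
# Sketch: running-average fusion of per-point features into a 3D latent grid.
import numpy as np

class LatentMap:
    def __init__(self, grid_shape=(64, 64, 16), feat_dim=32, cell_size=0.05):
        self.features = np.zeros((*grid_shape, feat_dim))
        self.counts = np.zeros(grid_shape)
        self.cell_size = cell_size  # meters per cell, map-frame coordinates assumed non-negative

    def fuse(self, xyz_world: np.ndarray, feats: np.ndarray) -> None:
        """Fuse per-point features (N,3) / (N,feat_dim) into their grid cells by running average."""
        idx = np.floor(xyz_world / self.cell_size).astype(int)
        idx = np.clip(idx, 0, np.array(self.features.shape[:3]) - 1)
        for (i, j, k), f in zip(idx, feats):
            c = self.counts[i, j, k]
            self.features[i, j, k] = (self.features[i, j, k] * c + f) / (c + 1)
            self.counts[i, j, k] = c + 1
```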
Mobi-$π$: Mobilizing Your Robot Learning Policy
Yang, Jingyun, Huang, Isabella, Vu, Brandon, Bajracharya, Max, Antonova, Rika, Bohg, Jeannette
Learned visuomotor policies are capable of performing increasingly complex manipulation tasks. However, most of these policies are trained on data collected from limited robot positions and camera viewpoints. This leads to poor generalization to novel robot positions, which limits the use of these policies on mobile platforms, especially for precise tasks like pressing buttons or turning faucets. In this work, we formulate the policy mobilization problem: find a mobile robot base pose in a novel environment that is in distribution with respect to a manipulation policy trained on a limited set of camera viewpoints. Compared to retraining the policy itself to be more robust to unseen robot base pose initializations, policy mobilization decouples navigation from manipulation and thus does not require additional demonstrations. Crucially, this problem formulation complements existing efforts to improve manipulation policy robustness to novel viewpoints and remains compatible with them. We propose a novel approach for policy mobilization that bridges navigation and manipulation by optimizing the robot's base pose to align with an in-distribution base pose for a learned policy. Our approach utilizes 3D Gaussian Splatting for novel view synthesis, a score function to evaluate pose suitability, and sampling-based optimization to identify optimal robot poses. To understand policy mobilization in more depth, we also introduce the Mobi-$π$ framework, which includes: (1) metrics that quantify the difficulty of mobilizing a given policy, (2) a suite of simulated mobile manipulation tasks based on RoboCasa to evaluate policy mobilization, and (3) visualization tools for analysis. In both our developed simulation task suite and the real world, we show that our approach outperforms baselines, demonstrating its effectiveness for policy mobilization.
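A minimal sketch of the sampling-based base pose search described above: candidate base poses are scored by how in-distribution the synthesized view is for the manipulation policy, and the best-scoring pose becomes the navigation goal. `render_view` (novel view synthesis, e.g. from a 3DGS reconstruction) and `score_view` are placeholders standing in for the paper's components.

```python
# Sketch: score candidate base poses via synthesized views and pick the best one.
import numpy as np

def mobilize_policy(render_view,              # base pose (x, y, yaw) -> synthesized camera image
                    score_view,               # image -> in-distribution score for the trained policy
                    pose_samples: np.ndarray  # (N, 3) candidate base poses
                    ) -> np.ndarray:
    """Return the base pose whose synthesized viewpoint best matches the policy's training distribution."""
    scores = np.array([score_view(render_view(pose)) for pose in pose_samples])
    return pose_samples[int(np.argmax(scores))]
```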