non-prehensile manipulation
AdaptPNP: Integrating Prehensile and Non-Prehensile Skills for Adaptive Robotic Manipulation
Zhu, Jinxuan, Tie, Chenrui, Cao, Xinyi, Wang, Yuran, Guo, Jingxiang, Chen, Zixuan, Chen, Haonan, Chen, Junting, Xiao, Yangyu, Wu, Ruihai, Shao, Lin
Non-prehensile (NP) manipulation, in which robots alter object states without forming stable grasps (for example, pushing, poking, or sliding), significantly broadens robotic manipulation capabilities when grasping is infeasible or insufficient. However, enabling a unified framework that generalizes across different tasks, objects, and environments while seamlessly integrating non-prehensile and prehensile (P) actions remains challenging: robots must determine when to invoke NP skills, select the appropriate primitive for each context, and compose P and NP strategies into robust, multi-step plans. We introduce AdaptPNP, a vision-language model (VLM)-empowered task and motion planning framework that systematically selects and combines P and NP skills to accomplish diverse manipulation objectives. Our approach leverages a VLM to interpret visual scene observations and textual task descriptions, generating a high-level plan skeleton that prescribes the sequence and coordination of P and NP actions. A digital-twin based object-centric intermediate layer predicts desired object poses, enabling proactive mental rehearsal of manipulation sequences. We evaluate AdaptPNP across representative P&NP hybrid manipulation tasks in both simulation and real-world environments. These results underscore the potential of hybrid P&NP manipulation as a crucial step toward general-purpose, human-level robotic manipulation capabilities. When manipulating objects to achieve desired configurations, robots typically rely on establishing stable grasps and transporting objects to target locations.
- Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.89)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Robots > Manipulation (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)
BiNoMaP: Learning Category-Level Bimanual Non-Prehensile Manipulation Primitives
Non-prehensile manipulation, encompassing ungraspable actions such as pushing, poking, and pivoting, represents a critical yet underexplored domain in robotics due to its contact-rich and analytically intractable nature. In this work, we revisit this problem from two novel perspectives. First, we move beyond the usual single-arm setup and the strong assumption of favorable external dexterity such as walls, ramps, or edges. Instead, we advocate a generalizable dual-arm configuration and establish a suite of Bimanual Non-prehensile Manipulation Primitives (BiNoMaP). Second, we depart from the prevailing RL-based paradigm and propose a three-stage, RL-free framework to learn non-prehensile skills. Specifically, we begin by extracting bimanual hand motion trajectories from video demonstrations. Due to visual inaccuracies and morphological gaps, these coarse trajectories are difficult to transfer directly to robotic end-effectors. To address this, we propose a geometry-aware post-optimization algorithm that refines raw motions into executable manipulation primitives that conform to specific motion patterns. Beyond instance-level reproduction, we further enable category-level generalization by parameterizing the learned primitives with object-relevant geometric attributes, particularly size, resulting in adaptable and general parameterized manipulation primitives. We validate BiNoMaP across a range of representative bimanual tasks and diverse object categories, demonstrating its effectiveness, efficiency, versatility, and superior generalization capability. Non-prehensile manipulation refers to a class of robotic actions that do not rely on firm grasping but instead leverage physical interactions such as poking, pivoting, or pushing to achieve manipulation goals Zhou et al. (2019); Hogan & Rodriguez (2020); Sun et al. (2020); Zhou & Held (2023); Zhang et al. (2023).
These skills are not merely complementary to traditional grasp-based tasks; they are often essential in scenarios where grasping is physically infeasible or inefficient. In dual-arm robotic systems Liu et al. (2022); Wu & Kruse (2024); Yamada et al. (2025); Lu et al. (2025), non-prehensile manipulation becomes especially relevant when dealing with objects that are too fragile, too flat, or lack sufficient geometry for reliable grasping. Despite its importance, current non-prehensile manipulation faces two core bottlenecks.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Netherlands > South Holland > Delft (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > China > Hong Kong (0.04)
DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation
Lyu, Jiangran, Li, Ziming, Shi, Xuesong, Xu, Chaoyi, Wang, Yizhou, Wang, He
Nonprehensile manipulation is crucial for handling objects that are too thin, large, or otherwise ungraspable in unstructured environments. While conventional planning-based approaches struggle with complex contact modeling, learning-based methods have recently emerged as a promising alternative. However, existing learning-based approaches face two major limitations: they rely heavily on multi-view cameras and precise pose tracking, and they fail to generalize across varying physical conditions, such as changes in object mass and table friction. To address these challenges, we propose the Dynamics-Adaptive World Action Model (DyWA), a novel framework that enhances action learning by jointly predicting future states while adapting to dynamics variations based on historical trajectories. By unifying the modeling of geometry, state, physics, and robot actions, DyWA enables more robust policy learning under partial observability. Compared to baselines, our method improves the success rate by 31.5% in simulation using only single-view point cloud observations. Furthermore, DyWA achieves an average success rate of 68% in real-world experiments, demonstrating its ability to generalize across diverse object geometries, adapt to varying table friction, and remain robust in challenging scenarios such as half-filled water bottles and slippery surfaces.
- North America > United States (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
A Reinforcement Learning Approach to Non-prehensile Manipulation through Sliding
Raei, Hamidreza, De Momi, Elena, Ajoudani, Arash
Although robotic applications increasingly demand versatile and dynamic object handling, most existing techniques focus on grasp-based manipulation, limiting their applicability in non-prehensile tasks. To address this need, this study introduces a Deep Deterministic Policy Gradient (DDPG) reinforcement learning framework for efficient non-prehensile manipulation, specifically for sliding an object on a surface. The algorithm generates a linear trajectory by precisely controlling the acceleration of a robotic arm rigidly coupled to the horizontal surface, enabling the relative manipulation of an object as it slides on top of the surface. Furthermore, two distinct algorithms have been developed to estimate the frictional forces dynamically during the sliding process. These algorithms provide online friction estimates after each action, which are fed back into the actor model. This feedback mechanism enhances the policy's adaptability and robustness, ensuring more precise control of the platform's acceleration in response to varying surface conditions. The proposed algorithm is validated through simulations and real-world experiments. Results demonstrate that the proposed framework effectively generalizes sliding manipulation across varying distances and, more importantly, adapts to different surfaces with diverse frictional properties. Notably, the trained model exhibits zero-shot sim-to-real transfer capabilities.
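As a rough illustration of the kind of friction feedback described in this abstract, the sketch below estimates a kinetic friction coefficient from the measured velocity of an object sliding on a level surface. It assumes a simple Coulomb friction model, under which a sliding object accelerates at mu_k * g in the world frame regardless of how the platform itself accelerates. The function name and inputs are hypothetical, and this is not either of the two estimators developed in the paper.

```python
G = 9.81  # gravitational acceleration in m/s^2


def estimate_kinetic_friction(object_velocities, dt):
    """Estimate mu_k from world-frame velocities of an object sliding on a level surface.

    Assumption (not from the paper): Coulomb friction, so that while the
    object is sliding, |a_object| = mu_k * g.
    """
    if len(object_velocities) < 2:
        raise ValueError("need at least two velocity samples")
    # Finite-difference acceleration between consecutive velocity samples.
    accels = [
        (v1 - v0) / dt
        for v0, v1 in zip(object_velocities, object_velocities[1:])
    ]
    # Average the magnitudes to reduce the effect of measurement noise.
    mean_abs_accel = sum(abs(a) for a in accels) / len(accels)
    return mean_abs_accel / G
```

An estimate of this kind could be appended to the policy's observation after each action; whether the paper does it this way is not stated in the abstract.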
- North America > United States (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- Europe > Portugal (0.04)
Learning Long-Horizon Robot Manipulation Skills via Privileged Action
Mao, Xiaofeng, Xu, Yucheng, Sun, Zhaole, Miller, Elle, Layeghi, Daniel, Mistry, Michael
Long-horizon contact-rich tasks are challenging to learn with reinforcement learning, due to ineffective exploration of high-dimensional state spaces with sparse rewards. The learning process often gets stuck in local optima and demands task-specific reward fine-tuning for complex scenarios. In this work, we propose a structured framework that leverages privileged actions with curriculum learning, enabling the policy to efficiently acquire long-horizon skills without relying on extensive reward engineering or reference trajectories. Specifically, we use privileged actions in simulation with a general training procedure that would be infeasible to implement in real-world scenarios. These privileges include relaxed constraints and virtual forces that enhance interaction and exploration with objects. Our approach successfully completes complex multi-stage long-horizon tasks that naturally combine non-prehensile manipulation with grasping to lift objects from non-graspable poses. We demonstrate generality by maintaining a parsimonious reward structure and showing convergence to diverse and robust behaviors across various environments. Additionally, real-world experiments confirm that the skills acquired using our approach are transferable, exhibiting robust and intricate performance. Our approach outperforms state-of-the-art methods in these tasks, converging to solutions where others fail.
- Research Report > Promising Solution (0.48)
- Research Report > New Finding (0.34)
A Machine Learning Approach to Sensor Substitution for Non-Prehensile Manipulation
Ozdamar, Idil, Sirintuna, Doganay, Ajoudani, Arash
Mobile manipulators are increasingly deployed in complex environments, requiring diverse sensors to perceive and interact with their surroundings. However, equipping every robot with every possible sensor is often impractical due to cost and physical constraints. A critical challenge arises when robots with differing sensor capabilities need to collaborate or perform similar tasks. For example, consider a scenario where a mobile manipulator equipped with high-resolution tactile skin is skilled at non-prehensile manipulation tasks like pushing. If this robot needs to be replaced or augmented by a robot lacking such tactile sensing, the learned manipulation policies become inapplicable. This paper addresses the problem of sensor substitution in non-prehensile manipulation. We propose a novel machine learning-based framework that enables a robot with a limited sensor set (e.g., LiDAR or RGB-D camera) to effectively perform tasks previously reliant on a richer sensor suite (e.g., tactile skin). Our approach learns a mapping between the available sensor data and the information provided by the substituted sensor, effectively synthesizing the missing sensory input. Specifically, we demonstrate the efficacy of our framework by training a model to substitute tactile skin data for the task of non-prehensile pushing using a mobile manipulator. We show that a manipulator equipped only with LiDAR or RGB-D can, after training, achieve pushing performance comparable to, and sometimes even better than, that of a mobile base utilizing direct tactile feedback.
Learning Visuotactile Estimation and Control for Non-prehensile Manipulation under Occlusions
Ferrandis, Juan Del Aguila, Moura, João, Vijayakumar, Sethu
Non-prehensile manipulation is a crucial skill for enabling versatile robots to interact with ungraspable objects, using actions such as pushing, rolling, or tossing. However, achieving dexterous non-prehensile manipulation in robots poses significant challenges. During contact interactions, different contact modes arise such as sticking, sliding, and separation, and transitions between these contact modes lead to hybrid dynamics [1, 2, 3]. Furthermore, due to its underactuated nature, it requires long-term reasoning about contact interactions as well as reactive control to recover from mistakes and disturbances [1, 2]. The frictional interactions between the robot, the object, and the environment are difficult to model, which creates uncertainty in the behavior of the object [4, 5]. The highly uncertain nature of the underactuated frictional interactions [4, 5] makes the non-prehensile manipulation problem especially sensitive to occlusions. Previous non-prehensile works assume near-perfect visual perception from external systems, providing either point-cloud [6] or pose observations [7, 8, 9, 10, 11]. However, moving towards more versatile onboard perception will make frequent occlusions unavoidable, whether due to obstacles in the environment, self-occlusions, or even human-induced occlusions, for instance in a human-robot collaboration setting. In this paper, we propose a learning-based system for non-prehensile manipulation that leverages tactile sensing to overcome occlusions in the visual perception.
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation
Le, Huy, Gabriel, Miroslav, Hoang, Tai, Neumann, Gerhard, Vien, Ngo Anh
Learning diverse policies for non-prehensile manipulation is essential for improving skill transfer and generalization to out-of-distribution scenarios. In this work, we enhance exploration through a two-fold approach within a hybrid framework that tackles both discrete and continuous action spaces. First, we model the continuous motion parameter policy as a diffusion model, and second, we incorporate this into a maximum entropy reinforcement learning framework that unifies both the discrete and continuous components. The discrete action space, such as contact point selection, is optimized through Q-value function maximization, while the continuous part is guided by a diffusion-based policy. This hybrid approach leads to a principled objective, where the maximum entropy term is derived as a lower bound using structured variational inference. We propose the Hybrid Diffusion Policy algorithm (HyDo) and evaluate its performance on both simulation and zero-shot sim2real tasks. Our results show that HyDo encourages more diverse behavior policies, leading to significantly improved success rates across tasks - for example, increasing from 53% to 72% on a real-world 6D pose alignment task. Project page: https://leh2rng.github.io/hydo
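To make the hybrid action structure in this abstract concrete, the sketch below first picks a discrete contact point from a list of Q-value estimates and then samples continuous motion parameters conditioned on that choice. It is an illustration under stated assumptions rather than the HyDo algorithm: the softmax (Boltzmann) selection with a temperature is an assumed way of trading off Q-value maximization against entropy, the `sample_motion` interface is hypothetical, and `gaussian_motion` is a simple Gaussian stand-in for the diffusion-based policy, not a diffusion model.

```python
import math
import random


def select_hybrid_action(q_values, sample_motion, temperature=1.0, rng=random):
    """Pick a discrete contact index from Q-values, then sample continuous motion parameters.

    q_values: Q-value estimates, one per candidate contact point.
    sample_motion: callable(contact_index, rng) -> continuous parameters;
        a placeholder for the diffusion-based policy (hypothetical interface).
    temperature: softmax temperature; values near 0 approach a pure argmax.
    """
    # Softmax weights over Q-values, shifted by the max for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    contact_index = rng.choices(range(len(q_values)), weights=weights, k=1)[0]
    # Continuous part: sampled conditionally on the chosen discrete action.
    motion = sample_motion(contact_index, rng)
    return contact_index, motion


def gaussian_motion(contact_index, rng):
    """Stand-in sampler: three motion parameters drawn around a per-contact mean."""
    return [rng.gauss(0.1 * contact_index, 0.05) for _ in range(3)]
```

Calling `select_hybrid_action([0.1, 2.0, 0.5], gaussian_motion, temperature=1e-3)` almost always returns contact index 1, while larger temperatures spread the choice across contact points and so yield more diverse behavior.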
Novel Non-Prehensile Rolling Problem: Modelling and Balance Control of Pendulum-Driven Reconfigurable Disks Motion with Magnetic Coupling in Simulation
Wiltshire, Ollie, Tafrishi, Seyed Amir
This paper presents a novel type of mobile rolling robot designed as a modular platform for non-prehensile manipulation, highlighting the associated control challenges in achieving balancing control of the robotic system. The developed rolling disk modules incorporate an innovative internally actuated magnetic-pendulum coupling mechanism, which introduces a compelling control problem due to the frictional and sliding interactions, as well as the magnetic effects between each module. In this paper, we derive the nonlinear dynamics of the robot using the Euler-Lagrange formulation. Then, through simulation, the motion behavior of the system is studied and analyzed, providing critical insights for future investigations into control methods for complex non-prehensile motion between robotic modules. Also, we study the balancing of this new platform and introduce a new motion pattern of lifting. This research aims to enhance the understanding and implementation of modular self-reconfigurable robots in various scenarios for future applications.
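For reference, the Euler-Lagrange formulation mentioned in this abstract has the general textbook form below, with generalized coordinates q_i, kinetic energy T, potential energy V, and generalized non-conservative forces Q_i (friction, for example). The specific coordinates, energy terms, and magnetic coupling forces of the disk-pendulum system are not given in the abstract and are not reconstructed here.

```latex
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right)
  - \frac{\partial L}{\partial q_i} = Q_i,
\qquad L = T - V
```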
Pre- and post-contact policy decomposition for non-prehensile manipulation with zero-shot sim-to-real transfer
Kim, Minchan, Han, Junhyek, Kim, Jaehyung, Kim, Beomjoon
We present a system for non-prehensile manipulation tasks that require a significant number of contact mode transitions and the use of environmental contacts to successfully manipulate an object to a target location. Our method is based on deep reinforcement learning which, unlike state-of-the-art planning algorithms, does not require a priori knowledge of the physical parameters of the object or environment such as friction coefficients or centers of mass. The planning time is reduced to the simple feed-forward prediction time of a neural network. We propose a computational structure, action space design, and curriculum learning scheme that facilitate efficient exploration and sim-to-real transfer. In challenging real-world non-prehensile manipulation tasks, we show that our method can generalize over different objects, and succeed even for novel objects not seen during training. Project website: https://sites.google.com/view/nonprenehsile-decomposition