contact state
- North America > United States (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Vietnam > Hanoi > Hanoi (0.04)
Detecting Hands and Recognizing Physical Contact in the Wild
We investigate a new problem of detecting hands and recognizing their physical contact state in unconstrained conditions. This is a challenging inference task given the need to reason beyond the local appearance of hands. The lack of training annotations indicating which object or parts of an object the hand is in contact with further complicates the task.
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > Canada (0.04)
- Asia > Vietnam > Hanoi > Hanoi (0.04)
MARG: MAstering Risky Gap Terrains for Legged Robots with Elevation Mapping
Dong, Yinzhao, Ma, Ji, Zhao, Liu, Li, Wanyue, Lu, Peng
Deep Reinforcement Learning (DRL) controllers for quadrupedal locomotion have demonstrated impressive performance on challenging terrains, allowing robots to execute complex skills such as climbing, running, and jumping. However, existing blind locomotion controllers often struggle to ensure safe and efficient traversal of risky gap terrains, which are typically highly complex and require robots to accurately perceive terrain information and select appropriate footholds during locomotion. Meanwhile, existing perception-based controllers still have several practical limitations, including complex multi-sensor deployment and expensive computing requirements. This paper proposes a DRL controller named MAstering Risky Gap Terrains (MARG), which integrates terrain maps and proprioception to dynamically adjust actions and enhance the robot's stability in these tasks. During the training phase, our controller accelerates policy optimization by selectively incorporating privileged information (e.g., center of mass, friction coefficients) that is available in simulation but cannot be measured directly in real-world deployments due to sensor limitations. We also design three foot-related rewards to encourage the robot to explore safe footholds. More importantly, a terrain map generation (TMG) model is proposed to reduce mapping drift and provide accurate terrain maps using only one LiDAR, laying a foundation for zero-shot transfer of the learned policy. Experimental results indicate that MARG maintains stability across various risky terrain tasks.
- Asia > China > Hong Kong (0.05)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- Research Report (0.64)
- Workflow (0.46)
- Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
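The privileged-information scheme MARG describes, simulation-only signals used during training but withheld from the deployed policy, follows the general shape of an asymmetric observation split. A minimal sketch, assuming a critic/teacher that sees privileged state while the actor sees only measurable signals; all field names and values are toy illustrations, not MARG's actual interface:

```python
# Sketch of asymmetric (privileged) observations for DRL training.
# The actor's input must be measurable on the real robot; the critic's
# input may include simulation-only quantities. Names are illustrative.

def actor_observation(proprio, height_map):
    """Deployable observation: measurable on the real robot."""
    return proprio + height_map

def critic_observation(proprio, height_map, privileged):
    """Training-only observation: adds simulation-only signals
    (e.g., friction coefficient, center-of-mass offset)."""
    return proprio + height_map + privileged

proprio = [0.1, -0.2, 0.05]          # joint angles / base velocity (toy)
height_map = [0.0, 0.03, 0.12, 0.4]  # sampled terrain heights around the feet
privileged = [0.7, 0.01]             # friction, CoM offset (simulation only)

actor_obs = actor_observation(proprio, height_map)
critic_obs = critic_observation(proprio, height_map, privileged)
assert len(critic_obs) == len(actor_obs) + len(privileged)
```

At deployment, only `actor_observation` is computable, which is why the privileged terms can accelerate training without blocking sim-to-real transfer.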
FBI: Learning Dexterous In-hand Manipulation with Dynamic Visuotactile Shortcut Policy
Chen, Yijin, Xu, Wenqiang, Yu, Zhenjun, Tang, Tutian, Li, Yutong, Yao, Siqiong, Lu, Cewu
Figure 1: We propose Flow Before Imitation (FBI), a novel dynamic visuotactile imitation learning algorithm for dexterous in-hand manipulation. FBI's design enables two operational modes, with or without physical tactile sensors in the real world, greatly extending its application scenarios.
Dexterous in-hand manipulation is a long-standing challenge in robotics due to complex contact dynamics and partial observability. This paper introduces Flow Before Imitation (FBI), a visuotactile imitation learning framework that dynamically fuses tactile interactions with visual observations through motion dynamics. Unlike prior static fusion methods, FBI establishes a causal link between tactile signals and object motion via a dynamics-aware latent model. FBI employs a transformer-based interaction module to fuse flow-derived tactile features with visual inputs, and trains a one-step diffusion policy for real-time execution. Extensive experiments demonstrate that the proposed method outperforms baseline methods in both simulation and the real world on two customized in-hand manipulation tasks and three standard dexterous manipulation tasks.
Few-shot transfer of tool-use skills using human demonstrations with proximity and tactile sensing
Aoyama, Marina Y., Vijayakumar, Sethu, Narita, Tetsuya
Tools extend the manipulation abilities of robots, much like they do for humans. Despite human expertise in tool manipulation, teaching robots these skills faces challenges. The complexity arises from the interplay of two simultaneous points of contact: one between the robot and the tool, and another between the tool and the environment. Tactile and proximity sensors play a crucial role in identifying these complex contacts. However, learning tool manipulation using these sensors remains challenging due to limited real-world data and the large sim-to-real gap. To address this, we propose a few-shot tool-use skill transfer framework using multimodal sensing. The framework involves pre-training the base policy in simulation to capture contact states common in tool-use skills, then fine-tuning it with human demonstrations collected in the real-world target domain to bridge the domain gap. We validate that this framework enables teaching surface-following tasks using tools with diverse physical and geometric properties from a small number of demonstrations on the Franka Emika robot arm. Our analysis suggests that the robot acquires new tool-use skills by transferring the ability to recognise tool-environment contact relationships from pre-trained to fine-tuned policies. Additionally, combining proximity and tactile sensors enhances the identification of contact states and environmental geometry.
Accurate Pose Estimation Using Contact Manifold Sampling for Safe Peg-in-Hole Insertion of Complex Geometries
Negi, Abhay, Manyar, Omey M., Penmetsa, Dhanush K., Gupta, Satyandra K.
Robotic assembly of complex, non-convex geometries with tight clearances remains a challenging problem, demanding precise state estimation for successful insertion. In this work, we propose a novel framework that relies solely on contact states to estimate the full SE(3) pose of a peg relative to a hole. Our method constructs an online submanifold of contact states through primitive motions with just 6 seconds of online execution, subsequently mapping it to an offline contact manifold for precise pose estimation. We demonstrate that without such state estimation, robots risk jamming and excessive force application, potentially causing damage. We evaluate our approach on five industrially relevant, complex geometries with 0.1 to 1.0 mm clearances, achieving a 96.7% success rate, a 6 improvement over primitive-based insertion without state estimation. Additionally, we analyze insertion forces and overall insertion times, showing that our method significantly reduces the average wrench, enabling safer and more efficient assembly.
I. INTRODUCTION
The peg-in-hole insertion task is one of the most fundamental problems in robotics. In the realm of contact-rich manipulation and assembly, insertion-based tasks are often framed as peg-in-hole problems.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Asia > China (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Vision > Video Understanding (0.62)
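The peg-in-hole paper's core step, mapping an online submanifold of contact states onto an offline contact manifold to recover a pose, can be illustrated very loosely as nearest-neighbor matching of contact signatures. The poses, signatures, and Euclidean metric below are hypothetical stand-ins for illustration, not the paper's actual representation:

```python
import math

# Toy nearest-neighbor matching of an online contact signature against an
# offline manifold of (pose, signature) pairs. All data here is invented.

def signature_distance(a, b):
    """Euclidean distance between two contact signatures (toy metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def estimate_pose(online_sig, offline_manifold):
    """Return the pose whose stored signature is closest to the online one."""
    best = min(offline_manifold,
               key=lambda ps: signature_distance(online_sig, ps[1]))
    return best[0]

# Offline manifold: candidate (x, y, yaw) poses with precomputed signatures.
manifold = [
    ((0.0, 0.0, 0.0), [0.1, 0.9, 0.2]),
    ((0.5, 0.0, 0.1), [0.8, 0.1, 0.3]),
    ((0.0, 0.5, -0.1), [0.4, 0.4, 0.9]),
]
pose = estimate_pose([0.75, 0.15, 0.25], manifold)  # closest: second entry
```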
SwitchVLA: Execution-Aware Task Switching for Vision-Language-Action Models
Li, Meng, Zhao, Zhen, Che, Zhengping, Liao, Fei, Wu, Kun, Xu, Zhiyuan, Ren, Pei, Jin, Zhao, Liu, Ning, Tang, Jian
Robots deployed in dynamic environments must not only follow diverse language instructions but also flexibly adapt when user intent changes mid-execution. While recent Vision-Language-Action (VLA) models have advanced multi-task learning and instruction following, they typically assume static task intent, failing to respond when new instructions arrive during ongoing execution. This limitation hinders natural and robust interaction in dynamic settings, such as retail or household environments, where real-time intent changes are common. We propose SwitchVLA, a unified, execution-aware framework that enables smooth and reactive task switching without external planners or additional switch-specific data. We model task switching as a behavior modulation problem conditioned on execution state and instruction context. Expert demonstrations are segmented into temporally grounded contact phases, allowing the policy to infer task progress and adjust its behavior accordingly. A multi-behavior conditional policy is then trained to generate flexible action chunks under varying behavior modes through conditioned trajectory modeling. Experiments in both simulation and real-world robotic manipulation demonstrate that SwitchVLA enables robust instruction adherence, fluid task switching, and strong generalization, outperforming prior VLA baselines in both task success rate and interaction naturalness.
- Europe > Netherlands > South Holland > Delft (0.04)
- Asia > China > Beijing > Beijing (0.04)
Multi-step manipulation task and motion planning guided by video demonstration
Zorina, Kateryna, Kovar, David, Fourmy, Mederic, Lamiraux, Florent, Mansard, Nicolas, Carpentier, Justin, Sivic, Josef, Petrik, Vladimir
This work aims to leverage instructional video to solve complex multi-step task-and-motion planning tasks in robotics. Towards this goal, we propose an extension of the well-established Rapidly-Exploring Random Tree (RRT) planner, which simultaneously grows multiple trees around grasp and release states extracted from the guiding video. Our key novelty lies in combining contact states and 3D object poses extracted from the guiding video with a traditional planning algorithm, which allows us to solve tasks with sequential dependencies, for example, if an object needs to be placed at a specific location to be grasped later. We also investigate the generalization capabilities of our approach to go beyond the scene depicted in the instructional video. To demonstrate the benefits of the proposed video-guided planning approach, we design a new benchmark with three challenging tasks: (i) 3D re-arrangement of multiple objects between a table and a shelf, (ii) multi-step transfer of an object through a tunnel, and (iii) transferring objects using a tray, similar to how a waiter transfers dishes. We demonstrate the effectiveness of our planning algorithm on several robots, including the Franka Emika Panda and the KUKA KMR iiwa. For a seamless transfer of the obtained plans to the real robot, we develop a trajectory refinement approach formulated as an optimal control problem (OCP). Traditional robot motion planning algorithms seek a collision-free path from a given starting robot configuration to a given goal robot configuration [1]. Despite the large dimensionality of the configuration space, sampling-based motion planning algorithms [2], [3] have been shown to be highly effective for solving complex motion planning problems for robots, ranging from six degrees of freedom (DoFs) for industrial manipulators to tens of DoFs for humanoids [4].
Manipulation task-and-motion planning (TAMP) [5] adds additional complexity to the problem by including movable objects in the state space. This requires the planner to discover the pick-and-place actions that connect the given start and goal robot configurations, bringing the manipulated objects from their start poses to their goal poses.
INRIA, Paris. This work is part of the AGIMUS project, funded by the European Union under GA no. 101070165. Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission.
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- Europe > Czechia > Prague (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Research Report (1.00)
- Instructional Material > Course Syllabus & Notes (0.45)
- Education > Educational Technology (0.95)
- Government > Regional Government > Europe Government (0.74)
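The video-guided planner above grows one tree per keyframe (start, goal, and the grasp/release states extracted from the video) and connects trees that come close. A toy sketch in a 2D configuration space, omitting collision checking, robot kinematics, and the actual RRT variant used in the paper:

```python
import math
import random

# Toy multi-tree RRT: seed one tree at each keyframe, grow all trees toward
# the same random samples, and record pairs of trees whose newest nodes come
# within a bridging threshold. Collision checks and kinematics are omitted.

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def grow(tree, sample, step=0.5):
    """Extend the tree one step from its nearest node toward the sample."""
    near = min(tree, key=lambda q: dist(q, sample))
    d = dist(near, sample)
    if d < 1e-9:
        return near
    new = (near[0] + step * (sample[0] - near[0]) / d,
           near[1] + step * (sample[1] - near[1]) / d)
    tree.append(new)
    return new

def multi_tree_rrt(keyframes, iters=400, bridge_eps=0.6, seed=0):
    rng = random.Random(seed)
    trees = [[k] for k in keyframes]  # one tree per keyframe
    bridges = set()
    for _ in range(iters):
        sample = (rng.uniform(0, 10), rng.uniform(0, 10))
        new_nodes = [grow(t, sample) for t in trees]
        for i in range(len(trees)):
            for j in range(i + 1, len(trees)):
                if dist(new_nodes[i], new_nodes[j]) < bridge_eps:
                    bridges.add((i, j))
    return bridges

# Start, a grasp state, a release state, and the goal (toy 2D configurations).
links = multi_tree_rrt([(0, 0), (3, 4), (6, 2), (9, 9)])
```

Bridged tree pairs give the sequential structure the planner needs: a full plan chains bridges from the start tree through the grasp/release trees to the goal tree.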
CLTP: Contrastive Language-Tactile Pre-training for 3D Contact Geometry Understanding
Ma, Wenxuan, Cao, Xiaoge, Zhang, Yixiang, Zhang, Chaofan, Yang, Shaobo, Hao, Peng, Fang, Bin, Cai, Yinghao, Cui, Shaowei, Wang, Shuo
Recent advancements in integrating tactile sensing with vision-language models (VLMs) have demonstrated remarkable potential for robotic multimodal perception. However, existing tactile descriptions remain limited to superficial attributes like texture, neglecting critical contact states essential for robotic manipulation. To bridge this gap, we propose CLTP, an intuitive and effective language-tactile pre-training framework that aligns tactile 3D point clouds with natural language in various contact scenarios, thus enabling contact-state-aware tactile-language understanding for contact-rich manipulation tasks. We first collect a novel dataset of 50k+ tactile 3D point cloud-language pairs, where descriptions explicitly capture multidimensional contact states (e.g., contact location, shape, and force) from the tactile sensor's perspective. CLTP leverages a pre-aligned and frozen vision-language feature space to bridge holistic textual and tactile modalities. Experiments validate its superiority in three downstream tasks: zero-shot 3D classification, contact state classification, and tactile 3D large language model (LLM) interaction. To the best of our knowledge, this is the first study to align tactile and language representations from the contact state perspective for manipulation tasks, providing great potential for tactile-language-action model learning. Code and datasets are open-sourced at https://sites.google.com/view/cltp/.
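Contrastive alignment of two modalities, as CLTP describes for tactile point clouds and text, typically follows a CLIP-style symmetric InfoNCE objective. The sketch below uses stubbed embedding vectors in place of real tactile and text encoders, and is an assumption about the loss family, not CLTP's exact formulation:

```python
import math

# CLIP-style symmetric contrastive loss over a batch of paired embeddings.
# Matched tactile/text pairs share a batch index; after normalization, the
# pairwise-similarity logit matrix should peak on its diagonal.

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_softmax_at(row, idx):
    """Numerically stable log-softmax of row, evaluated at position idx."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - lse

def contrastive_loss(tactile_emb, text_emb, temperature=0.07):
    t = [normalize(v) for v in tactile_emb]
    x = [normalize(v) for v in text_emb]
    logits = [[dot(ti, xj) / temperature for xj in x] for ti in t]
    n = len(logits)
    # Cross-entropy in both directions: tactile->text and text->tactile.
    loss_t2x = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_x2t = -sum(log_softmax_at(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_t2x + loss_x2t)

# Perfectly aligned toy embeddings drive the loss toward zero.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Minimizing this loss pulls each tactile embedding toward its paired description and away from the other descriptions in the batch, which is what enables the zero-shot classification and LLM-interaction tasks the abstract lists.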
DexFlow: A Unified Approach for Dexterous Hand Pose Retargeting and Interaction
Lin, Xiaoyi, Yao, Kunpeng, Xu, Lixin, Wang, Xueqiang, Li, Xuetao, Wang, Yuchen, Li, Miao
Despite advances in hand-object interaction modeling, generating realistic dexterous manipulation data for robotic hands remains a challenge. Retargeting methods often suffer from low accuracy and fail to account for hand-object interactions, leading to artifacts like interpenetration. Generative methods, lacking human hand priors, produce limited and unnatural poses. We propose a data transformation pipeline that combines human hand and object data from multiple sources for high-precision retargeting. Our approach uses a differential loss constraint to ensure temporal consistency and generates contact maps to refine hand-object interactions.
- Asia > China > Hubei Province > Wuhan (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)