robot arm
High-Performance Dual-Arm Task and Motion Planning for Tabletop Rearrangement
Zhang, Duo, Huang, Junshan, Yu, Jingjin
Abstract-- We propose Synchronous Dual-Arm Rearrangement Planner (SDAR), a task and motion planning (T AMP) framework for tabletop rearrangement, where two robot arms equipped with 2-finger grippers must work together in close proximity to rearrange objects whose start and goal configurations are strongly entangled. T o tackle such challenges, SDAR tightly knit together its dependency-driven task planner (SDAR-T) and synchronous dual-arm motion planner (SDAR-M), to intelligently sift through a large number of possible task and motion plans. Specifically, SDAR-T applies a simple yet effective strategy to decompose the global object dependency graph induced by the rearrangement task, to produce more optimal dual-arm task plans than solutions derived from optimal task plans for a single arm. Leveraging state-of-the-art GPU SIMD-based motion planning tools, SDAR-M employs a layered motion planning strategy to sift through many task plans for the best synchronous dual-arm motion plan while ensuring high levels of success rate. Comprehensive evaluation demonstrates that SDAR delivers a 100% success rate in solving complex, non-monotone, long-horizon tabletop rearrangement tasks with solution quality far exceeding the previous state-of-the-art. Experiments on two UR-5e arms further confirm SDAR directly and reliably transfers to robot hardware. Task and motion planning (T AMP) [1] represents a fundamental computation challenge in robotics, in which a robot system, e.g., one or more robot arms, must break down a given, potentially long-horizon task into suitable "bite-sized" sub-tasks that can be executed through short-horizon robot motions.
- North America > United States > New Jersey > Middlesex County > Piscataway (0.14)
- Europe > Germany > Berlin (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Training-Free Robot Pose Estimation using Off-the-Shelf Foundational Models
Pose estimation of a robot arm from visual inputs is a challenging task. However, with the increasing adoption of robot arms for both industrial and residential use cases, reliable joint angle estimation can offer improved safety and performance guarantees, and also be used as a verifier to further train robot policies. This paper introduces using frontier vision-language models (VLMs) as an ``off-the-shelf" tool to estimate a robot arm's joint angles from a single target image. By evaluating frontier VLMs on both synthetic and real-world image-data pairs, this paper establishes a performance baseline attained by current FLMs. In addition, this paper presents empirical results suggesting that test time scaling or parameter scaling alone does not lead to improved joint angle predictions.
Think Fast: Real-Time Kinodynamic Belief-Space Planning for Projectile Interception
Olin, Gabriel, Chen, Lu, Gandotra, Nayesha, Likhachev, Maxim, Choset, Howie
Intercepting fast moving objects, by its very nature, is challenging because of its tight time constraints. This problem becomes further complicated in the presence of sensor noise because noisy sensors provide, at best, incomplete information, which results in a distribution over target states to be intercepted. Since time is of the essence, to hit the target, the planner must begin directing the interceptor, in this case a robot arm, while still receiving information. We introduce an tree-like structure, which is grown using kinodynamic motion primitives in state-time space. This tree-like structure encodes reachability to multiple goals from a single origin, while enabling real-time value updates as the target belief evolves and seamless transitions between goals. We evaluate our framework on an interception task on a 6 DOF industrial arm (ABB IRB-1600) with an onboard stereo camera (ZED 2i). A robust Innovation-based Adaptive Estimation Adaptive Kalman Filter (RIAE-AKF) is used to track the target and perform belief updates.
- North America > United States (0.04)
- Europe > Portugal > Madeira > Funchal (0.04)
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Asia > China > Hong Kong (0.04)
RobotSeg: A Model and Dataset for Segmenting Robots in Image and Video
Mei, Haiyang, Huang, Qiming, Ci, Hai, Shou, Mike Zheng
Accurate robot segmentation is a fundamental capability for robotic perception. It enables precise visual servoing for VLA systems, scalable robot-centric data augmentation, accurate real-to-sim transfer, and reliable safety monitoring in dynamic human-robot environments. Despite the strong capabilities of modern segmentation models, surprisingly it remains challenging to segment robots. This is due to robot embodiment diversity, appearance ambiguity, structural complexity, and rapid shape changes. Embracing these challenges, we introduce RobotSeg, a foundation model for robot segmentation in image and video. RobotSeg is built upon the versatile SAM 2 foundation model but addresses its three limitations for robot segmentation, namely the lack of adaptation to articulated robots, reliance on manual prompts, and the need for per-frame training mask annotations, by introducing a structure-enhanced memory associator, a robot prompt generator, and a label-efficient training strategy. These innovations collectively enable a structure-aware, automatic, and label-efficient solution. We further construct the video robot segmentation (VRS) dataset comprising over 2.8k videos (138k frames) with diverse robot embodiments and environments. Extensive experiments demonstrate that RobotSeg achieves state-of-the-art performance on both images and videos, establishing a strong foundation for future advances in robot perception.
- Asia > Singapore (0.40)
- Asia > Middle East > Jordan (0.04)
How Robot Kinematics Influence Human Performance in Virtual Robot-to-Human Handover Tasks
Keenan, Róisín, Dessing, Joost C.
Recent advancements in robotics have increased the possibilities for integrating robotic systems into human-involved workplaces, highlighting the need to examine and optimize human-robot coordination in collaborative settings. This study explores human-robot interactions during handover tasks using Virtual Reality (VR) to investigate differences in human motor performance across various task dynamics and robot kinematics. A VR-based robot handover simulation afforded safe and controlled assessments of human-robot interactions. In separate experiments, four potential influences on human performance were examined (1) control over task initiation and robot movement synchrony (temporal and spatiotemporal); (2) partner appearance (human versus robotic); (3) robot velocity profiles (minimum jerk, constant velocity, constant acceleration, and biphasic); and (4) the timing of rotational object motion. Findings across experiments emphasize humans benefit from robots providing early and salient visual information about task-relevant object motion, and advantages of human-like smooth robot trajectories. To varying degrees, these manipulations improved predictive accuracy and synchronization during interaction. This suggests that human-robot interactions should be designed to allow humans to leverage their natural capabilities for detecting biological motion, which conversely may reduce the need for costly robotic computations or added cognitive adaptation on the human side.
- North America > United States > District of Columbia > Washington (0.04)
- Europe > United Kingdom > Northern Ireland (0.04)
- Europe > Switzerland (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Leisure & Entertainment (0.93)
- Health & Medicine > Therapeutic Area > Neurology (0.93)
Stable Offline Hand-Eye Calibration for any Robot with Just One Mark
Xie, Sicheng, Meng, Lingchen, Du, Zhiying, Tu, Shuyuan, Cao, Haidong, Leng, Jiaqi, Wu, Zuxuan, Jiang, Yu-Gang
Imitation learning has achieved remarkable success in a variety of robotic tasks by learning a mapping function from camera-space observations to robot-space actions. Recent work indicates that the use of robot-to-camera transformation information ({\ie}, camera extrinsics) benefits the learning process and produces better results. However, camera extrinsics are oftentimes unavailable and estimation methods usually suffer from local minima and poor generalizations. In this paper, we present CalibAll, a simple yet effective method that \textbf{requires only a single mark} and performs training-free, stable, and accurate camera extrinsic estimation across diverse robots and datasets through a coarse-to-fine calibration pipeline. In particular, we annotate a single mark on an end-effector (EEF), and leverage the correspondence ability emerged from vision foundation models (VFM) to automatically localize the corresponding mark across robots in diverse datasets. Using this mark, together with point tracking and the 3D EEF trajectory, we obtain a coarse camera extrinsic via temporal Perspective-n-Point (PnP). This estimate is further refined through a rendering-based optimization that aligns rendered and ground-true masks, yielding accurate and stable camera extrinsic. Experimental results demonstrate that our method outperforms state-of-the-art approaches, showing strong robustness and general effectiveness across three robot platforms. It also produces useful auxiliary annotations such as depth maps, link-wise masks, and end-effector 2D trajectories, which can further support downstream tasks.
Painted Heart Beats
Adhya, Angshu, Yang, Cindy, Wu, Emily, Hasan, Rishad, Narula, Abhishek, Alves-Oliveira, Patrícia
We developed a robot arm that collaboratively paints with a human artist. The robot has an awareness of the artist's heartbeat through the EmotiBit sensor, which provides the arousal levels of the painter . Given the heartbeat detected, the robot decides to increase proximity to the artist's workspace or retract. If a higher heartbeat is detected, which is associated with increased arousal in human artists, the robot will move away from that area of the canvas. If the artist's heart rate is detected as neutral, indicating the human artist's baseline state, the robot will continue its painting actions across the entire canvas. We also demonstrate and propose alternative robot-artist interactions using natural language and physical touch. This work combines the biometrics of a human artist to inform fluent artistic interactions.
MiniBEE: A New Form Factor for Compact Bimanual Dexterity
Islam, Sharfin, Chen, Zewen, He, Zhanpeng, Bhatt, Swapneel, Permuy, Andres, Taylor, Brock, Vickery, James, Lu, Zhengbin, Zhang, Cheng, Piacenza, Pedro, Ciocarlie, Matei
Abstract-- Bimanual robot manipulators can achieve impressive dexterity, but typically rely on two full six-or seven-degree-of-freedom arms so that paired grippers can coordinate effectively. We introduce the MiniBEE (Miniature Bimanual End-effector), a compact system in which two reduced-mobility arms (3+ DOF each) are coupled into a kinematic chain that preserves full relative positioning between grippers. T o guide our design, we formulate a kinematic dexterity metric that enlarges the dexterous workspace while keeping the mechanism lightweight and wearable. The resulting system supports two complementary modes: (i) wearable kinesthetic data collection with self-tracked gripper poses, and (ii) deployment on a standard robot arm, extending dexterity across its entire workspace. We present kinematic analysis and design optimization methods for maximizing dexterous range, and demonstrate an end-to-end pipeline in which wearable demonstrations train imitation learning policies that perform robust, real-world bimanual manipulation. In recent years, bimanual robotic manipulators have shown remarkable dexterity. The combination of imitation learning from human demonstrations and two well-articulated kinematic chains has enabled such systems to use simple parallel grippers to autonomously perform highly dexterous tasks [1]-[7], with robustness to initial conditions or perturbations encountered during execution [8]-[10]. To achieve these results, current systems typically rely on the combination of two 6-or 7-degree-of-freedom (DOF) robotic arms.