mobile manipulator
FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination
He, Chengyang, Sun, Ge, Bai, Yue, Lu, Junkai, Zhao, Jiadong, Sartoretti, Guillaume
FALCON actively decouples locomotion and manipulation through two modular diffusion policies, coordinated by a vision-language foundation model. The VLM encodes global scene context, proprioceptive states, and goal instructions into a shared latent embedding that conditions both subsystems. Abstract--We present FoundAtion-model-guided decoupled LoCO-maNipulation visuomotor policies (FALCON), a framework for loco-manipulation that combines modular diffusion policies with a vision-language foundation model as the coordinator. Our approach explicitly decouples locomotion and manipulation into two specialized visuomotor policies, allowing each subsystem to rely on its own observations. This mitigates the performance degradation that arises when a single policy is forced to fuse heterogeneous, potentially mismatched observations from locomotion and manipulation. Our key innovation lies in restoring coordination between these two independent policies through a vision-language foundation model, which encodes global observations and language instructions into a shared latent embedding conditioning both diffusion policies. On top of this backbone, we introduce a phase-progress head that uses textual descriptions of task stages to infer discrete phase and continuous progress estimates without manual phase labels. To further structure the latent space, we incorporate a coordination-aware contrastive loss that explicitly encodes cross-subsystem compatibility between arm and base actions. Results show that it surpasses centralized and decentralized baselines while exhibiting improved robustness and generalization to out-of-distribution scenarios. Recent progress in robot learning and foundation models has rekindled the longstanding vision of general-purpose robots that can move through unstructured environments and manipulate diverse objects with minimal task-specific engineering.
Large Behavior Models (LBMs) extend the diffusion policy paradigm to multi-task dexterous manipulation [1], training a single policy across broad datasets of real and simulated trajectories. Robotics' Memo platform [8], demonstrate impressive whole-body behaviors that combine locomotion, manipulation, and language grounding in increasingly realistic environments. These developments suggest a future where robot generalist models consume raw sensor streams and language instructions and directly output actions to interact with the physical world. However, loco-manipulation, jointly controlling a mobile base and one or more arms, remains especially challenging on legged platforms [9]-[11], where the same body must simultaneously maintain stability and accomplish precise manipulation under different sensor streams and poses. In this work, we focus on a specific yet representative setting in which an arm-mounted quadruped robot performs long-horizon loco-manipulation tasks using only RGB observations, proprioceptive states, and sparse language instructions.
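The abstract mentions a coordination-aware contrastive loss over arm and base actions but does not give its form. As a rough illustration only, the sketch below uses a generic InfoNCE-style loss, which is an assumption and not necessarily the authors' formulation. The names `info_nce`, `arm_embs`, `base_embs`, and `temperature` are hypothetical; row i of each list is treated as a compatible arm/base pair, and every other row in the batch acts as a negative.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a nonzero vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(arm_embs, base_embs, temperature=0.1):
    """Generic InfoNCE loss: arm_embs[i] should match base_embs[i].

    Assumption: this stands in for the paper's coordination-aware loss,
    whose exact definition is not given in the abstract.
    """
    arm = [normalize(v) for v in arm_embs]
    base = [normalize(v) for v in base_embs]
    total = 0.0
    for i, a in enumerate(arm):
        # Cosine similarity of this arm embedding to every base embedding.
        logits = [dot(a, b) / temperature for b in base]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the matching (positive) pair.
        total += log_denom - logits[i]
    return total / len(arm)
```

With this form, the loss is near zero when each arm embedding points the same way as its paired base embedding, and grows when pairs are mismatched, which is the kind of cross-subsystem compatibility signal the abstract describes.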
- Information Technology > Artificial Intelligence > Robots > Locomotion (0.66)
- Information Technology > Artificial Intelligence > Robots > Manipulation (0.48)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
- Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.46)
TopAY: Efficient Trajectory Planning for Differential Drive Mobile Manipulators via Topological Paths Search and Arc Length-Yaw Parameterization
Xu, Long, Wong, Choilam, Zhang, Mengke, Lin, Junxiao, Hou, Jialiang, Gao, Fei
Abstract-- Differential drive mobile manipulators combine the mobility of wheeled bases with the manipulation capability of multi-joint arms, enabling versatile applications but posing considerable challenges for trajectory planning due to their high-dimensional state space and nonholonomic constraints. This paper introduces TopAY, an optimization-based planning framework designed for efficient and safe trajectory generation for differential drive mobile manipulators. The framework employs a hierarchical initial value acquisition strategy, including topological paths search for the base and parallel sampling for the manipulator. A polynomial trajectory representation with arc length-yaw parameterization is also proposed to reduce optimization complexity while preserving dynamic feasibility. Extensive simulation and real-world experiments validate that TopAY achieves higher planning efficiency and success rates than state-of-the-art methods in dense and complex scenarios. The source code is released at https://github.com/T Differential drive mobile manipulator (DDMoMa), comprising multi-joint manipulator(s) mounted on a differential drive base (DDB), integrates the rich manipulation ability of manipulators and the mobility of wheeled robots.
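The abstract does not spell out the arc length-yaw parameterization, so the following is only a sketch of the general idea under stated assumptions: if the base yaw is expressed as a polynomial in arc length s, the planar position follows from integrating (cos yaw, sin yaw) along s, and the no-lateral-slip constraint of a differential drive base holds by construction. The function names, the coefficient layout, and the midpoint integration rule are all illustrative choices, not TopAY's implementation.

```python
import math


def yaw_poly(coeffs, s):
    """Evaluate yaw(s) = c0 + c1*s + c2*s^2 + ... at arc length s."""
    return sum(c * s ** i for i, c in enumerate(coeffs))


def integrate_base_path(coeffs, total_length, steps=1000, start=(0.0, 0.0)):
    """Recover (x, y) samples from a yaw-versus-arc-length polynomial.

    Uses the midpoint rule on dx/ds = cos(yaw), dy/ds = sin(yaw), so the
    velocity direction always coincides with the heading.
    """
    x, y = start
    ds = total_length / steps
    path = [(x, y)]
    for i in range(steps):
        yaw = yaw_poly(coeffs, (i + 0.5) * ds)  # yaw at the interval midpoint
        x += math.cos(yaw) * ds
        y += math.sin(yaw) * ds
        path.append((x, y))
    return path


# A constant yaw gives a straight line; yaw(s) = s traces a unit-radius arc.
straight_end = integrate_base_path([0.0], 2.0)[-1]
arc_end = integrate_base_path([0.0, 1.0], math.pi / 2)[-1]
```

Here `straight_end` lies 2 m along the x-axis, and `arc_end` is close to (1, 1), the quarter-circle endpoint for a unit turning radius.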
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
XRoboToolkit: A Cross-Platform Framework for Robot Teleoperation
Zhao, Zhigen, Yu, Liuchuan, Jing, Ke, Yang, Ning
The rapid advancement of Vision-Language-Action models has created an urgent need for large-scale, high-quality robot demonstration datasets. Although teleoperation is the predominant method for data collection, current approaches suffer from limited scalability, complex setup procedures, and suboptimal data quality. This paper presents XRoboToolkit, a cross-platform framework for extended reality based robot teleoperation built on the OpenXR standard. The system features low-latency stereoscopic visual feedback, optimization-based inverse kinematics, and support for diverse tracking modalities including head, controller, hand, and auxiliary motion trackers. XRoboToolkit's modular architecture enables seamless integration across robotic platforms and simulation environments, spanning precision manipulators, mobile robots, and dexterous hands. We demonstrate the framework's effectiveness through precision manipulation tasks and validate data quality by training VLA models that exhibit robust autonomous performance.
- North America > United States > Virginia > Fairfax County > Fairfax (0.04)
- North America > United States > Kansas > Cowley County (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > California > Santa Clara County > San Jose (0.04)
Kinematic Analysis and Integration of Vision Algorithms for a Mobile Manipulator Employed Inside a Self-Driving Laboratory
Sulaiman, Shifa, Jensen, Tobias Busk, Bengtson, Stefan Hein, Bøgh, Simon
Recent advances in robotics and autonomous systems have broadened the use of robots in laboratory settings, including automated synthesis, scalable reaction workflows, and collaborative tasks in self-driving laboratories (SDLs). This paper presents a comprehensive development of a mobile manipulator designed to assist human operators in such autonomous lab environments. Kinematic modeling of the manipulator is carried out based on the Denavit-Hartenberg (DH) convention, and an inverse kinematics solution is determined to enable precise and adaptive manipulation capabilities. A key focus of this research is enhancing the manipulator's ability to reliably grasp textured objects as a critical component of autonomous handling tasks. Advanced vision-based algorithms are implemented to perform real-time object detection and pose estimation, guiding the manipulator in dynamic grasping and following tasks. In this work, we integrate a vision method that combines feature-based detection with homography-driven pose estimation, leveraging depth information to represent an object pose as a 2D planar projection within 3D space. This adaptive capability enables the system to accommodate variations in object orientation and supports robust autonomous manipulation across diverse environments. By enabling autonomous experimentation and human-robot collaboration, this work contributes to the scalability and reproducibility of next-generation chemical laboratories.
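To make the DH-based kinematic modeling concrete, here is a minimal forward kinematics sketch using the classic (distal) Denavit-Hartenberg convention. The DH table used in the example is a hypothetical two-link planar arm, not the manipulator from the paper, whose parameters the abstract does not list.

```python
import math


def dh_transform(theta, d, a, alpha):
    """4x4 homogeneous transform of one link, classic DH convention:
    Rot_z(theta) * Trans_z(d) * Trans_x(a) * Rot_x(alpha)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul4(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def forward_kinematics(dh_rows):
    """Chain the link transforms; each row is (theta, d, a, alpha)."""
    T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for theta, d, a, alpha in dh_rows:
        T = matmul4(T, dh_transform(theta, d, a, alpha))
    return T


def end_effector_position(dh_rows):
    """Translation part of the base-to-end-effector transform."""
    T = forward_kinematics(dh_rows)
    return (T[0][3], T[1][3], T[2][3])


# Hypothetical planar arm: two 1 m links, first joint at 90 degrees,
# second joint at -90 degrees.
tip = end_effector_position([(math.pi / 2, 0.0, 1.0, 0.0),
                             (-math.pi / 2, 0.0, 1.0, 0.0)])
```

For this configuration the tip lands at roughly (1, 1, 0). An inverse kinematics solver would search for joint angles whose forward kinematics matches a desired pose.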
- Europe > Denmark > North Jutland > Aalborg (0.05)
- North America (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
Autonomous Legged Mobile Manipulation for Lunar Surface Operations via Constrained Reinforcement Learning
Belmonte-Baeza, Alvaro, Cazorla, Miguel, García, Gabriel J., Pérez-Del-Pulgar, Carlos J., Pomares, Jorge
Robotics plays a pivotal role in planetary science and exploration, where autonomous and reliable systems are crucial due to the risks and challenges inherent to space environments. The establishment of permanent lunar bases demands robotic platforms capable of navigating and manipulating in the harsh lunar terrain. While wheeled rovers have been the mainstay for planetary exploration, their limitations in unstructured and steep terrains motivate the adoption of legged robots, which offer superior mobility and adaptability. This paper introduces a constrained reinforcement learning framework designed for autonomous quadrupedal mobile manipulators operating in lunar environments. The proposed framework integrates whole-body locomotion and manipulation capabilities while explicitly addressing critical safety constraints, including collision avoidance, dynamic stability, and power efficiency, in order to ensure robust performance under lunar-specific conditions, such as reduced gravity and irregular terrain. Experimental results demonstrate the framework's effectiveness in precise 6D task-space end-effector pose tracking, with an average positional accuracy of 4 cm and orientation accuracy of 8.1 degrees. The system consistently respects both soft and hard constraints, exhibiting adaptive behaviors optimized for lunar gravity conditions. This work effectively bridges adaptive learning with essential mission-critical safety requirements, paving the way for advanced autonomous robotic explorers for future lunar missions.
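The reported accuracy figures are position and orientation errors of a 6D end-effector pose. The abstract does not say how they were computed; one common choice, shown here as an assumption, is the Euclidean distance for position and the geodesic angle between unit quaternions for orientation. The function names and the (w, x, y, z) quaternion order are illustrative.

```python
import math


def position_error(p, p_ref):
    """Euclidean distance between two 3D points, in the input units."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, p_ref)))


def orientation_error_deg(q, q_ref):
    """Smallest rotation angle between two unit quaternions (w, x, y, z).

    The absolute value handles the double cover (q and -q are the same
    rotation); the clamp guards acos against rounding slightly above 1.
    """
    d = abs(sum(a * b for a, b in zip(q, q_ref)))
    d = min(1.0, d)
    return math.degrees(2.0 * math.acos(d))


# Example: 4 cm offset along z, and a 90 degree rotation about z.
identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
pos_err = position_error((0.0, 0.0, 0.04), (0.0, 0.0, 0.0))
rot_err = orientation_error_deg(quarter_turn_z, identity)
```

Averaging these two quantities over a set of commanded poses would yield metrics of the same kind as those quoted above.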
- North America > United States (0.28)
- Europe > Spain > Valencian Community > Alicante Province > Alicante (0.04)
- Asia > China (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
MobRT: A Digital Twin-Based Framework for Scalable Learning in Mobile Manipulation
Mei, Yilin, Qiu, Peng, Zhang, Wei, Zhang, WenChao, Song, Wenjie
Recent advances in robotics have been largely driven by imitation learning, which depends critically on large-scale, high-quality demonstration data. However, collecting such data remains a significant challenge, particularly for mobile manipulators, which must coordinate base locomotion and arm manipulation in high-dimensional, dynamic, and partially observable environments. Consequently, most existing research remains focused on simpler tabletop scenarios, leaving mobile manipulation relatively underexplored. To bridge this gap, we present MobRT, a digital twin-based framework designed to simulate two primary categories of complex, whole-body tasks: interaction with articulated objects (e.g., opening doors and drawers) and mobile-base pick-and-place operations. MobRT autonomously generates diverse and realistic demonstrations through the integration of virtual kinematic control and whole-body motion planning, enabling coherent and physically consistent execution. We evaluate the quality of MobRT-generated data across multiple baseline algorithms, establishing a comprehensive benchmark and demonstrating a strong correlation between task success and the number of generated trajectories. Experiments integrating both simulated and real-world demonstrations confirm that our approach markedly improves policy generalization and performance, achieving robust results in both simulated and real-world environments.
AnywhereVLA: Language-Conditioned Exploration and Mobile Manipulation
Gubernatorov, Konstantin, Voronov, Artem, Voronov, Roman, Pasynkov, Sergei, Perminov, Stepan, Guo, Ziang, Tsetserukou, Dzmitry
We address natural language pick-and-place in unseen, unpredictable indoor environments with AnywhereVLA, a modular framework for mobile manipulation. A user text prompt serves as an entry point and is parsed into a structured task graph that conditions classical SLAM with LiDAR and cameras, metric semantic mapping, and a task-aware frontier exploration policy. An approach planner then selects visibility- and reachability-aware pre-grasp base poses. For interaction, a compact SmolVLA manipulation head is fine-tuned on platform pick-and-place trajectories for the SO-101 by TheRobotStudio, grounding local visual context and sub-goals into grasp and place proposals. The full system runs onboard on consumer-level hardware, with Jetson Orin NX for perception and VLA and an Intel NUC for SLAM, exploration, and control, sustaining real-time operation. We evaluated AnywhereVLA in a multi-room lab under static scenes and normal human motion. In this setting, the system achieves a 46% overall task success rate while maintaining throughput on embedded compute. By combining a classical stack with fine-tuned VLA manipulation, the system inherits the reliability of geometry-based navigation along with the agility and task generalization of language-conditioned manipulation.
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Asia > Russia (0.04)
A Bimanual Gesture Interface for ROS-Based Mobile Manipulators Using TinyML and Sensor Fusion
Bhuiyan, Najeeb Ahmed, Huq, M. Nasimul, Chowdhury, Sakib H., Mangharam, Rahul
Gesture-based control for mobile manipulators faces persistent challenges in reliability, efficiency, and intuitiveness. This paper presents a dual-hand gesture interface that integrates TinyML, spectral analysis, and sensor fusion within a ROS framework to address these limitations. The system uses left-hand tilt and finger flexion, captured using accelerometer and flex sensors, for mobile base navigation, while right-hand IMU signals are processed through spectral analysis and classified by a lightweight neural network. This pipeline enables TinyML-based gesture recognition to control a 7-DOF Kinova Gen3 manipulator. By supporting simultaneous navigation and manipulation, the framework improves efficiency and coordination compared to sequential methods. Key contributions include a bimanual control architecture, real-time low-power gesture recognition, robust multimodal sensor fusion, and a scalable ROS-based implementation. The proposed approach advances Human-Robot Interaction (HRI) for industrial automation, assistive robotics, and hazardous environments, offering a cost-effective, open-source solution with strong potential for real-world deployment and further optimization.
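The right-hand pipeline turns IMU signals into spectral features before classification. The abstract does not specify the features, so the sketch below is an assumption: it computes discrete Fourier transform magnitudes with the standard library and sums their energy into a few coarse frequency bands, the kind of compact vector a small classifier could consume. Window length, band count, and function names are illustrative.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of DFT bins 0..n/2 of a real signal, scaled by 1/n."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc) / n)
    return mags


def band_energies(signal, num_bands):
    """Sum squared magnitudes into equal-width bands, skipping the DC bin.

    Trailing bins that do not fill a whole band are ignored.
    """
    mags = dft_magnitudes(signal)[1:]
    size = max(1, len(mags) // num_bands)
    return [sum(m * m for m in mags[i * size:(i + 1) * size])
            for i in range(num_bands)]


# Synthetic accelerometer window: 64 samples holding 20 full cycles.
window = [math.sin(2 * math.pi * 20 * t / 64) for t in range(64)]
features = band_energies(window, num_bands=4)
dominant_band = features.index(max(features))
```

With 64 samples and four bands, each band spans eight bins, so the 20-cycle tone falls in band index 2. On a microcontroller, an FFT routine would replace the direct DFT loop used here for clarity.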
- Asia > Bangladesh (0.14)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Health & Medicine > Therapeutic Area (0.93)
- Government (0.68)
- Information Technology > Artificial Intelligence > Vision > Gesture Recognition (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.92)
ORB: Operating Room Bot, Automating Operating Room Logistics through Mobile Manipulation
Qiu, Jinkai, Kim, Yungjun, Sethia, Gaurav, Agarwal, Tanmay, Ghodasara, Siddharth, Erickson, Zackory, Ichnowski, Jeffrey
Abstract-- Efficiently delivering items to an ongoing surgery in a hospital operating room can be a matter of life or death. In modern hospital settings, delivery robots have successfully transported bulk items between rooms and floors. However, automating item-level operating room logistics presents unique challenges in perception, efficiency, and maintaining sterility. We propose the Operating Room Bot (ORB), a robot framework to automate logistics tasks in hospital operating rooms (OR). ORB leverages a robust, hierarchical behavior tree (BT) architecture to integrate diverse functionalities of object recognition, scene interpretation, and GPU-accelerated motion planning. The contributions of this paper include: (1) a modular software architecture facilitating robust mobile manipulation through behavior trees; (2) a novel real-time object recognition pipeline integrating YOLOv7, Segment Anything Model 2 (SAM2), and Grounded DINO; (3) the adaptation of the cuRobo parallelized trajectory optimization framework to real-time, collision-free mobile manipulation; and (4) empirical validation demonstrating an 80% success rate in OR supply retrieval and a 96% success rate in restocking operations. These contributions establish ORB as a reliable and adaptable system for autonomous OR logistics.
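As a rough picture of how a hierarchical behavior tree composes skills, the sketch below implements the two standard composite nodes, sequence and fallback, over simple actions. It is not ORB's architecture: the node names such as `detect_item` and `rescan_shelf` are hypothetical, and the usual RUNNING status is omitted for brevity.

```python
SUCCESS, FAILURE = "success", "failure"


class Action:
    """Leaf node wrapping a callable that returns True on success."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def tick(self, log):
        log.append(self.name)  # record execution order for inspection
        return SUCCESS if self.fn() else FAILURE


class Sequence:
    """Succeeds only if every child succeeds, evaluated left to right."""

    def __init__(self, children):
        self.children = children

    def tick(self, log):
        for child in self.children:
            if child.tick(log) == FAILURE:
                return FAILURE
        return SUCCESS


class Fallback:
    """Succeeds as soon as one child succeeds; a built-in recovery path."""

    def __init__(self, children):
        self.children = children

    def tick(self, log):
        for child in self.children:
            if child.tick(log) == SUCCESS:
                return SUCCESS
        return FAILURE


# Hypothetical retrieval task: if detection fails, rescan, then plan.
tree = Sequence([
    Fallback([Action("detect_item", lambda: False),
              Action("rescan_shelf", lambda: True)]),
    Action("plan_motion", lambda: True),
])
trace = []
result = tree.tick(trace)
```

Here the failed detection triggers the rescan branch before motion planning runs, so `trace` lists the three actions in that order and `result` is a success. Nesting such composites is what gives the tree its hierarchy.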
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.70)
Leg-Arm Coordinated Operation for Curtain Wall Installation
Liu, Xiao, Wang, Weijun, Huang, Tianlun, Wang, Zhiyong, Feng, Wei
With the acceleration of urbanization, the number of high-rise buildings and large public facilities is increasing, making curtain walls an essential component of modern architecture with widespread applications. Traditional curtain wall installation methods face challenges such as variable on-site terrain, high labor intensity, low construction efficiency, and significant safety risks. Large panels often require multiple workers to complete installation. To address these issues, based on a hexapod curtain wall installation robot, we design a hierarchical optimization-based whole-body control framework for coordinated arm-leg planning tailored to three key tasks: wall installation, ceiling installation, and floor laying. This framework integrates the motion of the hexapod legs with the operation of the folding arm and the serial-parallel manipulator. We conduct experiments on the hexapod curtain wall installation robot to validate the proposed control method, demonstrating its capability in performing curtain wall installation tasks. Our results confirm the effectiveness of the hierarchical optimization-based arm-leg coordination framework for the hexapod robot, laying the foundation for its further application in complex construction site environments.
- Asia > China > Guangdong Province > Shenzhen (0.05)
- Asia > China > Hubei Province (0.04)