Darvish, Kourosh
AnyPlace: Learning Generalized Object Placement for Robot Manipulation
Zhao, Yuchi, Bogdanovic, Miroslav, Luo, Chengyuan, Tohme, Steven, Darvish, Kourosh, Aspuru-Guzik, Alán, Shkurti, Florian, Garg, Animesh
Object placement in robotic tasks is inherently challenging due to the diversity of object geometries and placement configurations. To address this, we propose AnyPlace, a two-stage method trained entirely on synthetic data, capable of predicting a wide range of feasible placement poses for real-world tasks. Our key insight is that by leveraging a Vision-Language Model (VLM) to identify rough placement locations, we focus only on the regions relevant for local placement, which enables us to train the low-level placement-pose-prediction model to capture diverse placements efficiently. For training, we generate a fully synthetic dataset of randomly generated objects in different placement configurations (insertion, stacking, hanging) and train local placement-prediction models. We conduct extensive evaluations in simulation, demonstrating that our method outperforms baselines in terms of success rate, coverage of possible placement modes, and precision. In real-world experiments, we show that models trained purely on synthetic data transfer directly to the real world, succeeding at placements where other models struggle: varied object geometries, diverse placement modes, and fine placements requiring high precision. More at: https://any-place.github.io.
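To make the two-stage pipeline concrete, the following is a minimal Python sketch: a VLM proposes a coarse placement location, the scene point cloud is cropped to that region, and a local model samples candidate placement poses. All function names, the cropping radius, and the stub predictions are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

# Hypothetical stand-ins for the paper's two stages: a VLM that proposes a
# coarse placement location, and a local placement-pose prediction model.
def vlm_coarse_location(rgb_image, instruction):
    """Return a rough (x, y, z) placement location in the scene frame (stub)."""
    return np.array([0.4, 0.1, 0.2])

def predict_local_placements(region_points, n_samples=8):
    """Return candidate 4x4 placement poses within the cropped region (stub)."""
    poses = np.tile(np.eye(4), (n_samples, 1, 1))
    poses[:, :3, 3] += np.random.normal(scale=0.01, size=(n_samples, 3))
    return poses

def crop_around(points, center, radius=0.10):
    """Keep only scene points near the coarse location: the 'relevant region'."""
    return points[np.linalg.norm(points - center, axis=1) < radius]

scene_points = np.random.rand(5000, 3)            # stand-in scene point cloud
center = vlm_coarse_location(None, "hang the mug on the rack")
region = crop_around(scene_points, center)
candidates = predict_local_placements(region)
print(f"{len(candidates)} candidate placement poses near {center}")
```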
Accelerating Discovery in Natural Science Laboratories with AI and Robotics: Perspectives and Challenges from the 2024 IEEE ICRA Workshop, Yokohama, Japan
Cooper, Andrew I., Courtney, Patrick, Darvish, Kourosh, Eckhoff, Moritz, Fakhruldeen, Hatem, Gabrielli, Andrea, Garg, Animesh, Haddadin, Sami, Harada, Kanako, Hein, Jason, Hübner, Maria, Knobbe, Dennis, Pizzuto, Gabriella, Shkurti, Florian, Shrestha, Ruja, Thurow, Kerstin, Vescovi, Rafael, Vogel-Heuser, Birgit, Wolf, Ádám, Yoshikawa, Naruki, Zeng, Yan, Zhou, Zhengxue, Zwirnmann, Henning
Fundamental breakthroughs across many scientific disciplines are becoming increasingly rare (1). At the same time, challenges related to the reproducibility and scalability of experiments, especially in the natural sciences (2,3), remain significant obstacles. For years, automating scientific experiments has been viewed as the key to solving this problem. However, existing solutions are often rigid and complex, designed to address specific experimental tasks with little adaptability to protocol changes. With advancements in robotics and artificial intelligence, new possibilities are emerging to tackle this challenge in a more flexible and human-centric manner.
iCub3 Avatar System: Enabling Remote Fully-Immersive Embodiment of Humanoid Robots
Dafarra, Stefano, Pattacini, Ugo, Romualdi, Giulio, Rapetti, Lorenzo, Grieco, Riccardo, Darvish, Kourosh, Milani, Gianluca, Valli, Enrico, Sorrentino, Ines, Viceconte, Paolo Maria, Scalzo, Alessandro, Traversaro, Silvio, Sartore, Carlotta, Elobaid, Mohamed, Guedelha, Nuno, Herron, Connor, Leonessa, Alexander, Draicchio, Francesco, Metta, Giorgio, Maggiali, Marco, Pucci, Daniele
We present an avatar system designed to facilitate the embodiment of humanoid robots by human operators, validated through iCub3, a humanoid developed at the Istituto Italiano di Tecnologia (IIT). More precisely, the contribution of the paper is twofold: first, we present the humanoid iCub3 as a robotic avatar that integrates the latest significant improvements from about fifteen years of development of the iCub series; second, we present a versatile avatar system enabling humans to embody humanoid robots, encompassing locomotion, manipulation, voice, and facial expressions, with comprehensive sensory feedback including visual, auditory, haptic, weight, and touch modalities. We validate the system by implementing several avatar architecture instances, each tailored to specific requirements. First, we evaluated the architecture optimized for verbal, non-verbal, and physical interactions with a remote recipient. This testing involved the operator in Genoa and the avatar at the Biennale di Venezia, Venice - about 290 km away - thus allowing the operator to remotely visit the Italian art exhibition. Second, we evaluated the architecture optimized for physical collaboration with a recipient and public engagement on stage, live, at the We Make Future show, a prominent world digital innovation festival. In this instance, the operator was situated in Genoa while the avatar operated in Rimini - about 300 km away - interacting with a recipient who entrusted the avatar with a payload to carry on stage before an audience of approximately 2000 spectators. Third, we present the architecture implemented by the iCub Team for the ANA Avatar XPrize competition.
ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization
Darvish, Kourosh, Skreta, Marta, Zhao, Yuchi, Yoshikawa, Naruki, Som, Sagnik, Bogdanovic, Miroslav, Cao, Yang, Hao, Han, Xu, Haoping, Aspuru-Guzik, Alán, Garg, Animesh, Shkurti, Florian
Chemistry experimentation is often resource- and labor-intensive. Despite the many benefits brought by the integration of advanced and special-purpose lab equipment, many aspects of experimentation are still conducted manually by chemists, for example, polishing an electrode in electrochemistry experiments. Traditional lab automation infrastructure faces challenges when it comes to flexibly adapting to new chemistry experiments. To address this issue, we propose a human-friendly and flexible robotic system, ORGANA, that automates a diverse set of chemistry experiments. It is capable of interacting with chemists in the lab through natural language, using Large Language Models (LLMs). ORGANA keeps scientists informed by providing timely reports that incorporate statistical analyses. Additionally, it actively engages with users when necessary for disambiguation or troubleshooting. ORGANA can reason over user input to derive experiment goals, and plan long sequences of both high-level tasks and low-level robot actions while using feedback from the visual perception of the environment. It also supports scheduling and parallel execution for experiments that require resource allocation and coordination between multiple robots and experiment stations. We show that ORGANA successfully conducts a diverse set of chemistry experiments, including solubility assessment, pH measurement, recrystallization, and electrochemistry experiments. For the latter, we show that ORGANA robustly executes a long-horizon plan, comprising 19 steps executed in parallel, to characterize the electrochemical properties of quinone derivatives, a class of molecules used in rechargeable flow batteries. Our user study indicates that ORGANA significantly improves many aspects of the user experience while reducing chemists' physical workload. More details about ORGANA can be found at https://ac-rad.github.io/organa/.
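As an illustration of the scheduling and parallel execution the abstract mentions, here is a toy Python sketch that batches plan steps whose resource requirements do not conflict and runs each batch in parallel. The step names, resource model, and greedy policy are assumptions for illustration only; ORGANA's actual scheduler is described in the paper.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical plan: each step declares the stations/robots it needs.
plan = [
    {"step": "dispense_solution", "resources": {"pump"}},
    {"step": "polish_electrode",  "resources": {"arm", "polish_station"}},
    {"step": "measure_pH",        "resources": {"arm", "pH_meter"}},
]

def execute(step):
    print(f"executing {step['step']} using {sorted(step['resources'])}")

def schedule(plan):
    """Greedily batch steps with pairwise-disjoint resource sets, then run
    each batch concurrently (a stand-in for a real resource scheduler)."""
    pending = list(plan)
    while pending:
        batch, busy = [], set()
        for s in pending[:]:
            if not (s["resources"] & busy):
                batch.append(s)
                busy |= s["resources"]
                pending.remove(s)
        with ThreadPoolExecutor() as pool:
            list(pool.map(execute, batch))

schedule(plan)  # dispense_solution and polish_electrode run in parallel
```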
RePLan: Robotic Replanning with Perception and Language Models
Skreta, Marta, Zhou, Zihan, Yuan, Jia Lin, Darvish, Kourosh, Aspuru-Guzik, Alán, Garg, Animesh
Advancements in large language models (LLMs) have demonstrated their potential in facilitating high-level reasoning, logical reasoning, and robotics planning. Recently, LLMs have also been able to generate reward functions for low-level robot actions, effectively bridging the interface between high-level planning and low-level robot control. However, the challenge remains that even with syntactically correct plans, robots can still fail to achieve their intended goals. This failure can be attributed to imperfect plans proposed by LLMs or to unforeseeable environmental circumstances that hinder the execution of planned subtasks due to erroneous assumptions about the state of objects. One way to mitigate these failures is to rely on human-provided step-by-step instructions, but this limits the autonomy of robotic systems. Vision Language Models (VLMs) have shown remarkable success in tasks such as visual question answering and image captioning. Leveraging the capabilities of VLMs, we present a novel framework called Robotic Replanning with Perception and Language Models (RePLan) that enables real-time replanning capabilities for long-horizon tasks. This framework utilizes the physical grounding provided by a VLM's understanding of the world's state to adapt robot actions when the initial plan fails to achieve the desired goal. We test our approach within four environments containing seven long-horizon tasks. We find that RePLan enables a robot to successfully adapt to unforeseen obstacles while accomplishing open-ended, long-horizon goals, where baseline models cannot. Find more information at https://replan-lm.github.io/replan.github.io/
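The core loop can be sketched as follows: a planner proposes subtasks, each outcome is checked, and a failure plus the VLM's grounded explanation triggers a replan. Everything below is a Python stub standing in for RePLan's actual components.

```python
# Minimal sketch of a perceive-verify-replan loop; all functions are
# illustrative stubs, not RePLan's real planner, executor, or VLM.
state = {"occluded": True}

def propose_plan(goal, feedback=None):
    return ["open_drawer", "pick_object", "place_object"]

def execute_subtask(subtask):
    print(f"executing {subtask}")
    if subtask == "pick_object" and state["occluded"]:
        return False               # simulate a failed grasp
    return True

def vlm_explains_failure(subtask):
    state["occluded"] = False      # pretend the replan clears the occlusion
    return f"{subtask} failed: object occluded"

def run(goal, max_replans=3):
    feedback = None
    for _ in range(max_replans):
        for subtask in propose_plan(goal, feedback):
            if not execute_subtask(subtask):
                feedback = vlm_explains_failure(subtask)
                break              # replan with grounded VLM feedback
        else:
            return True            # all subtasks succeeded
    return False

print("goal reached:", run("put the mug in the drawer"))
```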
Online Action Recognition for Human Risk Prediction with Anticipated Haptic Alert via Wearables
Guo, Cheng, Rapetti, Lorenzo, Darvish, Kourosh, Grieco, Riccardo, Draicchio, Francesco, Pucci, Daniele
This paper proposes a framework that combines online human state estimation, action recognition, and motion prediction to enable early assessment and prevention of worker biomechanical risk during lifting tasks. The framework leverages the NIOSH index to perform online risk assessment, making it suitable for real-time applications. In particular, the human state is retrieved via inverse kinematics/dynamics algorithms from wearable sensor data. Human action recognition and motion prediction are achieved by implementing an LSTM-based Guided Mixture of Experts architecture, which is trained offline and used for online inference. With the recognized actions, a single lifting activity is divided into a series of continuous movements, and the Revised NIOSH Lifting Equation can be applied for risk assessment. Moreover, the predicted motions enable anticipation of future risks. A haptic actuator, embedded in the wearable system, can alert the subject to potential risk, acting as an active prevention device. The performance of the proposed framework is validated by executing real lifting tasks while the subject is equipped with the iFeel wearable system.
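The risk score at the heart of the framework is the Lifting Index from the Revised NIOSH Lifting Equation. A minimal Python implementation of its standard metric form is shown below; the frequency and coupling multipliers (FM, CM), which come from NIOSH tables, are passed in directly, and multiplier caps are simplified.

```python
def revised_niosh_rwl(H, V, D, A, FM=1.0, CM=1.0):
    """Recommended Weight Limit (kg), metric form of the Revised NIOSH
    Lifting Equation. H: horizontal distance (cm), V: vertical height of
    the hands (cm), D: vertical travel distance (cm), A: asymmetry angle
    (degrees). FM and CM are table lookups, supplied here as arguments."""
    LC = 23.0                       # load constant (kg)
    HM = min(25.0 / H, 1.0)         # horizontal multiplier
    VM = 1.0 - 0.003 * abs(V - 75.0)   # vertical multiplier
    DM = 0.82 + 4.5 / D             # distance multiplier (for D >= 25 cm)
    AM = 1.0 - 0.0032 * A           # asymmetry multiplier
    return LC * HM * VM * DM * AM * FM * CM

load = 12.0                                      # kg actually lifted
rwl = revised_niosh_rwl(H=40, V=60, D=50, A=30)
print(f"RWL = {rwl:.1f} kg, Lifting Index = {load / rwl:.2f}")
```

A Lifting Index above 1.0 indicates elevated risk, which is what the framework anticipates from predicted motions before the lift is completed.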
Chemistry Lab Automation via Constrained Task and Motion Planning
Yoshikawa, Naruki, Li, Andrew Zou, Darvish, Kourosh, Zhao, Yuchi, Xu, Haoping, Kuramshin, Artur, Aspuru-Guzik, Alán, Garg, Animesh, Shkurti, Florian
Chemists need to perform many laborious and time-consuming experiments in the lab to discover and understand the properties of new materials. To support and accelerate this process, we propose a robot manipulation framework that autonomously performs chemistry experiments. Our framework receives high-level abstract descriptions of chemistry experiments, perceives the lab workspace, and autonomously plans multi-step actions and motions. The robot interacts with a wide range of lab equipment and executes the generated plans. A key component of our method is constrained task and motion planning using PDDLStream solvers. Collisions and spillage are prevented by a constrained motion planner. Our planning framework can conduct a variety of experiments by composing the implemented actions and lab tools. We demonstrate the utility of our framework on pouring skills for various materials and two fundamental chemistry experiments for materials synthesis: solubility and recrystallization.
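As an example of the kind of constraint such a motion planner enforces, the Python sketch below rejects waypoints whose end-effector orientation would tilt a held container past a threshold (an anti-spillage constraint). The pose representation and the 10-degree threshold are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

MAX_TILT_RAD = np.deg2rad(10)   # assumed spillage tolerance for a held beaker

def upright_ok(R_ee):
    """True if the container's z-axis stays within MAX_TILT_RAD of world-up.
    R_ee is the 3x3 end-effector rotation matrix at a waypoint."""
    z_axis = R_ee[:, 2]
    tilt = np.arccos(np.clip(z_axis @ np.array([0.0, 0.0, 1.0]), -1.0, 1.0))
    return tilt < MAX_TILT_RAD

def filter_path(waypoints):
    """Keep only 4x4 waypoint poses satisfying the upright constraint; a real
    planner would enforce this during sampling rather than filtering after."""
    return [T for T in waypoints if upright_ok(T[:3, :3])]

path = [np.eye(4) for _ in range(5)]   # toy path with upright orientations
print(f"{len(filter_path(path))}/{len(path)} waypoints satisfy the constraint")
```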
Errors are Useful Prompts: Instruction Guided Task Programming with Verifier-Assisted Iterative Prompting
Skreta, Marta, Yoshikawa, Naruki, Arellano-Rubach, Sebastian, Ji, Zhi, Kristensen, Lasse Bjørn, Darvish, Kourosh, Aspuru-Guzik, Alán, Shkurti, Florian, Garg, Animesh
Generating low-level robot task plans from high-level natural language instructions remains a challenging problem. Although large language models have shown promising results in generating plans, the accuracy of the output remains unverified. Furthermore, the lack of domain-specific language data limits the applicability of these models. In this paper, we propose CLAIRIFY, a novel approach that combines automatic iterative prompting with program verification to ensure that programs written in a data-scarce domain-specific language are syntactically valid and incorporate environment constraints. Our approach guides the language model toward generating structured task plans by incorporating any errors as feedback, while the verifier ensures the syntactic accuracy of the generated plans. We demonstrate the effectiveness of CLAIRIFY in planning chemistry experiments, achieving state-of-the-art results. We also show that the generated plans can be executed on a real robot by integrating them with a task and motion planner.
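The verifier-assisted loop can be sketched in a few lines of Python: generate a plan, verify it, and feed any error back into the next prompt until the plan validates. The LLM call and the domain-specific-language verifier below are stubs; CLAIRIFY's actual prompts and verifier rules are described in the paper.

```python
# Illustrative stubs: a real system would call an LLM and a full DSL parser.
def llm_generate(instruction, error=None):
    """Generate a candidate plan; an error message is prepended as feedback."""
    return "<plan>add_sample; stir; measure</plan>"

def verify(plan):
    """Return None if the plan parses, else an error message for the LLM."""
    return None if plan.startswith("<plan>") else "missing <plan> tag"

def clairify_loop(instruction, max_iters=5):
    error = None
    for _ in range(max_iters):
        plan = llm_generate(instruction, error)
        error = verify(plan)
        if error is None:
            return plan          # syntactically valid plan
    raise RuntimeError("no valid plan after max_iters")

print(clairify_loop("measure the solubility of the sample"))
```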
Simultaneous Action Recognition and Human Whole-Body Motion and Dynamics Prediction from Wearable Sensors
Darvish, Kourosh, Ivaldi, Serena, Pucci, Daniele
This paper presents a novel approach that simultaneously solves the problems of human activity recognition and whole-body motion and dynamics prediction for real-time applications. Starting from the dynamics of human motion and motor system theory, the notion of mixture of experts from deep learning is extended to address this problem. In the proposed approach, each expert is modelled as a sequence-to-sequence recurrent neural network (RNN). Experiments show the results of 66-DoF real-world human motion prediction and action recognition during different tasks such as walking and rotating. The code associated with this paper is available at: github.com/ami-iit/paper_darvish_2022_humanoids_action-kindyn-predicition
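A toy version of the idea, with a gating network that doubles as the action classifier and several recurrent experts blended into one motion forecast, might look like the following PyTorch sketch. The layer sizes, gating scheme, and prediction horizon are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GatedMoE(nn.Module):
    """Toy mixture of sequence-to-sequence experts: the gate's logits serve
    as action-recognition scores and as weights blending expert forecasts."""
    def __init__(self, dim=66, hidden=128, n_experts=3, horizon=10):
        super().__init__()
        self.encoder = nn.GRU(dim, hidden, batch_first=True)
        self.experts = nn.ModuleList(
            nn.GRU(dim, hidden, batch_first=True) for _ in range(n_experts))
        self.heads = nn.ModuleList(
            nn.Linear(hidden, dim * horizon) for _ in range(n_experts))
        self.gate = nn.Linear(hidden, n_experts)
        self.horizon, self.dim = horizon, dim

    def forward(self, past):                   # past: (B, T, 66) joint states
        _, h = self.encoder(past)
        logits = self.gate(h[-1])              # action-recognition logits
        w = torch.softmax(logits, dim=-1)      # per-expert mixture weights
        preds = []
        for expert, head in zip(self.experts, self.heads):
            _, he = expert(past)
            preds.append(head(he[-1]).view(-1, self.horizon, self.dim))
        future = sum(w[:, i, None, None] * p for i, p in enumerate(preds))
        return future, logits                  # motion forecast + action

model = GatedMoE()
future, action_logits = model(torch.randn(2, 20, 66))
print(future.shape, action_logits.shape)       # (2, 10, 66) and (2, 3)
```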
Teleoperation of Humanoid Robots: A Survey
Darvish, Kourosh, Penco, Luigi, Ramos, Joao, Cisneros, Rafael, Pratt, Jerry, Yoshida, Eiichi, Ivaldi, Serena, Pucci, Daniele
Teleoperation of humanoid robots enables the integration of the cognitive skills and domain expertise of humans with the physical capabilities of humanoid robots. The operational versatility of humanoid robots makes them the ideal platform for a wide range of applications when teleoperating in a remote environment. However, the complexity of humanoid robots imposes challenges for teleoperation, particularly in unstructured, dynamic environments with limited communication. Much progress has been made in this area in recent decades, but a comprehensive overview has been missing. This survey paper gives an extensive overview of humanoid robot teleoperation, presenting the general architecture of a teleoperation system and analyzing its different components. We also discuss different aspects of the topic, including technological and methodological advances, as well as potential applications. A web-based version of the paper can be found at https://humanoid-teleoperation.github.io/.