Bekris, Kostas
Impact-resistant, autonomous robots inspired by tensegrity architecture
Johnson, William R. III, Huang, Xiaonan, Lu, Shiyang, Wang, Kun, Booth, Joran W., Bekris, Kostas, Kramer-Bottiglio, Rebecca
Future robots will navigate perilous, remote environments with resilience and autonomy. Researchers have proposed building robots with compliant bodies to enhance robustness, but this approach often sacrifices the autonomous capabilities expected of rigid robots. Inspired by tensegrity architecture, we introduce a tensegrity robot -- a hybrid robot made from rigid struts and elastic tendons -- that demonstrates the advantages of compliance and the autonomy necessary for task performance. This robot exhibits impact resistance and autonomy in a field environment, along with additional advances in the state of the art, including surviving harsh impacts from drops (at least 5.7 m), accurately reconstructing its shape and orientation using on-board sensors, achieving high locomotion speeds (18 bar lengths per minute), and climbing the steepest incline of any tensegrity robot (28 degrees). We characterize the robot's locomotion on unstructured terrain, showcase its autonomous capabilities in navigation tasks, and demonstrate its robustness by rolling it off a cliff.
Learning Differentiable Tensegrity Dynamics using Graph Neural Networks
Chen, Nelson, Wang, Kun, Johnson, William R. III, Kramer-Bottiglio, Rebecca, Bekris, Kostas, Aanjaneya, Mridul
Tensegrity robots are composed of rigid struts and flexible cables. They constitute an emerging class of hybrid rigid-soft robotic systems and are promising for a wide array of applications, ranging from locomotion to assembly. They are difficult to control and model accurately, however, due to their compliance and high number of degrees of freedom. To address this issue, prior work has introduced a differentiable physics engine designed for tensegrity robots based on first principles. In contrast, this work proposes the use of graph neural networks to model contact dynamics over a graph representation of tensegrity robots, which leverages their natural graph-like cable connectivity between the end caps of rigid rods. This learned simulator can accurately model 3-bar and 6-bar tensegrity robot dynamics in simulation-to-simulation experiments where MuJoCo is used as the ground truth. It also achieves higher accuracy than the previous differentiable engine for a real 3-bar tensegrity robot, for which the robot state is only partially observable. When compared against direct applications of recent mesh-based graph neural network simulators, the proposed approach is computationally more efficient, both for training and inference, while achieving higher accuracy. Code and data are available at https://github.com/nchen9191/tensegrity_gnn_simulator_public
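The graph construction suggested by the abstract, end caps as nodes and rods/cables as edges, pairs naturally with a learned message-passing update. The following is a minimal sketch under stated assumptions (module names, feature dimensions, and the single round of message passing are illustrative choices, not the released simulator):

```python
# Minimal sketch (not the authors' code): a tensegrity robot as a graph whose
# nodes are rod end caps and whose edges are rods/cables, with one round of
# learned message passing predicting per-node state updates.
# Example scale: a 3-bar robot has 6 end caps (nodes), 3 rod edges, 9 cable edges.
import torch
import torch.nn as nn

class TensegrityGNNStep(nn.Module):
    def __init__(self, node_dim=13, edge_dim=4, hidden=128):
        super().__init__()
        # Edge network: consumes sender/receiver states plus edge features
        # (e.g., rest length, stiffness) and emits a message vector.
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden))
        # Node network: consumes the node state plus aggregated messages and
        # emits a state delta (e.g., change in position/velocity).
        self.node_mlp = nn.Sequential(
            nn.Linear(node_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, node_dim))

    def forward(self, node_states, edge_index, edge_feats):
        # node_states: (N, node_dim); edge_index: (2, E) long; edge_feats: (E, edge_dim)
        senders, receivers = edge_index
        msg_in = torch.cat(
            [node_states[senders], node_states[receivers], edge_feats], dim=-1)
        messages = self.edge_mlp(msg_in)
        # Sum incoming messages per receiver node.
        agg = torch.zeros(node_states.size(0), messages.size(-1))
        agg.index_add_(0, receivers, messages)
        delta = self.node_mlp(torch.cat([node_states, agg], dim=-1))
        return node_states + delta  # predicted next node states
```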
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Lu, Shiyang, Chang, Haonan, Jing, Eric Pu, Boularias, Abdeslam, Bekris, Kostas
This work presents OVIR-3D, a straightforward yet effective method for open-vocabulary 3D object instance retrieval without using any 3D data for training. Given a language query, the proposed method returns a ranked set of 3D object instance segments based on the feature similarity between each instance and the text query. This is achieved by a multi-view fusion of text-aligned 2D region proposals into 3D space, where the 2D region proposal network can leverage 2D datasets, which are more accessible and typically larger than 3D datasets. The proposed fusion process is efficient, as it can be performed in real time for most indoor 3D scenes and does not require additional training in 3D space. Experiments on public datasets and a real robot show the effectiveness of the method and its potential for applications in robot navigation and manipulation.
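The retrieval step itself reduces to similarity ranking once per-instance features have been fused. A minimal sketch, assuming the fused instance features and the text embedding are already available from a text-aligned 2D backbone (producing them is the paper's multi-view fusion, abstracted away here):

```python
# Illustrative sketch, not the released OVIR-3D code: rank fused 3D instance
# features against a text query by cosine similarity.
import numpy as np

def rank_instances(instance_feats: np.ndarray, text_feat: np.ndarray, top_k: int = 5):
    """instance_feats: (M, D) fused per-instance features; text_feat: (D,) query embedding."""
    inst = instance_feats / np.linalg.norm(instance_feats, axis=1, keepdims=True)
    txt = text_feat / np.linalg.norm(text_feat)
    scores = inst @ txt                   # cosine similarity per instance
    order = np.argsort(-scores)[:top_k]   # highest similarity first
    return order, scores[order]
```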
Socially Cognizant Robotics for a Technology Enhanced Society
Dana, Kristin J., Andrews, Clinton, Bekris, Kostas, Feldman, Jacob, Stone, Matthew, Hemmer, Pernille, Mazzeo, Aaron, Salzman, Hal, Yi, Jingang
Applications of robotics (such as telepresence, transportation, elder-care, remote health care, cleaning, warehouse logistics, and delivery) are bringing significant changes in individuals' lives and are having profound social impact. Despite the envisioned potential of robotics, the goal of ubiquitous robot assistants augmenting quality of life (and quality of work life) has not yet been realized. Key challenges lie in the complexities of four overarching human-centric objectives that such systems must aim for: 1) improving the quality of life of people, especially marginalized communities; 2) anticipating and mitigating unintended negative consequences of technological development; 3) enabling robots to adapt to the desires and needs of human counterparts; 4) respecting the need for human autonomy and agency. Pursuing these objectives requires an integrated cohort of technologists, behavioral scientists, and social scientists with a shared vision to pursue a deep, multidisciplinary understanding of how robots interact with individuals and society. We introduce a new term, socially cognizant robotics, to describe this multi-faceted interdisciplinary branch of technology. The emerging practitioner, the socially cognizant roboticist, represents the convergence of socially aware technologists, who can develop intelligent devices that adapt to human and social behavior; and technology-aware social scientists and policymakers, who can translate studies of robotics' social effects into actionable and technically viable principles and policies. A primary element of socially cognizant robotics is a deliberate "invitation to the table" for social scientists, who bring analytical perspectives and methods that are not typically present in robotics. These perspectives cover two levels of human-technology interaction that we view as essential: the human-robot dyad (Section 2) and the robot-society dyad (Section 3). Figure 1 illustrates how these levels might operate in the context of the workplace and everyday life.
Pick Planning Strategies for Large-Scale Package Manipulation
Li, Shuai, Keipour, Azarakhsh, Jamieson, Kevin, Hudson, Nicolas, Zhao, Sicong, Swan, Charles, Bekris, Kostas
Automating warehouse operations can reduce logistics overhead costs, ultimately driving down the final price for consumers, increasing the speed of delivery, and enhancing the resiliency to market fluctuations. This extended abstract showcases large-scale package manipulation from unstructured piles in Amazon Robotics' Robot Induction (Robin) fleet, which is used for picking and singulating up to 6 million packages per day and has so far manipulated over 2 billion packages. It describes the various heuristic methods developed over time and their successor, which utilizes a pick success predictor trained on real production data. To the best of the authors' knowledge, this work is the first large-scale deployment of learned pick quality estimation methods in a real production system.
Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs
Chang, Haonan, Boyalakuntla, Kowndinya, Lu, Shiyang, Cai, Siwei, Jing, Eric, Keskar, Shreesh, Geng, Shijie, Abbas, Adeeb, Zhou, Lifeng, Bekris, Kostas, Boularias, Abdeslam
We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as "pick up a cup on a kitchen table" or "navigate to a sofa on which someone is sitting". In contrast to existing research on 3D scene graphs, OVSG supports free-form text input and open-vocabulary querying. Through a series of comparative experiments using the ScanNet dataset and a self-collected dataset, we demonstrate that our proposed approach significantly surpasses the performance of previous semantic-based localization techniques. Moreover, we highlight the practical application of OVSG in real-world robot navigation and manipulation experiments.
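A toy illustration of context-aware grounding over a scene graph, not the OVSG implementation: nodes carry categories and attributes, edges carry relations, and a query such as "cup on a kitchen table" is grounded by checking both the node match and its relational context. Real OVSG scores nodes and edges with open-vocabulary embeddings; exact string matching stands in for that scoring here.

```python
# Toy scene graph: nodes with attributes, edges as (subject, relation, object).
nodes = {
    1: {"category": "cup", "position": (0.4, 1.2, 0.8)},
    2: {"category": "table", "region": "kitchen"},
    3: {"category": "cup", "position": (3.0, 0.1, 0.4)},
    4: {"category": "sofa", "region": "living room"},
}
edges = [(1, "on", 2), (3, "on", 4)]

def ground(subject_category, relation, object_category, object_region=None):
    """Return ids of subject nodes whose relational context matches the query."""
    matches = []
    for subj, rel, obj in edges:
        if rel != relation:
            continue
        if nodes[subj]["category"] != subject_category:
            continue
        if nodes[obj]["category"] != object_category:
            continue
        if object_region and nodes[obj].get("region") != object_region:
            continue
        matches.append(subj)
    return matches

print(ground("cup", "on", "table", object_region="kitchen"))  # -> [1]
```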
Real2Sim2Real Transfer for Control of Cable-driven Robots via a Differentiable Physics Engine
Wang, Kun, Johnson, William R. III, Lu, Shiyang, Huang, Xiaonan, Booth, Joran, Kramer-Bottiglio, Rebecca, Aanjaneya, Mridul, Bekris, Kostas
Tensegrity robots, composed of rigid rods and flexible cables, exhibit high strength-to-weight ratios and significant deformations, which enable them to navigate unstructured terrains and survive harsh impacts. They are hard to control, however, due to high dimensionality, complex dynamics, and a coupled architecture. Physics-based simulation is a promising avenue for developing locomotion policies that can be transferred to real robots. Nevertheless, modeling tensegrity robots is a complex task due to a substantial sim2real gap. To address this issue, this paper describes a Real2Sim2Real (R2S2R) strategy for tensegrity robots. This strategy is based on a differentiable physics engine that can be trained given limited data from a real robot. These data include offline measurements of physical properties, such as mass and geometry for various robot components, and the observation of a trajectory using a random control policy. With the data from the real robot, the engine can be iteratively refined and used to discover locomotion policies that are directly transferable to the real robot. Beyond the R2S2R pipeline, key contributions of this work include computing non-zero gradients at contact points, a loss function for matching tensegrity locomotion gaits, and a trajectory segmentation technique that avoids conflicts in gradient evaluation during training. Multiple iterations of the R2S2R process are demonstrated and evaluated on a real 3-bar tensegrity robot.
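The core of the R2S2R loop is gradient-based identification of physical parameters from observed trajectories. Below is a deliberately small, self-contained sketch of that loop under stated assumptions: a one-dimensional actuated spring stands in for the paper's differentiable tensegrity engine, synthetic data stands in for the real robot trajectory, and a plain MSE loss stands in for the gait-matching loss described in the abstract.

```python
# Sketch of differentiable system identification, not the paper's engine.
import torch

def simulate(stiffness, damping, controls, dt=0.01):
    # Toy differentiable rollout: one cable as a linear spring whose rest
    # length is actuated; gradients flow back to the physical parameters.
    pos, vel = torch.tensor(1.0), torch.tensor(0.0)
    traj = []
    for rest_len in controls:
        force = -stiffness * (pos - rest_len) - damping * vel
        vel = vel + dt * force
        pos = pos + dt * vel
        traj.append(pos)
    return torch.stack(traj)

controls = torch.linspace(1.0, 0.5, 200)  # a simple actuation schedule
with torch.no_grad():
    # Synthetic "real robot" trajectory generated with hidden ground-truth parameters.
    real_traj = simulate(torch.tensor(40.0), torch.tensor(2.0), controls)

stiffness = torch.tensor(10.0, requires_grad=True)
damping = torch.tensor(0.5, requires_grad=True)
opt = torch.optim.Adam([stiffness, damping], lr=0.1)

for step in range(500):
    loss = torch.mean((simulate(stiffness, damping, controls) - real_traj) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()
```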
Demonstrating Large-Scale Package Manipulation via Learned Metrics of Pick Success
Li, Shuai, Keipour, Azarakhsh, Jamieson, Kevin, Hudson, Nicolas, Swan, Charles, Bekris, Kostas
Automating warehouse operations can reduce logistics overhead costs, ultimately driving down the final price for consumers, increasing the speed of delivery, and enhancing the resiliency to workforce fluctuations. The past few years have seen increased interest in automating such repeated tasks, but mostly in controlled settings. Tasks such as picking objects from unstructured, cluttered piles have only recently become robust enough for large-scale deployment with minimal human intervention. This paper demonstrates large-scale package manipulation from unstructured piles in Amazon Robotics' Robot Induction (Robin) fleet, which utilizes a pick success predictor trained on real production data. Specifically, the system was trained on over 394K picks. It is used for singulating up to 5 million packages per day and has manipulated over 200 million packages during this paper's evaluation period. The developed learned pick quality measure ranks various pick alternatives in real time and prioritizes the most promising ones for execution. The pick success predictor aims to estimate, from prior experience, the success probability of a desired pick by the deployed industrial robotic arms in cluttered scenes containing deformable and rigid objects with partially known properties. It is a shallow machine learning model, which allows us to evaluate which features are most important for the prediction. An online pick ranker leverages the learned success predictor to prioritize the most promising picks for the robotic arm, which are then assessed for collision avoidance. This learned ranking process is demonstrated to overcome the limitations of manually engineered and heuristic alternatives and to outperform them. To the best of the authors' knowledge, this paper presents the first large-scale deployment of learned pick quality estimation methods in a real production system.
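To make the ranking idea concrete, here is an illustrative sketch only (the production features, model, and data are not public): a shallow classifier is trained on per-pick features and then used online to order candidate picks by predicted success probability before collision checks.

```python
# Illustrative pick-success predictor and ranker; feature names are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical per-pick features: [suction seal area, surface tilt, clutter density].
X_train = rng.random((5000, 3))
# Synthetic labels standing in for logged pick outcomes (1 = successful pick).
y_train = (X_train[:, 0] - 0.5 * X_train[:, 2]
           + 0.1 * rng.standard_normal(5000) > 0.3).astype(int)

model = LogisticRegression().fit(X_train, y_train)

def rank_picks(candidate_feats: np.ndarray) -> np.ndarray:
    """Return candidate indices ordered from most to least promising."""
    p_success = model.predict_proba(candidate_feats)[:, 1]
    return np.argsort(-p_success)

candidates = rng.random((8, 3))   # 8 candidate picks for the current pile
print(rank_picks(candidates))     # best pick first; then assess for collisions
```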
Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D Videos
Lu, Shiyang, Deng, Yunfu, Boularias, Abdeslam, Bekris, Kostas
This work proposes a self-supervised learning system for segmenting rigid objects in RGB images. The proposed pipeline is trained on unlabeled RGB-D videos of static objects, which can be captured with a camera carried by a mobile robot. A key feature of the self-supervised training process is a graph-matching algorithm that operates on the over-segmentation output of the point cloud that is reconstructed from each video. The graph matching, along with point cloud registration, can find recurring object patterns across videos and combine them into 3D object pseudo labels, even under occlusions or different viewing angles. Projected 2D object masks from 3D pseudo labels are used to train a pixel-wise feature extractor through contrastive learning. During online inference, a clustering method uses the learned features to cluster foreground pixels into object segments. Experiments highlight the method's effectiveness on both real and synthetic video datasets, which include cluttered scenes of tabletop objects. The proposed method outperforms existing unsupervised methods for object segmentation by a large margin.
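The contrastive training step can be illustrated compactly. The sketch below is an assumption-laden stand-in, not the paper's implementation: a supervised-contrastive style loss over sampled pixel embeddings pulls together pixels that project from the same 3D pseudo label and pushes apart pixels from different ones; the feature extractor and pseudo-label generation are abstracted away.

```python
# Rough sketch of a pixel-level contrastive loss driven by 3D pseudo labels.
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(feats, pseudo_labels, temperature=0.1):
    """feats: (P, D) sampled pixel embeddings; pseudo_labels: (P,) object ids."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t() / temperature                    # (P, P) similarities
    same = pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)
    mask = ~torch.eye(len(feats), dtype=torch.bool)          # exclude self-pairs
    # Log-softmax over non-self pairs, averaged over positives per anchor.
    log_prob = sim - torch.logsumexp(sim.masked_fill(~mask, -1e9), dim=1, keepdim=True)
    pos = (same & mask).float()
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()
```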
You Only Demonstrate Once: Category-Level Manipulation from Single Visual Demonstration
Wen, Bowen, Lian, Wenzhao, Bekris, Kostas, Schaal, Stefan
Promising results have been achieved recently in category-level manipulation that generalizes across object instances. Nevertheless, it often requires expensive real-world data collection and manual specification of semantic keypoints for each object category and task. Additionally, coarse keypoint predictions and the neglect of intermediate action sequences hinder adoption in complex manipulation tasks beyond pick-and-place. This work proposes a novel, category-level manipulation framework that leverages an object-centric, category-level representation and model-free 6 DoF motion tracking. The canonical object representation is learned solely in simulation and then used to parse a category-level task trajectory from a single demonstration video. The demonstration is reprojected to a target trajectory tailored to a novel object via the canonical representation. During execution, the manipulation horizon is decomposed into long-range, collision-free motion and last-inch manipulation. For the latter part, a category-level behavior cloning (CatBC) method leverages motion tracking to perform closed-loop control. CatBC follows the target trajectory, projected from the demonstration and anchored to a dynamically selected category-level coordinate frame. The frame is automatically selected along the manipulation horizon by a local attention mechanism. This framework makes it possible to teach different manipulation strategies by providing only a single demonstration, without complicated manual programming. Extensive experiments demonstrate its efficacy in a range of challenging industrial tasks in high-precision assembly, which involve learning complex, long-horizon policies. The process exhibits robustness against uncertainty due to dynamics as well as generalization across object instances and scene configurations.
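One piece of the pipeline, reprojecting the demonstrated trajectory onto a novel instance through a category-level canonical frame, can be sketched with plain homogeneous transforms. The snippet below is illustrative only (the function name and the assumption that object poses are given as 4x4 matrices are mine); the actual system relies on a learned canonical representation and 6 DoF motion tracking.

```python
# Simplified retargeting of a demonstrated end-effector trajectory to a novel
# object instance via a shared canonical object frame.
import numpy as np

def retarget_trajectory(demo_waypoints, T_demo_obj, T_novel_obj):
    """demo_waypoints: (T, 4, 4) gripper poses in the world frame during the demo.
    T_demo_obj / T_novel_obj: 4x4 object poses (world <- canonical) for the demo
    object and the novel object, assumed given by pose estimation/tracking."""
    to_canonical = np.linalg.inv(T_demo_obj)
    retargeted = []
    for T_wp in demo_waypoints:
        T_canonical = to_canonical @ T_wp             # waypoint relative to the object
        retargeted.append(T_novel_obj @ T_canonical)  # re-anchor to the novel object
    return np.stack(retargeted)
```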