tabletop
Query-Centric Diffusion Policy for Generalizable Robotic Assembly
Xu, Ziyi, Lin, Haohong, Liu, Shiqi, Zhao, Ding
The robotic assembly task poses a key challenge in building generalist robots due to the intrinsic complexity of part interactions and the sensitivity to noise perturbations in contact-rich settings. The assembly agent is typically designed in a hierarchical manner: high-level multi-part reasoning and low-level precise control. However, implementing such a hierarchical policy is challenging in practice due to the mismatch between high-level skill queries and low-level execution. To address this, we propose the Query-centric Diffusion Policy (QDP), a hierarchical framework that bridges high-level planning and low-level control by utilizing queries comprising objects, contact points, and skill information. QDP introduces a query-centric mechanism that identifies task-relevant components and uses them to guide low-level policies, leveraging point cloud observations to improve the policy's robustness. We conduct comprehensive experiments on the FurnitureBench in both simulation and real-world settings, demonstrating improved performance in skill precision and long-horizon success rate. In the challenging insertion and screwing tasks, QDP improves the skill-wise success rate by over 50% compared to baselines without structured queries.
corobos: A Design for Mobile Robots Enabling Cooperative Transitions between Table and Wall Surfaces
Han, Changyo, Nakagawa, Yosuke, Naemura, Takeshi
Swarm User Interfaces allow dynamic arrangement of user environments through the use of multiple mobile robots, but their operational range is typically confined to a single plane due to constraints imposed by their two-wheel propulsion systems. We present corobos, a proof-of-concept design that enables these robots to cooperatively transition between table (horizontal) and wall (vertical) surfaces seamlessly, without human intervention. Each robot is equipped with a uniquely designed slope structure that facilitates smooth rotation when another robot pushes it toward a target surface. Notably, this design relies solely on passive mechanical elements, eliminating the need for additional active electrical components. We investigated the design parameters of this structure and evaluated its transition success rate through experiments. Furthermore, we demonstrate various application examples to showcase the potential of corobos in enhancing user environments.
- North America > United States > New York > New York County > New York City (0.16)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Europe > Ireland (0.14)
- Europe > Denmark (0.14)
Tidiness Score-Guided Monte Carlo Tree Search for Visual Tabletop Rearrangement
Kee, Hogun, Oh, Wooseok, Kang, Minjae, Ahn, Hyemin, Oh, Songhwai
In this paper, we present the tidiness score-guided Monte Carlo tree search (TSMCTS), a novel framework designed to address the tabletop tidying up problem using only an RGB-D camera. We address two major problems in tabletop tidying up: (1) the lack of public datasets and benchmarks, and (2) the difficulty of specifying the goal configuration of unseen objects. We address the former by presenting the tabletop tidying up (TTU) dataset, a structured dataset collected in simulation. Using this dataset, we train a vision-based discriminator capable of predicting the tidiness score. This discriminator can consistently evaluate the degree of tidiness across unseen configurations, including real-world scenes. To address the second problem, we employ Monte Carlo tree search (MCTS) to find tidying trajectories without specifying explicit goals. Instead of providing specific goals, we demonstrate that our MCTS-based planner can find diverse tidied configurations using the tidiness score as guidance. Consequently, we propose TSMCTS, which integrates a tidiness discriminator with an MCTS-based tidying planner to find optimal tidied arrangements. TSMCTS has successfully demonstrated its capability across various environments, including coffee tables, dining tables, office desks, and bathrooms. In this paper, we address the tabletop tidying problem, where an embodied AI agent autonomously organizes objects on a table based on their composition. As depicted in Figure 1, tidying up involves rearranging objects by determining an appropriate configuration of given objects, without providing an explicit target configuration.
- North America > United States (0.29)
- Asia > South Korea > Ulsan > Ulsan (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
Detection, Recognition and Pose Estimation of Tabletop Objects
Nirgude, Sanjuksha, DuCharme, Kevin, Madhusoodanan, Namrita
Cleaning a messy table using deep neural networks is an interesting problem in both social and industrial robotics. This project focuses on the social application of this technology. A neural network model capable of detecting and recognizing common tabletop objects, such as a mug, mouse, or stapler, is developed. The model also predicts the angle at which these objects are placed on a table, with respect to some reference. Assuming each object has a fixed intended position and orientation on the tabletop, the orientation of a particular object predicted by the deep learning model can be used to compute the transformation matrix that moves the object from its initial position to the intended position. This can be fed to a pick-and-place robot to carry out the transfer. This paper discusses the deep learning approaches used in this project for object detection and orientation estimation.
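As an illustration of the transformation step described in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes a planar pose given as (x, y, angle in degrees) in a shared table frame; the function names and the example poses are hypothetical. The correction transform is taken to be T_target * inverse(T_current), so that applying it to the current pose yields the intended pose.

```python
import math

def planar_pose_matrix(x, y, theta_deg):
    """Return the 3x3 homogeneous matrix for a planar pose (x, y, theta)."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, x],
            [s, c, y],
            [0.0, 0.0, 1.0]]

def invert_pose(T):
    """Invert a planar rigid transform: rotation R^T, translation -R^T p."""
    c, s = T[0][0], T[1][0]
    x, y = T[0][2], T[1][2]
    return [[c, s, -(c * x + s * y)],
            [-s, c, -(-s * x + c * y)],
            [0.0, 0.0, 1.0]]

def matmul3(A, B):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def correction_transform(current, target):
    """Transform T such that T * T_current == T_target (both in the table frame)."""
    T_current = planar_pose_matrix(*current)
    T_target = planar_pose_matrix(*target)
    return matmul3(T_target, invert_pose(T_current))

# Hypothetical example: a mug detected at (0.30, 0.10) m rotated 30 degrees,
# with an intended pose of (0.50, 0.20) m rotated 90 degrees.
T = correction_transform((0.30, 0.10, 30.0), (0.50, 0.20, 90.0))
rotation_deg = math.degrees(math.atan2(T[1][0], T[0][0]))
```

Here `rotation_deg` is the rotation a pick-and-place routine would need to apply, and the last column of `T` carries the accompanying translation. A real system would work with full 6D poses and a calibrated camera-to-robot transform, which this sketch leaves out.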
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction
Jiang, Yunfan, Wang, Chen, Zhang, Ruohan, Wu, Jiajun, Fei-Fei, Li
Learning in simulation and transferring the learned policy to the real world has the potential to enable generalist robots. The key challenge of this approach is to address simulation-to-reality (sim-to-real) gaps. Previous methods often require domain-specific knowledge a priori. We argue that a straightforward way to obtain such knowledge is by asking humans to observe and assist robot policy execution in the real world. The robots can then learn from humans to close various sim-to-real gaps. We propose TRANSIC, a data-driven approach to enable successful sim-to-real transfer based on a human-in-the-loop framework. TRANSIC allows humans to augment simulation policies to overcome various unmodeled sim-to-real gaps holistically through intervention and online correction. Residual policies can be learned from human corrections and integrated with simulation policies for autonomous execution. We show that our approach can achieve successful sim-to-real transfer in complex and contact-rich manipulation tasks such as furniture assembly. Through synergistic integration of policies learned in simulation and from humans, TRANSIC is effective as a holistic approach to addressing various, often coexisting sim-to-real gaps. It displays attractive properties such as scaling with human effort. Videos and code are available at https://transic-robot.github.io/
- North America > United States > New York > New York County > New York City (0.14)
- Asia > South Korea > Daegu > Daegu (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Leisure & Entertainment (0.46)
- Information Technology (0.46)
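The TRANSIC abstract above says that residual policies learned from human corrections are integrated with simulation policies, but it does not say how. The sketch below shows one common way to combine the two, a plain additive residual. The additive form, the `scale` parameter, and the toy policies are assumptions for illustration only and should not be read as the paper's method.

```python
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], List[float]]

def compose_with_residual(base_policy: Policy,
                          residual_policy: Policy,
                          scale: float = 1.0) -> Policy:
    """Return a policy whose action is base(obs) + scale * residual(obs)."""
    def policy(obs: Sequence[float]) -> List[float]:
        base_action = base_policy(obs)
        correction = residual_policy(obs)
        if len(base_action) != len(correction):
            raise ValueError("base and residual actions must have equal length")
        return [b + scale * d for b, d in zip(base_action, correction)]
    return policy

# Toy stand-ins (hypothetical): a simulation-trained policy that always moves
# along +x, and a residual that nudges the action along -y.
def sim_policy(obs: Sequence[float]) -> List[float]:
    return [0.1, 0.0]

def human_residual(obs: Sequence[float]) -> List[float]:
    return [0.0, -0.02]

deployed_policy = compose_with_residual(sim_policy, human_residual)
action = deployed_policy([0.0, 0.0])
```

Keeping the base policy frozen and learning only the correction means the human data has to cover just the sim-to-real discrepancy, not the whole task.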
A Neuromorphic Dataset for Object Segmentation in Indoor Cluttered Environment
Huang, Xiaoqian, Sanket, Kachole, Ayyad, Abdulla, Naeini, Fariborz Baghaei, Makris, Dimitrios, Zweiri, Yahya
Taking advantage of an event-based camera, the issues of motion blur, low dynamic range and low time sampling of standard cameras can all be addressed. However, there is a lack of event-based datasets dedicated to the benchmarking of segmentation algorithms, especially those that provide depth information which is critical for segmentation in occluded scenes. This paper proposes a new Event-based Segmentation Dataset (ESD), a high-quality 3D spatial and temporal dataset for object segmentation in an indoor cluttered environment. Our proposed dataset ESD comprises 145 sequences with 14,166 RGB frames that are manually annotated with instance masks. Overall 21.88 million and 20.80 million events from two event-based cameras in a stereo-graphic configuration are collected, respectively. To the best of our knowledge, this densely annotated and 3D spatial-temporal event-based segmentation benchmark of tabletop objects is the first of its kind. By releasing ESD, we expect to provide the community with a challenging segmentation benchmark with high quality.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
DoPose-6D dataset for object segmentation and 6D pose estimation
Gouda, Anas, Ghanem, Abraham, Reining, Christopher
Scene understanding is essential in determining how intelligent robotic grasping and manipulation could get. It is a problem that can be approached using different techniques: seen object segmentation, unseen object segmentation, or 6D pose estimation. These techniques can even be extended to multi-view. Most of the work on these problems depends on synthetic datasets, due to the lack of real datasets that are big enough for training, and merely uses the available real datasets for evaluation. This encourages us to introduce a new dataset (called DoPose-6D). The dataset contains annotations for 6D pose estimation, object segmentation, and multi-view annotations, which serve all the aforementioned techniques. The dataset contains two types of scenes, bin picking and tabletop, with the primary motive for this dataset collection being bin picking. We illustrate the effect of this dataset in the context of unseen object segmentation and provide some insights on mixing synthetic and real data for training. We train a Mask R-CNN model that is practical for use in industry and robotic grasping applications. Finally, we show how our dataset boosted the performance of a Mask R-CNN model. Our DoPose-6D dataset, trained network models, pipeline code, and ROS driver are available online.
A Hierarchical Architecture for Human-Robot Cooperation Processes
Darvish, Kourosh, Simetti, Enrico, Mastrogiovanni, Fulvio, Casalino, Giuseppe
In this paper we propose FlexHRC+, a hierarchical human-robot cooperation architecture designed to provide collaborative robots with an extended degree of autonomy when supporting human operators in high-variability shop-floor tasks. The architecture encompasses three levels, namely perception, representation, and action. Building on previous work, here we focus on (i) an in-the-loop decision-making process for the operations of collaborative robots coping with the variability of actions carried out by human operators, and (ii) the representation level, integrating a hierarchical AND/OR graph whose online behaviour is formally specified using First Order Logic. The architecture is accompanied by experiments including collaborative furniture assembly and object positioning tasks.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
MQA: Answering the Question via Robotic Manipulation
Deng, Yuhong, Zhang, Naifu, Guo, Di, Liu, Huaping, Sun, Fuchun, Pang, Chen, Pang, Jing
In this paper, we propose a novel task of Manipulation Question Answering (MQA), a class of Question Answering (QA) task in which the robot is required to find the answer to a question by actively interacting with the environment via manipulation. Considering the tabletop scenario, a heatmap of the scene is generated to give the robot a semantic understanding of the scene, and an imitation learning approach with a semantic understanding metric is proposed to generate manipulation actions that guide the manipulator in exploring the tabletop to find the answer to the question. In addition, a novel dataset containing a variety of tabletop scenarios and corresponding question-answer pairs is established. Extensive experiments have been conducted to validate the effectiveness of the proposed framework.
This One-Armed Robot Is Super Manipulative (in a Good Way)
Give a man a fish, the old saying goes, and you feed him for a day--teach a man to fish, and you feed him for a lifetime. Same goes for robots, with the exception that robots feed exclusively on electricity. The problem is figuring out the best way to teach them. Typically, robots get fairly detailed coded instructions on how to manipulate a particular object. But give it a different kind of object and you'll blow its mind, because the machines aren't great yet at learning and applying their skills to things they've never seen before.
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Games > Go (0.42)