Pick and Place
ImMimic: Cross-Domain Imitation from Human Videos via Mapping and Interpolation
Liu, Yangcen, Shin, Woo Chul, Han, Yunhai, Chen, Zhenyang, Ravichandar, Harish, Xu, Danfei
Learning robot manipulation from abundant human videos offers a scalable alternative to costly robot-specific data collection. However, domain gaps across visual, morphological, and physical aspects hinder direct imitation. To effectively bridge the domain gap, we propose ImMimic, an embodiment-agnostic co-training framework that leverages both human videos and a small amount of teleoperated robot demonstrations. ImMimic uses Dynamic Time Warping (DTW) with either action-based or visual-based mapping to map retargeted human hand poses to robot joints, followed by MixUp interpolation between paired human and robot trajectories. Our key insights are (1) retargeted human hand trajectories provide informative action labels, and (2) interpolation over the mapped data creates intermediate domains that facilitate smooth domain adaptation during co-training. Evaluations on four real-world manipulation tasks (Pick and Place, Push, Hammer, Flip) across four robotic embodiments (Robotiq, Fin Ray, Allegro, Ability) show that ImMimic improves task success rates and execution smoothness, highlighting its efficacy in bridging the domain gap for robust robot manipulation. The project website can be found at https://sites.google.com/view/immimic.
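A minimal sketch of the two mechanisms named in the abstract: DTW-based pairing of retargeted human trajectories with robot trajectories, followed by MixUp interpolation over the paired data. The function names, the use of Euclidean distance, and the Beta-distributed mixing coefficient are illustrative assumptions, not the authors' implementation.

```python
# Sketch: DTW pairing + MixUp interpolation between human and robot trajectories.
# Assumes both trajectories are already expressed in a shared (retargeted) space.
import numpy as np


def dtw_pairing(human_traj, robot_traj):
    """Align two trajectories (T_h x D) and (T_r x D) with classic DTW; return index pairs."""
    T_h, T_r = len(human_traj), len(robot_traj)
    cost = np.full((T_h + 1, T_r + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, T_h + 1):
        for j in range(1, T_r + 1):
            d = np.linalg.norm(human_traj[i - 1] - robot_traj[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack to recover the optimal warping path.
    pairs, i, j = [], T_h, T_r
    while i > 0 and j > 0:
        pairs.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return pairs[::-1]


def mixup_interpolate(human_traj, robot_traj, pairs, alpha=0.4):
    """Create an intermediate-domain trajectory by convexly mixing DTW-paired states."""
    lam = np.random.beta(alpha, alpha)  # MixUp coefficient shared across the trajectory
    return np.stack([lam * human_traj[i] + (1 - lam) * robot_traj[j] for i, j in pairs])
```

Sharing one mixing coefficient across a paired trajectory is what yields the intermediate domains referred to in the abstract: varying the coefficient sweeps the training data from human-like to robot-like.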
ConceptBot: Enhancing Robot's Autonomy through Task Decomposition with Large Language Models and Knowledge Graph
Leanza, Alessandro, Moroncelli, Angelo, Vizzari, Giuseppe, Braghin, Francesco, Roveda, Loris, Spahiu, Blerina
ConceptBot is a modular robotic planning framework that combines Large Language Models and Knowledge Graphs to generate feasible and risk-aware plans despite ambiguities in natural language instructions, while correctly analyzing the objects present in the environment; these challenges typically arise from a lack of commonsense reasoning. To do that, ConceptBot integrates (i) an Object Property Extraction (OPE) module that enriches scene understanding with semantic concepts from ConceptNet, (ii) a User Request Processing (URP) module that disambiguates and structures instructions, and (iii) a Planner that generates context-aware, feasible pick-and-place policies. In comparative evaluations against Google SayCan, ConceptBot achieved 100% success on explicit tasks, maintained 87% accuracy on implicit tasks (versus 31% for SayCan), reached 76% on risk-aware tasks (versus 15%), and outperformed SayCan in application-specific scenarios, including material classification (70% vs. 20%) and toxicity detection (86% vs. 36%). On SafeAgentBench, ConceptBot achieved an overall score of 80% (versus 46% for the next-best baseline). These results, validated in both simulation and laboratory experiments, demonstrate ConceptBot's ability to generalize without domain-specific training and to significantly improve the reliability of robotic policies in unstructured environments.

Advances in recent decades in robotic core capabilities, i.e., perception, control, and manipulation, have increased demand for autonomous systems in fields ranging from manufacturing and logistics to healthcare and home care. These capabilities are deeply interconnected with the planning phase [1], as successful planning depends on a robot's ability to perceive its environment accurately, execute precise control, and perform effective manipulation. Despite significant progress, planning in robotic systems continues to face challenges, particularly in unstructured environments [2]. A key element in achieving effective planning is task decomposition [3], which involves breaking complex objectives into smaller, manageable actions. This process is essential for simplifying execution and ensuring flexibility in diverse environments. Traditional task decomposition approaches, however, often rely on rigid, pre-programmed templates or static models, which struggle to adapt to unfamiliar or dynamic conditions [4]-[7]. Recently, advancements in Large Language Models (LLMs) have introduced a more dynamic alternative. LLMs enable robots to process natural language instructions, understand contextual nuances, and dynamically decompose tasks into actionable steps [8]-[10]. However, directly employing pre-trained LLMs often leads to non-executable or ineffective plans, as these models struggle to account for domain-specific constraints and real-world feasibility [11]-[13].
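A hedged sketch of what an Object Property Extraction step in the spirit of the OPE module might look like: it queries the public ConceptNet 5 REST API for commonsense properties of a detected object so a planner can reason about risk (e.g., a knife being sharp). The relation filter and weight threshold are assumptions, not ConceptBot's actual logic.

```python
# Sketch: enrich a detected object with commonsense properties from ConceptNet.
import requests


def extract_properties(object_name, min_weight=1.0):
    """Return (relation, concept) pairs describing the object, from ConceptNet edges."""
    url = f"http://api.conceptnet.io/c/en/{object_name}"
    edges = requests.get(url, timeout=10).json().get("edges", [])
    properties = []
    for edge in edges:
        rel = edge["rel"]["label"]
        # Keep relations that usually carry commonsense properties of the object.
        if rel in ("HasProperty", "IsA", "UsedFor", "CapableOf") and edge["weight"] >= min_weight:
            properties.append((rel, edge["end"]["label"]))
    return properties


# e.g. extract_properties("knife") is expected to include entries such as ("HasProperty", "sharp"),
# which a planner could use to flag a handover as risk-sensitive.
```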
SimLauncher: Launching Sample-Efficient Real-world Robotic Reinforcement Learning via Simulation Pre-training
Wu, Mingdong, Wu, Lehong, Wu, Yizhuo, Huang, Weiyao, Fan, Hongwei, Hu, Zheyuan, Geng, Haoran, Li, Jinzhou, Ying, Jiahe, Yang, Long, Chen, Yuanpei, Dong, Hao
Autonomous learning of dexterous, long-horizon robotic skills has been a longstanding pursuit of embodied AI. Recent advances in robotic reinforcement learning (RL) have demonstrated remarkable performance and robustness in real-world visuomotor control tasks. However, applying RL in the real world faces challenges such as low sample efficiency, slow exploration, and significant reliance on human intervention. In contrast, simulators offer a safe and efficient environment for extensive exploration and data collection, while the visual sim-to-real gap, often a limiting factor, can be mitigated using real-to-sim techniques. Building on these observations, we propose SimLauncher, a novel framework that combines the strengths of real-world RL and real-to-sim-to-real approaches to overcome these challenges. Specifically, we first pre-train a visuomotor policy in the digital twin simulation environment, which then benefits real-world RL in two ways: (1) bootstrapping target values using extensive simulated demonstrations and real-world demonstrations derived from pre-trained policy rollouts, and (2) incorporating action proposals from the pre-trained policy for better exploration. We conduct comprehensive experiments across multi-stage, contact-rich, and dexterous hand manipulation tasks. Compared to prior real-world RL approaches, SimLauncher significantly improves sample efficiency and achieves near-perfect success rates. We hope this work serves as a proof of concept and inspires further research on leveraging large-scale simulation pre-training to benefit real-world robotic RL.
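A minimal sketch of the two uses of the pre-trained policy described above, assuming a generic actor-critic setup: mixing action proposals from the sim-pretrained policy into exploration, and computing standard bootstrapped targets over a replay buffer that also contains demonstration transitions. All names are illustrative assumptions, not the paper's code.

```python
# Sketch: a sim-pretrained policy assisting real-world RL in two ways.
import random


def collect_step(env, rl_policy, pretrained_policy, obs, proposal_prob=0.3):
    """With some probability, act with the pre-trained policy to guide exploration."""
    if random.random() < proposal_prob:
        action = pretrained_policy(obs)  # action proposal from the sim-pretrained policy
    else:
        action = rl_policy(obs)
    return env.step(action)


def td_target(reward, next_value, done, gamma=0.99):
    """Bootstrapped value target; demonstration transitions enter the buffer like any others."""
    return reward + gamma * next_value * (1.0 - float(done))
```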
Scalable, Training-Free Visual Language Robotics: A Modular Multi-Model Framework for Consumer-Grade GPUs
Samson, Marie, Muraccioli, Bastien, Kanehiro, Fumio
The integration of language instructions with robotic control, particularly through Vision Language Action (VLA) models, has shown significant potential. However, these systems are often hindered by high computational costs, the need for extensive retraining, and limited scalability, making them less accessible for widespread use. In this paper, we introduce SVLR (Scalable Visual Language Robotics), an open-source, modular framework that operates without the need for retraining, providing a scalable solution for robotic control. SVLR leverages a combination of lightweight, open-source AI models including the Vision-Language Model (VLM) Mini-InternVL, zero-shot image segmentation model CLIPSeg, Large Language Model Phi-3, and sentence similarity model all-MiniLM to process visual and language inputs. These models work together to identify objects in an unknown environment, use them as parameters for task execution, and generate a sequence of actions in response to natural language instructions. A key strength of SVLR is its scalability. The framework allows for easy integration of new robotic tasks and robots by simply adding text descriptions and task definitions, without the need for retraining. This modularity ensures that SVLR can continuously adapt to the latest advancements in AI technologies and support a wide range of robots and tasks. SVLR operates effectively on an NVIDIA RTX 2070 (mobile) GPU, demonstrating promising performance in executing pick-and-place tasks. While these initial results are encouraging, further evaluation across a broader set of tasks and comparisons with existing VLA models are needed to assess SVLR's generalization capabilities and performance in more complex scenarios.
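One concrete piece of such a training-free pipeline is matching a free-form instruction to a library of text-described tasks with a sentence-similarity model. The sketch below uses the sentence-transformers package and the all-MiniLM-L6-v2 checkpoint mentioned in the abstract; the task library, threshold, and function name are assumptions.

```python
# Sketch: select a robot task whose text description best matches the instruction.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical task library; SVLR-style scalability comes from adding entries here.
task_descriptions = {
    "pick_and_place": "pick up an object and place it at a target location",
    "open_drawer": "open a drawer by pulling its handle",
}


def select_task(instruction, threshold=0.4):
    """Return the task whose description is most similar to the instruction, or None."""
    names = list(task_descriptions)
    emb_instr = model.encode(instruction, convert_to_tensor=True)
    emb_tasks = model.encode([task_descriptions[n] for n in names], convert_to_tensor=True)
    scores = util.cos_sim(emb_instr, emb_tasks)[0]
    best = int(scores.argmax())
    return names[best] if float(scores[best]) >= threshold else None


# select_task("put the red cube in the box") would be expected to return "pick_and_place".
```

Because new tasks are added as plain text entries rather than training data, extending the library requires no retraining, which is the scalability property the abstract emphasizes.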
MissionGPT: Mission Planner for Mobile Robot based on Robotics Transformer Model
Berman, Vladimir, Bazhenov, Artem, Tsetserukou, Dzmitry
This paper presents a novel approach to building mission planners based on neural networks with the Transformer architecture and Large Language Models (LLMs). The approach demonstrates that a task can be assigned to a mobile robot and executed successfully without dedicated perception algorithms, relying only on data from the camera. In this work, a success rate of more than 50% was obtained for one of the basic actions for mobile robots. The proposed approach is of practical importance for warehouse logistics robots, as in the future it may make it possible to eliminate markings, LiDARs, beacons, and other tools used for robot orientation in space. In conclusion, this approach can be scaled to any type of robot and to any number of robots.
An Agile Large-Workspace Teleoperation Interface Based on Human Arm Motion and Force Estimation
Jia, Jianhang, Zhou, Hao, Zhang, Xin
Teleoperation can transfer human perception and cognition to a remote robot to cope with complex tasks, and the agility and flexibility of the interface play an important role in mapping human intention to the robot. In this paper, we developed an agile large-workspace teleoperation interface based on estimating human arm behavior. Using wearable sensors, namely an inertial measurement unit and a surface electromyography armband, we capture human arm motion and force information, thereby intuitively controlling the manipulation of the robot. The control principle of our wearable interface includes two parts: (1) incremental arm kinematics and (2) grasping recognition. Moreover, we developed a teleoperation framework with a time synchronization mechanism for real-time application. We conducted experimental comparisons with a versatile haptic device (Omega 7) to verify the effectiveness of our interface and framework. Seven subjects were invited to complete three different tasks: free motion, handover, and pick-and-place (each task ten times with each interface), for a total of 420 tests. Objectively, we used task completion time and success rate to compare the performance of the two interfaces quantitatively. In addition, to quantify the operator experience, we used the NASA Task Load Index to assess subjective workload. The results showed that the proposed interface achieved competitive performance with a better operating experience.
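A hedged illustration of the "incremental kinematics plus grasping recognition" control principle: wrist displacements estimated from the IMU are scaled onto the robot end-effector target, and a thresholded sEMG signal toggles the gripper. This is a generic sketch of the idea under those assumptions, not the paper's controller.

```python
# Sketch: incremental mapping of operator wrist motion to a robot end-effector target.
import numpy as np


class IncrementalTeleop:
    def __init__(self, scale=1.0):
        self.scale = scale
        self.prev_wrist_pos = None

    def update(self, wrist_pos, ee_target, grasp_signal, grasp_threshold=0.5):
        """Map the operator's wrist displacement to a new end-effector target and gripper state."""
        if self.prev_wrist_pos is not None:
            ee_target = ee_target + self.scale * (wrist_pos - self.prev_wrist_pos)
        self.prev_wrist_pos = wrist_pos
        gripper_closed = grasp_signal > grasp_threshold  # sEMG-based grasp recognition
        return ee_target, gripper_closed
```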
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
Nasiriany, Soroush, Maddukuri, Abhiram, Zhang, Lance, Parikh, Adeet, Lo, Aaron, Joshi, Abhishek, Mandlekar, Ajay, Zhu, Yuke
Recent advancements in Artificial Intelligence (AI) have largely been propelled by scaling. In Robotics, scaling is hindered by the lack of access to massive robot datasets. We advocate using realistic physical simulation as a means to scale environments, tasks, and datasets for robot learning methods. We present RoboCasa, a large-scale simulation framework for training generalist robots in everyday environments. RoboCasa features realistic and diverse scenes focusing on kitchen environments. We provide thousands of 3D assets across over 150 object categories and dozens of interactable furniture and appliances. We enrich the realism and diversity of our simulation with generative AI tools, such as object assets from text-to-3D models and environment textures from text-to-image models. We design a set of 100 tasks for systematic evaluation, including composite tasks generated with the guidance of large language models. To facilitate learning, we provide high-quality human demonstrations and integrate automated trajectory generation methods to substantially enlarge our datasets with minimal human burden. Our experiments show a clear scaling trend in using synthetically generated robot data for large-scale imitation learning and show great promise in harnessing simulation data for real-world tasks. Videos and open-source code are available at https://robocasa.ai/
TidyBot: Personalized Robot Assistance with Large Language Models
Wu, Jimmy, Antonova, Rika, Kan, Adam, Lepert, Marion, Zeng, Andy, Song, Shuran, Bohg, Jeannette, Rusinkiewicz, Szymon, Funkhouser, Thomas
For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios. In this work, we investigate personalization of household cleanup with robots that can tidy up rooms by picking up objects and putting them away. A key challenge is determining the proper place to put each object, as people's preferences can vary greatly depending on personal taste or cultural background. For instance, one person may prefer storing shirts in the drawer, while another may prefer them on the shelf. We aim to build systems that can learn such preferences from just a handful of examples via prior interactions with a particular person. We show that robots can combine language-based planning and perception with the few-shot summarization capabilities of large language models (LLMs) to infer generalized user preferences that are broadly applicable to future interactions. This approach enables fast adaptation and achieves 91.2% accuracy on unseen objects in our benchmark dataset. We also demonstrate our approach on a real-world mobile manipulator called TidyBot, which successfully puts away 85.0% of objects in real-world test scenarios.
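A minimal sketch of few-shot preference summarization in the spirit of TidyBot: a handful of observed placements are summarized by an LLM into general rules, which are then applied to unseen objects. The prompt wording, the example placements, and the llm() call are hypothetical.

```python
# Sketch: turn a few observed placements into general rules, then query a new object.
examples = [
    ("yellow shirt", "drawer"),
    ("black sweater", "drawer"),
    ("soda can", "recycling bin"),
]


def build_summarization_prompt(examples):
    """Ask the LLM to compress observed placements into general placement rules."""
    lines = [f"{obj} -> {place}" for obj, place in examples]
    return (
        "The following objects were put away as shown:\n"
        + "\n".join(lines)
        + "\nSummarize the user's preferences as general placement rules."
    )


def build_placement_prompt(rules, new_object):
    """Ask the LLM to apply the summarized rules to an unseen object."""
    return f"Rules:\n{rules}\nWhere should '{new_object}' go? Answer with one receptacle."


# rules = llm(build_summarization_prompt(examples))          # hypothetical LLM call
# target = llm(build_placement_prompt(rules, "wool scarf"))  # expected: "drawer"
```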
CRISP: Curriculum inducing Primitive Informed Subgoal Prediction
Singh, Utsav, Namboodiri, Vinay P
Hierarchical reinforcement learning is a promising approach that uses temporal abstraction to solve complex long-horizon problems. However, simultaneously learning a hierarchy of policies is unstable, as it is challenging to train the higher-level policy when the lower-level primitive is non-stationary. In this paper, we propose a novel hierarchical algorithm, CRISP, that generates a curriculum of achievable subgoals for evolving lower-level primitives using reinforcement learning and imitation learning. The lower-level primitive periodically performs data relabeling on a handful of expert demonstrations using our primitive-informed parsing approach to handle non-stationarity. Since our approach requires only a handful of expert demonstrations, it is suitable for most robotic control tasks. Experimental evaluations in complex robotic maze navigation and robotic manipulation environments show that inducing hierarchical curriculum learning significantly improves sample efficiency and results in efficient goal-conditioned policies for solving temporally extended tasks. We perform real-world robotic experiments on complex manipulation tasks and demonstrate that CRISP consistently outperforms the baselines.
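A loose, hedged sketch of the curriculum idea: candidate subgoals are taken from expert demonstrations at a fixed horizon and relabeled to states the current lower-level primitive is estimated to reach. The reachability test, horizon, and fallback rule below are assumptions for illustration, not CRISP's primitive-informed parsing.

```python
# Sketch: relabel demonstration states into achievable subgoals for a lower-level primitive.
def relabel_subgoals(demo_states, lower_policy_value, horizon=10, value_threshold=0.5):
    """Pick demo states spaced `horizon` apart whose estimated reachability is high enough."""
    subgoals = []
    for t in range(horizon, len(demo_states), horizon):
        start, candidate = demo_states[t - horizon], demo_states[t]
        if lower_policy_value(start, candidate) >= value_threshold:
            subgoals.append(candidate)
        else:
            # Fall back to a nearer, easier state so the curriculum stays achievable
            # while the lower-level primitive is still improving.
            subgoals.append(demo_states[t - horizon // 2])
    return subgoals
```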
Sim-to-Real Deep Reinforcement Learning with Manipulators for Pick-and-place
Liu, Wenxing, Niu, Hanlin, Skilton, Robert, Carrasco, Joaquin
When transferring a Deep Reinforcement Learning (DRL) model from simulation to the real world, performance can be unsatisfactory because the simulation often fails to imitate the real world closely, which typically leads to a long period of real-world fine-tuning. This paper proposes a self-supervised, vision-based DRL method that allows robots to pick and place objects effectively and efficiently when a trained model is transferred directly from simulation to the real world. A height-sensitive action policy is specially designed for the proposed method to deal with crowded and stacked objects in challenging environments. The model trained with the proposed approach can be applied directly to a real suction task without any real-world fine-tuning while maintaining a high suction success rate. We also validate that our model can be deployed to suction novel objects in a real experiment, achieving a suction success rate of 90% without any real-world fine-tuning. The experimental video is available at: https://youtu.be/jSTC-EGsoFA.
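A hedged sketch of what a height-sensitive action policy for suction might look like: among pixels with high pick affordance, prefer the topmost surface so that stacked or crowded objects are picked from the top. The affordance map, thresholds, and tie-breaking rule are assumptions, not the paper's exact policy.

```python
# Sketch: choose a suction pixel by combining an affordance map with depth (height) information.
import numpy as np


def select_suction_pixel(affordance_map, depth_image, affordance_threshold=0.8, height_margin=0.01):
    """Pick the highest surface among high-affordance pixels (smaller depth = closer to camera)."""
    candidates = np.argwhere(affordance_map >= affordance_threshold)
    if candidates.size == 0:
        # Fall back to the best-scoring pixel(s) if nothing clears the threshold.
        candidates = np.argwhere(affordance_map >= affordance_map.max())
    depths = depth_image[candidates[:, 0], candidates[:, 1]]
    top = depths.min()
    near_top = candidates[depths <= top + height_margin]
    # Break ties among the topmost candidates by affordance score.
    scores = affordance_map[near_top[:, 0], near_top[:, 1]]
    return tuple(near_top[scores.argmax()])
```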