
Collaborating Authors

 Kelestemur, Tarik


Physics-Driven Data Generation for Contact-Rich Manipulation via Trajectory Optimization

arXiv.org Artificial Intelligence

Lujie Yang 1,2, H.J. Terry Suh 1, Tong Zhao 2, Bernhard Paus Græsdal 1, Tarik Kelestemur 2, Jiuguang Wang 2, Tao Pang 2, and Russ Tedrake 1

Abstract: We present a low-cost data generation pipeline that integrates physics-based simulation, human demonstrations, and model-based planning to efficiently generate large-scale, high-quality datasets for contact-rich robotic manipulation tasks. Starting with a small number of embodiment-flexible human demonstrations collected in a virtual reality simulation environment, the pipeline refines these demonstrations using optimization-based kinematic retargeting and trajectory optimization to adapt them across various robot embodiments and physical parameters. This process yields a diverse, physically consistent, contact-rich dataset that enables cross-embodiment data transfer and offers the potential to reuse legacy datasets collected under different hardware configurations or physical parameters. We validate the pipeline's effectiveness by training diffusion policies on the generated datasets for challenging long-horizon, contact-rich manipulation tasks across multiple robot embodiments, including a floating Allegro hand and bimanual robot arms. The trained policies are deployed zero-shot on hardware for bimanual iiwa arms, achieving high success rates with minimal human input.

INTRODUCTION

The emergence of foundation models has transformed fields such as natural language processing and computer vision, where models trained on massive, internet-scale datasets demonstrate remarkable generalization across diverse reasoning tasks [1, 2, 3, 4, 5]. Motivated by this success, the robotics community is currently pursuing foundation models for generalist robot policies capable of flexible and robust decision-making across a wide range of tasks [6, 7, 8], leading to significant industrial investments in large-scale robot learning [9]. However, the pursuit of generalist robot policies remains constrained by the limited availability of high-quality datasets, especially for contact-rich robotic manipulation. Existing datasets [7, 10, 11, 12] are orders of magnitude smaller than those used to train foundation models in other domains, such as Large Language Models (LLMs). To address data scarcity, robot learning researchers often rely on a spectrum of data sources varying in cost, quality, and transferability.
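The retargeting step lends itself to a compact illustration. Below is a minimal Python/SciPy sketch of optimization-based kinematic retargeting on a toy planar two-link arm: each frame is solved as a nonlinear least-squares problem that tracks a human keypoint while staying close to the previous joint configuration. The forward kinematics, joint limits, and cost weights are illustrative assumptions, not the paper's implementation (which further refines the result with contact-aware trajectory optimization).

import numpy as np
from scipy.optimize import minimize

def forward_kinematics(q, link_lengths=(0.3, 0.25)):
    # Toy planar two-link FK standing in for a real robot model.
    l1, l2 = link_lengths
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])

def retarget_frame(p_human, q_prev, w_smooth=1e-2):
    # One frame of retargeting: track the human keypoint p_human while
    # penalizing deviation from the previous joint configuration.
    def cost(q):
        track = np.sum((forward_kinematics(q) - p_human) ** 2)
        smooth = w_smooth * np.sum((q - q_prev) ** 2)
        return track + smooth
    res = minimize(cost, q_prev, method="L-BFGS-B",
                   bounds=[(-2.5, 2.5)] * len(q_prev))  # assumed joint limits
    return res.x

# Retarget a short keypoint trajectory frame by frame.
q = np.zeros(2)
for p in [np.array([0.40, 0.10]), np.array([0.38, 0.15])]:
    q = retarget_frame(p, q_prev=q)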


CuriousBot: Interactive Mobile Exploration via Actionable 3D Relational Object Graph

arXiv.org Artificial Intelligence

Mobile exploration is a longstanding challenge in robotics, yet current methods primarily focus on active perception instead of active interaction, limiting the robot's ability to interact with and fully explore its environment. Existing robotic exploration approaches via active interaction are often restricted to tabletop scenes, neglecting the unique challenges posed by mobile exploration, such as large exploration spaces, complex action spaces, and diverse object relations. In this work, we introduce a 3D relational object graph that encodes diverse object relations and enables exploration through active interaction. We develop a system based on this representation and evaluate it across diverse scenes. Our qualitative and quantitative results demonstrate the system's effectiveness and generalization capabilities, outperforming methods that rely solely on vision-language models (VLMs).
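As a concrete reading of the representation, the sketch below implements a bare-bones 3D relational object graph in Python: nodes store object state, directed edges store pairwise relations, and relations pointing at unexplored objects form an "actionable frontier" of candidate interactions (e.g. open the drawer to confirm what is inside). The relation vocabulary and fields are illustrative assumptions, not the paper's exact taxonomy.

from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    name: str
    position: tuple        # (x, y, z) in the map frame
    explored: bool = False

@dataclass
class RelationalObjectGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (src, relation, dst) triples

    def add_object(self, node):
        self.nodes[node.name] = node

    def add_relation(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def actionable_frontier(self):
        # Relations whose target object is unexplored; each suggests
        # an interactive action rather than passive perception.
        return [(s, r, d) for (s, r, d) in self.edges
                if not self.nodes[d].explored]

graph = RelationalObjectGraph()
graph.add_object(ObjectNode("drawer", (1.0, 0.2, 0.4)))
graph.add_object(ObjectNode("mug", (1.0, 0.2, 0.35)))
graph.add_relation("drawer", "inside", "mug")   # mug hypothesized inside drawer
print(graph.actionable_frontier())              # -> [('drawer', 'inside', 'mug')]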


On-Robot Reinforcement Learning with Goal-Contrastive Rewards

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) has the potential to enable robots to learn from their own actions in the real world. Unfortunately, RL can be prohibitively expensive in terms of on-robot runtime, due to inefficient exploration when learning from a sparse reward signal. Designing dense reward functions is labour-intensive and requires domain expertise. In our work, we propose GCR (Goal-Contrastive Rewards), a dense reward function learning method that can be trained on passive video demonstrations. By using videos without actions, our method is easier to scale, as we can use arbitrary videos. GCR combines two loss functions: an implicit value loss that models how the reward increases when traversing a successful trajectory, and a goal-contrastive loss that discriminates between successful and failed trajectories. We perform experiments in simulated manipulation environments across RoboMimic and MimicGen tasks, as well as in the real world using a Franka arm and a Spot quadruped. We find that GCR leads to more sample-efficient RL, enabling model-free RL to solve about twice as many tasks as our baseline reward-learning methods. We also demonstrate positive cross-embodiment transfer from videos of people and of other robots performing a task. Appendix: https://tinyurl.com/gcr-appendix-2.
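The two GCR loss terms can be sketched directly. The PyTorch snippet below is a minimal reading of the abstract, not the paper's exact formulation: one term pushes the predicted reward to increase monotonically along a successful trajectory, and the other scores the final frames of successful trajectories above those of failed ones by a margin. The network size, margins, and random inputs are placeholders.

import torch
import torch.nn.functional as F

def monotone_value_loss(r_traj, margin=0.0):
    # r_traj: (T,) predicted rewards along one successful trajectory.
    # Penalize any step where the reward fails to increase.
    diffs = r_traj[1:] - r_traj[:-1]
    return F.relu(margin - diffs).mean()

def goal_contrastive_loss(r_success, r_fail, margin=1.0):
    # Hinge loss: final-frame reward of successes should exceed failures.
    return F.relu(margin - (r_success - r_fail)).mean()

# Toy usage with a linear reward model over frame embeddings.
reward_net = torch.nn.Linear(16, 1)
success_frames = torch.randn(8, 16)   # 8-step successful trajectory (placeholder)
fail_final = torch.randn(1, 16)       # final frame of a failed trajectory
r_traj = reward_net(success_frames).squeeze(-1)
loss = (monotone_value_loss(r_traj)
        + goal_contrastive_loss(r_traj[-1:], reward_net(fail_final).squeeze(-1)))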


GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy

arXiv.org Artificial Intelligence

Diffusion-based policies have shown remarkable capability in executing complex robotic manipulation tasks, but they lack explicit characterization of geometry and semantics, which often limits their ability to generalize to unseen objects and layouts. To enhance the generalization capabilities of Diffusion Policy, we introduce a novel framework that incorporates explicit spatial and semantic information via 3D semantic fields. We generate 3D descriptor fields from multi-view RGBD observations with large foundation vision models, then compare these descriptor fields against reference descriptors to obtain semantic fields. The proposed method explicitly considers geometry and semantics, enabling strong generalization in tasks that require category-level generalization, resolution of geometric ambiguities, and attention to subtle geometric details. We evaluate our method across eight tasks involving articulated objects and instances with varying shapes and textures from multiple object categories. Our method demonstrates its effectiveness by increasing Diffusion Policy's average success rate on unseen instances from 20% to 93%. Additionally, we provide a detailed analysis and visualization to interpret the sources of the performance gain and to explain how our method generalizes to novel instances.
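The descriptor-to-semantic-field comparison reduces to a similarity computation. Below is a minimal PyTorch sketch under the assumption that the comparison is cosine similarity: per-point descriptors from the fused 3D field are matched against K reference descriptors, yielding K semantic-field channels per point. The feature dimension and random tensors are placeholders for features produced by a large vision model and lifted to 3D.

import torch
import torch.nn.functional as F

def semantic_fields(point_descriptors, reference_descriptors):
    # point_descriptors: (N, D) per-point features from the fused 3D field.
    # reference_descriptors: (K, D) features of K annotated reference points.
    # Returns (N, K): one semantic-field channel per reference descriptor.
    p = F.normalize(point_descriptors, dim=-1)
    r = F.normalize(reference_descriptors, dim=-1)
    return p @ r.T   # cosine similarity in [-1, 1]

points = torch.randn(4096, 384)   # assumed DINO-like feature dimension
refs = torch.randn(3, 384)        # e.g. "handle", "spout", "lid" references
fields = semantic_fields(points, refs)   # shape (4096, 3)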


Equivariant Diffusion Policy

arXiv.org Artificial Intelligence

Recent work has shown that diffusion models are an effective approach to learning the multimodal distributions arising from demonstration data in behavior cloning. However, a drawback of this approach is the need to learn a denoising function, which is significantly more complex than learning an explicit policy. In this work, we propose Equivariant Diffusion Policy, a novel diffusion policy learning method that leverages domain symmetries to obtain better sample efficiency and generalization in the denoising function. We theoretically analyze the SO(2) symmetry of full 6-DoF control and characterize when a diffusion model is SO(2)-equivariant. We further evaluate the method empirically on a set of 12 simulation tasks in MimicGen and show that it obtains a success rate that is, on average, 21.9% higher than the baseline Diffusion Policy. We also evaluate the method on a real-world system to show that effective policies can be learned with relatively few training samples, whereas the baseline Diffusion Policy cannot.
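The equivariance property under analysis is easy to state concretely: rotating the denoiser's input by some R in SO(2) should rotate its output by the same R. The NumPy sketch below checks this numerically for a toy map that is SO(2)-equivariant by construction (scaling a 2D vector by a function of its rotation-invariant norm); it illustrates the property only and is not the paper's architecture.

import numpy as np

def rot(theta):
    # 2D rotation matrix, an element of SO(2).
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def equivariant_denoiser(x):
    # Toy SO(2)-equivariant map on 2D vectors: scale by a function of the
    # rotation-invariant norm, so f(Rx) = R f(x) for any rotation R.
    gain = np.tanh(np.linalg.norm(x))
    return gain * x

x = np.random.randn(2)
R = rot(0.7)
lhs = equivariant_denoiser(R @ x)
rhs = R @ equivariant_denoiser(x)
assert np.allclose(lhs, rhs), "equivariance check failed"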