AITopics | Fan, Linxi

Plotting

Fan, Linxi

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation

Maddukuri, Abhiram, Jiang, Zhenyu, Chen, Lawrence Yunliang, Nasiriany, Soroush, Xie, Yuqi, Fang, Yu, Huang, Wenqi, Wang, Zu, Xu, Zhenjia, Chernyadev, Nikita, Reed, Scott, Goldberg, Ken, Mandlekar, Ajay, Fan, Linxi, Zhu, Yuke

arXiv.org Artificial IntelligenceApr-2-2025

Large real-world robot datasets hold great potential to train generalist robot models, but scaling real-world human data collection is time-consuming and resource-intensive. Simulation has great potential in supplementing large-scale data, especially with recent advances in generative AI and automated data generation tools that enable scalable creation of robot behavior datasets. However, training a policy solely in simulation and transferring it to the real world often demands substantial human effort to bridge the reality gap. A compelling alternative is to co-train the policy on a mixture of simulation and real-world datasets. Preliminary studies have recently shown this strategy to substantially improve the performance of a policy over one trained on a limited amount of real-world data. Nonetheless, the community lacks a systematic understanding of sim-and-real co-training and what it takes to reap the benefits of simulation data for real-robot learning. This work presents a simple yet effective recipe for utilizing simulation data to solve vision-based robotic manipulation tasks. We derive this recipe from comprehensive experiments that validate the co-training strategy on various simulation and real-world datasets. Using two domains--a robot arm and a humanoid--across diverse tasks, we demonstrate that simulation data can enhance real-world task performance by an average of 38%, even with notable differences between the simulation and real-world data. Videos and additional results can be found at https://co-training.github.io/

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.24361

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids

Lin, Toru, Sachdev, Kartik, Fan, Linxi, Malik, Jitendra, Zhu, Yuke

arXiv.org Artificial IntelligenceFeb-27-2025

Reinforcement learning has delivered promising results in achieving human- or even superhuman-level capabilities across diverse problem domains, but success in dexterous robot manipulation remains limited. This work investigates the key challenges in applying reinforcement learning to solve a collection of contact-rich manipulation tasks on a humanoid embodiment. We introduce novel techniques to overcome the identified challenges with empirical validation. Our main contributions include an automated real-to-sim tuning module that brings the simulated environment closer to the real world, a generalized reward design scheme that simplifies reward engineering for long-horizon contact-rich manipulation tasks, a divide-and-conquer distillation process that improves the sample efficiency of hard-exploration problems while maintaining sim-to-real performance, and a mixture of sparse and dense object representations to bridge the sim-to-real perception gap. We show promising results on three humanoid dexterous manipulation tasks, with ablation studies on each technique. Our work presents a successful approach to learning humanoid dexterous manipulation using sim-to-real reinforcement learning, achieving robust generalization and high performance without the need for human demonstration.

arxiv preprint arxiv, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2502.20396

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning

Jiang, Zhenyu, Xie, Yuqi, Lin, Kevin, Xu, Zhenjia, Wan, Weikang, Mandlekar, Ajay, Fan, Linxi, Zhu, Yuke

arXiv.org Artificial IntelligenceOct-31-2024

Imitation learning from human demonstrations is an effective means to teach robots manipulation skills. But data acquisition is a major bottleneck in applying this paradigm more broadly, due to the amount of cost and human effort involved. There has been significant interest in imitation learning for bimanual dexterous robots, like humanoids. Unfortunately, data collection is even more challenging here due to the challenges of simultaneously controlling multiple arms and multi-fingered hands. Automated data generation in simulation is a compelling, scalable alternative to fuel this need for data. To this end, we introduce DexMimicGen, a large-scale automated data generation system that synthesizes trajectories from a handful of human demonstrations for humanoid robots with dexterous hands. We present a collection of simulation environments in the setting of bimanual dexterous manipulation, spanning a range of manipulation behaviors and different requirements for coordination among the two arms. We generate 21K demos across these tasks from just 60 source human demos and study the effect of several data generation and policy learning decisions on agent performance. Finally, we present a real-to-sim-to-real pipeline and deploy it on a real-world humanoid can sorting task. Videos and more are at https://dexmimicgen.github.io/

artificial intelligence, demonstration, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.24185

Country: North America > United States > New York (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation

Wang, Zhendong, Li, Zhaoshuo, Mandlekar, Ajay, Xu, Zhenjia, Fan, Jiaojiao, Narang, Yashraj, Fan, Linxi, Zhu, Yuke, Balaji, Yogesh, Zhou, Mingyuan, Liu, Ming-Yu, Zeng, Yu

arXiv.org Artificial IntelligenceOct-28-2024

Diffusion models, praised for their success in generative tasks, are increasingly being applied to robotics, demonstrating exceptional performance in behavior cloning. However, their slow generation process stemming from iterative denoising steps poses a challenge for real-time applications in resource-constrained robotics setups and dynamically changing environments. In this paper, we introduce the One-Step Diffusion Policy (OneDP), a novel approach that distills knowledge from pre-trained diffusion policies into a single-step action generator, significantly accelerating response times for robotic control tasks. We ensure the distilled generator closely aligns with the original policy distribution by minimizing the Kullback-Leibler (KL) divergence along the diffusion chain, requiring only 2%- 10% additional pre-training cost for convergence. We evaluated OneDP on 6 challenging simulation tasks as well as 4 self-designed real-world tasks using the Franka robot. The results demonstrate that OneDP not only achieves state-of-theart success rates but also delivers an order-of-magnitude improvement in inference speed, boosting action prediction frequency from 1.5 Hz to 62 Hz, establishing its potential for dynamic and computationally constrained robotic applications. We share the project page here https://research.nvidia.com/labs/dir/onedp/. Recently, Chi et al. (2023); Team et al. (2024); Reuss et al. (2023); Ze et al. (2024); Ke et al. (2024); Prasad et al. (2024) demonstrated impressive results of diffusion models in imitation learning for robot control. In particular, Chi et al. (2023) introduces the diffusion policy and achieves a state-of-the-art imitation learning performance on a variety of robotics simulation and real-world tasks. However, because of the necessity of traversing the reverse diffusion chain, the slow generation process of diffusion models presents significant limitations for their application in robotic tasks. This process involves multiple iterations to pass through the same denoising network, potentially thousands of times (Song et al., 2020a; Wang et al., 2023). Such a long inference time restricts the practicality of using the diffusion policy (Chi et al., 2023), which by default runs at 1.49 Hz, in scenarios where quick response and low computational demands are essential.

artificial intelligence, arxiv preprint arxiv, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2410.21257

Genre:

Research Report > Promising Solution (0.66)
Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots

He, Tairan, Xiao, Wenli, Lin, Toru, Luo, Zhengyi, Xu, Zhenjia, Jiang, Zhenyu, Kautz, Jan, Liu, Changliu, Shi, Guanya, Wang, Xiaolong, Fan, Linxi, Zhu, Yuke

arXiv.org Artificial IntelligenceOct-28-2024

Humanoid whole-body control requires adapting to diverse tasks such as navigation, loco-manipulation, and tabletop manipulation, each demanding a different mode of control. For example, navigation relies on root velocity tracking, while tabletop manipulation prioritizes upper-body joint angle tracking. Existing approaches typically train individual policies tailored to a specific command space, limiting their transferability across modes. We present the key insight that full-body kinematic motion imitation can serve as a common abstraction for all these tasks and provide general-purpose motor skills for learning multiple modes of whole-body control. Building on this, we propose HOVER (Humanoid Versatile Controller), a multi-mode policy distillation framework that consolidates diverse control modes into a unified policy. HOVER enables seamless transitions between control modes while preserving the distinct advantages of each, offering a robust and scalable solution for humanoid control across a wide range of modes. By eliminating the need for policy retraining for each control mode, our approach improves efficiency and flexibility for future humanoid applications.

artificial intelligence, hover, human computer interaction, (16 more...)

arXiv.org Artificial Intelligence

2410.21229

Country: North America > United States (0.93)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.46)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.42)

Add feedback

ARDuP: Active Region Video Diffusion for Universal Policies

Huang, Shuaiyi, Levy, Mara, Jiang, Zhenyu, Anandkumar, Anima, Zhu, Yuke, Fan, Linxi, Huang, De-An, Shrivastava, Abhinav

arXiv.org Artificial IntelligenceJun-19-2024

Sequential decision-making can be formulated as a text-conditioned video generation problem, where a video planner, guided by a text-defined goal, generates future frames visualizing planned actions, from which control actions are subsequently derived. In this work, we introduce Active Region Video Diffusion for Universal Policies (ARDuP), a novel framework for video-based policy learning that emphasizes the generation of active regions, i.e. potential interaction areas, enhancing the conditional policy's focus on interactive areas critical for task execution. This innovative framework integrates active region conditioning with latent diffusion models for video planning and employs latent representations for direct action decoding during inverse dynamic modeling. By utilizing motion cues in videos for automatic active region discovery, our method eliminates the need for manual annotations of active regions. We validate ARDuP's efficacy via extensive experiments on simulator CLIPort and the real-world dataset BridgeData v2, achieving notable improvements in success rates and generating convincingly realistic video plans.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2406.13301

Country:

North America > United States > Texas (0.14)
North America > United States > Maryland (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Add feedback

DrEureka: Language Model Guided Sim-To-Real Transfer

Ma, Yecheng Jason, Liang, William, Wang, Hung-Ju, Wang, Sam, Zhu, Yuke, Fan, Linxi, Bastani, Osbert, Jayaraman, Dinesh

arXiv.org Artificial IntelligenceJun-4-2024

Transferring policies learned in simulation to the real world is a promising strategy for acquiring robot skills at scale. However, sim-to-real approaches typically rely on manual design and tuning of the task reward function as well as the simulation physics parameters, rendering the process slow and human-labor intensive. In this paper, we investigate using Large Language Models (LLMs) to automate and accelerate sim-to-real design. Our LLM-guided sim-to-real approach, DrEureka, requires only the physics simulation for the target task and automatically constructs suitable reward functions and domain randomization distributions to support real-world transfer. We first demonstrate that our approach can discover sim-to-real configurations that are competitive with existing human-designed ones on quadruped locomotion and dexterous manipulation tasks. Then, we showcase that our approach is capable of solving novel robot tasks, such as quadruped balancing and walking atop a yoga ball, without iterative manual design.

dreureka, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2406.01967

Country: North America > United States > Texas (0.14)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents

Grigsby, Jake, Fan, Linxi, Zhu, Yuke

arXiv.org Artificial IntelligenceJan-31-2024

We introduce AMAGO, an in-context Reinforcement Learning (RL) agent that uses sequence models to tackle the challenges of generalization, long-term memory, and meta-learning. Recent works have shown that off-policy learning can make in-context RL with recurrent policies viable. Nonetheless, these approaches require extensive tuning and limit scalability by creating key bottlenecks in agents' memory capacity, planning horizon, and model size. AMAGO revisits and redesigns the off-policy in-context approach to successfully train long-sequence Transformers over entire rollouts in parallel with end-to-end RL. Our agent is scalable and applicable to a wide range of problems, and we demonstrate its strong performance empirically in meta-RL and long-term memory domains. AMAGO's focus on sparse rewards and off-policy data also allows in-context learning to extend to goal-conditioned problems with challenging exploration. When combined with a multi-goal hindsight relabeling scheme, AMAGO can solve a previously difficult category of open-world domains, where agents complete many possible instructions in procedurally generated environments.

arxiv preprint arxiv, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2310.09971

Country:

North America > United States > Texas (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations

Mandlekar, Ajay, Nasiriany, Soroush, Wen, Bowen, Akinola, Iretiayo, Narang, Yashraj, Fan, Linxi, Zhu, Yuke, Fox, Dieter

arXiv.org Artificial IntelligenceOct-26-2023

Imitation learning from a large set of human demonstrations has proved to be an effective paradigm for building capable robot agents. However, the demonstrations can be extremely costly and time-consuming to collect. We introduce MimicGen, a system for automatically synthesizing large-scale, rich datasets from only a small number of human demonstrations by adapting them to new contexts. We use MimicGen to generate over 50K demonstrations across 18 tasks with diverse scene configurations, object instances, and robot arms from just ~200 human demonstrations. We show that robot agents can be effectively trained on this generated dataset by imitation learning to achieve strong performance in long-horizon and high-precision tasks, such as multi-part assembly and coffee preparation, across broad initial state distributions. We further demonstrate that the effectiveness and utility of MimicGen data compare favorably to collecting additional human demonstrations, making it a powerful and economical approach towards scaling up robot learning. Datasets, simulation environments, videos, and more at https://mimicgen.github.io .

artificial intelligence, data generation system, scalable robot learning, (2 more...)

arXiv.org Artificial Intelligence

2310.17596

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

Yang, Zhuolin, Ping, Wei, Liu, Zihan, Korthikanti, Vijay, Nie, Weili, Huang, De-An, Fan, Linxi, Yu, Zhiding, Lan, Shiyi, Li, Bo, Liu, Ming-Yu, Zhu, Yuke, Shoeybi, Mohammad, Catanzaro, Bryan, Xiao, Chaowei, Anandkumar, Anima

arXiv.org Artificial IntelligenceOct-22-2023

Augmenting pretrained language models (LMs) with a vision encoder (e.g., Flamingo) has obtained the state-of-the-art results in image-to-text generation. However, these models store all the knowledge within their parameters, thus often requiring enormous model parameters to model the abundant visual concepts and very rich textual descriptions. Additionally, they are inefficient in incorporating new data, requiring a computational-expensive fine-tuning process. In this work, we introduce a Retrieval-augmented Visual Language Model, Re-ViLM, built upon the Flamingo, that supports retrieving the relevant knowledge from the external database for zero and in-context few-shot image-to-text generations. By storing certain knowledge explicitly in the external database, our approach reduces the number of model parameters and can easily accommodate new data during evaluation by simply updating the database. We also construct an interleaved image and text data that facilitates in-context few-shot learning capabilities. We demonstrate that Re-ViLM significantly boosts performance for image-to-text generation tasks, especially for zero-shot and few-shot generation in out-of-domain settings with 4 times less parameters compared with baseline methods.

caption, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2302.04858

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

Add feedback