Xu, Danfei
Zero-Shot Object Searching Using Large-scale Object Relationship Prior
Chen, Hongyi, Xu, Ruinian, Cheng, Shuo, Vela, Patricio A., Xu, Danfei
Home-assistant robots have been a long-standing research topic, and one of the biggest challenges is searching for requested objects in home environments. Prior work on object-goal navigation requires the robot to search for a target object category in an unexplored environment, which may not suit home-assistant robots that typically have some semantic knowledge of the environment, such as the locations of static furniture. In our approach, we leverage this knowledge, together with the fact that a target object is often located close to its related objects, for efficient navigation. To achieve this, we train a graph neural network on the Visual Genome dataset to learn object co-occurrence relationships, and formulate the search process as iteratively predicting the areas where the target object is most likely to be located. The approach is entirely zero-shot, meaning it requires no accurate object-correlation data collected from the test environment. We empirically show that our method outperforms prior correlational object search algorithms. As our ultimate goal is to build fully autonomous assistant robots for everyday use, we further integrate object navigation with a task planner that parses natural language and generates task-completing plans, so that the robot can execute human instructions. We demonstrate the effectiveness of the proposed pipeline both in the AI2-THOR simulator and on a Stretch robot in a real-world environment.
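A minimal sketch of the search loop described above, in Python, under stated assumptions: the trained graph neural network is abstracted as a hypothetical cooccurrence_score lookup, and navigation is reduced to printing the chosen area.

def cooccurrence_score(target, landmark):
    # Hypothetical lookup standing in for the GNN trained on Visual Genome.
    table = {("mug", "coffee machine"): 0.9, ("mug", "sofa"): 0.1}
    return table.get((target, landmark), 0.05)

def next_search_area(target, landmarks, visited):
    # Pick the unvisited landmark most correlated with the target object.
    candidates = [l for l in landmarks if l not in visited]
    return max(candidates, key=lambda l: cooccurrence_score(target, l))

landmarks = ["coffee machine", "sofa", "dining table"]  # known static furniture
visited = set()
for _ in landmarks:
    area = next_search_area("mug", landmarks, visited)
    print("search near:", area)  # navigate here and look for the target
    visited.add(area)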
Guided Conditional Diffusion for Controllable Traffic Simulation
Zhong, Ziyuan, Rempe, Davis, Xu, Danfei, Chen, Yuxiao, Veer, Sushant, Che, Tong, Ray, Baishakhi, Pavone, Marco
Controllable and realistic traffic simulation is critical for developing and verifying autonomous vehicles. Typical heuristic-based traffic models offer flexible control to make vehicles follow specific trajectories and traffic rules. On the other hand, data-driven approaches generate realistic and human-like behaviors, improving transfer from simulated to real-world traffic. However, to the best of our knowledge, no traffic model offers both controllability and realism. In this work, we develop a conditional diffusion model for controllable traffic generation (CTG) that allows users to control desired properties of trajectories at test time (e.g., reach a goal or follow a speed limit) while maintaining realism and physical feasibility through enforced dynamics. The key technical idea is to leverage recent advances from diffusion modeling and differentiable logic to guide generated trajectories to meet rules defined using signal temporal logic (STL). We further extend guidance to multi-agent settings and enable interaction-based rules like collision avoidance. CTG is extensively evaluated on the nuScenes dataset for diverse and composite rules, demonstrating improvement over strong baselines in terms of the controllability-realism tradeoff.
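A minimal sketch of the guidance idea, not the authors' implementation: one reverse-diffusion step whose predicted mean is perturbed by the gradient of a differentiable rule cost before sampling. The denoiser, noise scale, and speed-limit rule below are toy stand-ins; CTG uses learned denoisers and STL robustness values.

import torch

def guided_step(x_t, t, denoiser, rule_robustness, scale=0.1, sigma=0.05):
    mu = denoiser(x_t, t)  # predicted mean of the reverse step
    x = mu.detach().requires_grad_(True)
    robustness = rule_robustness(x).sum()  # higher = rules better satisfied
    grad = torch.autograd.grad(robustness, x)[0]
    # Shift the mean toward rule satisfaction, then sample.
    return mu + scale * grad + sigma * torch.randn_like(mu)

# Toy usage: trajectories of shape (batch, horizon, 2); the "rule" penalizes
# exceeding a speed limit between consecutive waypoints.
denoiser = lambda x, t: 0.9 * x
rule = lambda x: -torch.relu((x[:, 1:] - x[:, :-1]).norm(dim=-1) - 1.0).sum(dim=-1)
x = torch.randn(4, 10, 2)
for t in reversed(range(20)):
    x = guided_step(x, t, denoiser, rule)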
BITS: Bi-level Imitation for Traffic Simulation
Xu, Danfei, Chen, Yuxiao, Ivanovic, Boris, Pavone, Marco
Simulation is the key to scaling up validation and verification for robotic systems such as autonomous vehicles. Despite advances in high-fidelity physics and sensor simulation, a critical gap remains in simulating realistic behaviors of road users. This is because, unlike simulating physics and graphics, devising first-principle models for human-like behaviors is generally infeasible. In this work, we take a data-driven approach and propose a method that can learn to generate traffic behaviors from real-world driving logs. The method achieves high sample efficiency and behavior diversity by exploiting the bi-level hierarchy of driving behaviors: it decouples the traffic simulation problem into high-level intent inference and low-level driving behavior imitation. The method also incorporates a planning module to obtain stable long-horizon behaviors. We empirically validate our method, named Bi-level Imitation for Traffic Simulation (BITS), with scenarios from two large-scale driving datasets and show that BITS achieves balanced traffic simulation performance in realism, diversity, and long-horizon stability. We also explore ways to evaluate behavior realism and introduce a suite of evaluation metrics for traffic simulation. Finally, as part of our core contributions, we develop and open-source a software tool that unifies data formats across different driving datasets and converts scenes from existing datasets into interactive simulation environments. For additional information and videos, see https://sites.google.com/view/nvr-bits2022/home
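A minimal sketch of the bi-level structure, with hypothetical stand-ins for every learned component: a high-level intent predictor proposes candidate goals, a planning-style evaluation selects among them, and a goal-conditioned low-level policy produces the driving action.

import random

def infer_intents(state, num_samples=5):
    # Stand-in for the learned high-level intent/goal predictor.
    return [(state[0] + random.uniform(-1, 1), state[1] + random.uniform(-1, 1))
            for _ in range(num_samples)]

def low_level_policy(state, goal):
    # Stand-in for the learned goal-conditioned imitation policy.
    return (goal[0] - state[0], goal[1] - state[1])

def rollout_cost(state, goal):
    # Stand-in for the planning module's evaluation (e.g., collisions,
    # off-road penalties) used to keep long-horizon behavior stable.
    return abs(goal[0]) + abs(goal[1])

state = (0.0, 0.0)
for step in range(3):
    goal = min(infer_intents(state), key=lambda g: rollout_cost(state, g))
    ax, ay = low_level_policy(state, goal)
    state = (state[0] + 0.1 * ax, state[1] + 0.1 * ay)
    print(step, state)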
Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration
Wang, Chen, Pérez-D'Arpino, Claudia, Xu, Danfei, Fei-Fei, Li, Liu, C. Karen, Savarese, Silvio
We present a method for learning a human-robot collaboration policy from human-human collaboration demonstrations. An effective robot assistant must learn to handle the diverse human behaviors shown in the demonstrations and be robust when humans adjust their strategies during online task execution. Our method co-optimizes a human policy and a robot policy in an interactive learning process: the human policy learns to generate diverse and plausible collaborative behaviors from demonstrations, while the robot policy learns to assist by estimating the unobserved latent strategy of its human collaborator. Across a 2D strategy game, a human-robot handover task, and a multi-step collaborative manipulation task, our method outperforms the alternatives both in simulated evaluations and when executing the tasks with a real human operator in the loop. Supplementary materials and videos at https://sites.google.com/view/co-gail-web/home
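A minimal sketch of the co-optimization structure, with hypothetical network sizes: the human policy is conditioned on a sampled latent strategy z, while the robot policy acts on an estimate of z inferred from the human's observed behavior; the GAIL-style discriminator training against human-human demonstrations is omitted.

import torch
import torch.nn as nn

obs_dim, act_dim, z_dim = 8, 2, 4
human_policy = nn.Sequential(nn.Linear(obs_dim + z_dim, 32), nn.Tanh(),
                             nn.Linear(32, act_dim))
robot_policy = nn.Sequential(nn.Linear(obs_dim + z_dim, 32), nn.Tanh(),
                             nn.Linear(32, act_dim))
# Estimates the human's latent strategy from the human's observed action.
z_estimator = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.Tanh(),
                            nn.Linear(32, z_dim))

obs = torch.randn(1, obs_dim)
z = torch.randn(1, z_dim)  # sampled strategy driving the human policy
human_act = human_policy(torch.cat([obs, z], dim=-1))
z_hat = z_estimator(torch.cat([obs, human_act], dim=-1))  # robot's belief
robot_act = robot_policy(torch.cat([obs, z_hat], dim=-1))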
What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
Mandlekar, Ajay, Xu, Danfei, Wong, Josiah, Nasiriany, Soroush, Wang, Chen, Kulkarni, Rohun, Fei-Fei, Li, Savarese, Silvio, Zhu, Yuke, Martín-Martín, Roberto
Imitating human demonstrations is a promising approach to endowing robots with various manipulation capabilities. While recent advances have been made in imitation learning and batch (offline) reinforcement learning, a lack of open-source human datasets and reproducible learning methods makes assessing the state of the field difficult. In this paper, we conduct an extensive study of six offline learning algorithms for robot manipulation on five simulated and three real-world multi-stage manipulation tasks of varying complexity, and with datasets of varying quality. Our study analyzes the most critical challenges when learning from offline human data for manipulation. Based on the study, we derive a series of lessons, including the sensitivity to different algorithmic design choices, the dependence on the quality of the demonstrations, and the variability based on the stopping criteria due to the different objectives in training and evaluation. We also highlight opportunities for learning from human datasets, such as the ability to learn proficient policies on challenging, multi-stage tasks beyond the scope of current reinforcement learning methods, and the ability to easily scale to natural, real-world manipulation scenarios where only raw sensory signals are available. We have open-sourced our datasets and all algorithm implementations to facilitate future research and fair comparisons in learning from human demonstration data. Codebase, datasets, trained models, and more available at https://arise-initiative.github.io/robomimic-web/
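For concreteness, a minimal sketch of behavioral cloning, the simplest algorithm family in the study, trained on a hypothetical offline dataset; this is not the released robomimic code. The closing comment reflects the paper's finding on stopping criteria.

import torch
import torch.nn as nn

obs_dim, act_dim = 10, 7  # e.g., low-dim state and 7-DoF arm actions
dataset = [(torch.randn(obs_dim), torch.randn(act_dim)) for _ in range(256)]
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(5):
    for obs, act in dataset:
        loss = nn.functional.mse_loss(policy(obs), act)
        opt.zero_grad()
        loss.backward()
        opt.step()
# Training loss is a poor proxy for task success, so checkpoints should be
# selected by rollout evaluation rather than by the lowest loss.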
Generalization Through Hand-Eye Coordination: An Action Space for Learning Spatially-Invariant Visuomotor Control
Wang, Chen, Wang, Rui, Xu, Danfei, Mandlekar, Ajay, Fei-Fei, Li, Savarese, Silvio
Imitation Learning (IL) is an effective framework for learning visuomotor skills from offline demonstration data. However, IL methods often fail to generalize to new scene configurations not covered by the training data. Humans, on the other hand, can manipulate objects in varying conditions. Key to this capability is hand-eye coordination, a cognitive ability that enables humans to adaptively direct their movements at task-relevant objects and be invariant to the objects' absolute spatial locations. In this work, we present a learnable action space, Hand-eye Action Networks (HAN), that can approximate human hand-eye coordination behaviors by learning from human teleoperated demonstrations. Through a set of challenging multi-stage manipulation tasks, we show that a visuomotor policy equipped with HAN is able to inherit the key spatial invariance property of hand-eye coordination and achieve zero-shot generalization to new scene configurations. Additional materials available at https://sites.google.com/stanford.edu/han
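A minimal sketch of the hand-eye idea with a hypothetical two-module architecture: an "eye" predicts a task-relevant 2D point from visual features, and a "hand" outputs motion expressed relative to that point, which is what confers spatial invariance.

import torch
import torch.nn as nn

feat_dim, act_dim = 16, 3
eye = nn.Linear(feat_dim, 2)             # predicts a 2D attention point
hand = nn.Linear(feat_dim + 2, act_dim)  # motion relative to that point

feat = torch.randn(1, feat_dim)  # stand-in for learned visual features
gaze = eye(feat)                 # task-relevant target location
rel_action = hand(torch.cat([feat, gaze], dim=-1))
# Executing rel_action in the gaze-centered frame keeps the policy invariant
# to where the object happens to sit in the workspace.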
Human-in-the-Loop Imitation Learning using Remote Teleoperation
Mandlekar, Ajay, Xu, Danfei, Martín-Martín, Roberto, Zhu, Yuke, Fei-Fei, Li, Savarese, Silvio
Imitation Learning is a promising paradigm for learning complex robot manipulation skills by reproducing behavior from human demonstrations. However, manipulation tasks often contain bottleneck regions that require a sequence of precise actions to make meaningful progress, such as a robot inserting a pod into a coffee machine to make coffee. Trained policies can fail in these regions because small deviations in actions can lead the policy into states not covered by the demonstrations. Intervention-based policy learning is an alternative that can address this issue: it allows human operators to monitor trained policies and take over control when they encounter failures. In this paper, we build a data collection system tailored to 6-DoF manipulation settings that enables remote human operators to monitor and intervene on trained policies. We develop a simple and effective algorithm that iteratively trains the policy on new data collected by the system and encourages the policy to learn how to traverse bottlenecks through the interventions. We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators, and further show that our method outperforms multiple state-of-the-art baselines for learning from human interventions on a challenging robot threading task and a coffee making task. Additional results and videos at https://sites.google.com/stanford.edu/iwr
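A minimal sketch of the retraining step, under assumed data formats: samples flagged as human interventions are up-weighted so the policy concentrates on traversing bottlenecks. This captures the spirit of the paper's algorithm, not its exact formulation.

import torch
import torch.nn as nn

obs_dim, act_dim = 10, 7
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Each sample: (obs, action, is_intervention); interventions mark the precise
# corrections operators made near bottleneck regions.
data = [(torch.randn(obs_dim), torch.randn(act_dim), i % 4 == 0)
        for i in range(128)]
for obs, act, intervened in data:
    weight = 2.0 if intervened else 1.0  # hypothetical up-weighting factor
    loss = weight * nn.functional.mse_loss(policy(obs), act)
    opt.zero_grad()
    loss.backward()
    opt.step()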
Regression Planning Networks
Xu, Danfei, Martín-Martín, Roberto, Huang, De-An, Zhu, Yuke, Savarese, Silvio, Fei-Fei, Li
Recent learning-to-plan methods have shown promising results on planning directly from observation space. Yet, their ability to plan for long-horizon tasks is limited by the accuracy of the prediction model. On the other hand, classical symbolic planners show remarkable capabilities in solving long-horizon tasks, but they require predefined symbolic rules and symbolic states, restricting their real-world applicability. In this work, we combine the benefits of these two paradigms and propose a learning-to-plan method that can directly generate a long-term symbolic plan conditioned on high-dimensional observations. We borrow the idea of regression (backward) planning from the classical planning literature and introduce Regression Planning Networks (RPN), a neural network architecture that plans backward starting at a task goal and generates a sequence of intermediate goals that reaches the current observation. We show that our model not only inherits many favorable traits from symbolic planning (e.g., the ability to solve previously unseen tasks), but can also learn from visual inputs in an end-to-end manner. We evaluate the capabilities of RPN in a grid world environment and a simulated 3D kitchen environment featuring complex visual scenes and long task horizons, and show that it achieves near-optimal performance on completely new task instances.
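A minimal sketch of regression (backward) planning, with hypothetical precondition and satisfaction functions standing in for RPN's learned networks: expansion starts at the task goal and stops once subgoals are already satisfied in the current observation.

def preconditions(goal):
    # Stand-in for the learned precondition network.
    deps = {"cooked(meat)": ["on(meat, pan)"], "on(meat, pan)": ["grasped(meat)"]}
    return deps.get(goal, [])

def satisfied(goal, observation):
    # Stand-in for grounding a symbolic goal in the current observation.
    return goal in observation

def regression_plan(goal, observation):
    plan, frontier = [], [goal]
    while frontier:
        g = frontier.pop()
        if satisfied(g, observation):
            continue
        plan.append(g)  # must still be achieved; discovered latest-first
        frontier.extend(preconditions(g))
    return list(reversed(plan))  # execute the earliest subgoal first

print(regression_plan("cooked(meat)", {"grasped(meat)"}))
# -> ['on(meat, pan)', 'cooked(meat)']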
Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning
Huang, De-An, Xu, Danfei, Zhu, Yuke, Garg, Animesh, Savarese, Silvio, Fei-Fei, Li, Niebles, Juan Carlos
We address one-shot imitation learning, where the goal is to execute a previously unseen task based on a single demonstration. While there has been exciting progress in this direction, most of the approaches still require a few hundred tasks for meta-training, which limits the scalability of the approaches. Our main contribution is to formulate one-shot imitation learning as a symbolic planning problem along with the symbol grounding problem. This formulation disentangles the policy execution from the inter-task generalization and leads to better data efficiency. The key technical challenge is that the symbol grounding is prone to error with limited training data and leads to subsequent symbolic planning failures. We address this challenge by proposing a continuous relaxation of the discrete symbolic planner that directly plans on the probabilistic outputs of the symbol grounding model. Our continuous relaxation of the planner can still leverage the information contained in the probabilistic symbol grounding and significantly improve over the baseline planner for the one-shot imitation learning tasks without using large training data.
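A minimal sketch of the relaxation idea in a toy domain: candidate plans are scored by the product of grounded predicate probabilities instead of thresholding them into hard symbols first, so grounding uncertainty propagates into plan selection.

import math

def plan_score(plan, predicate_probs):
    # A hard symbolic planner would require each probability to round to 1;
    # the relaxation multiplies them so partial confidence is still usable.
    log_score = sum(math.log(predicate_probs[p])
                    for step in plan for p in step["preconds"])
    return math.exp(log_score)

probs = {"clear(A)": 0.9, "on(A, table)": 0.7, "clear(B)": 0.6}
plan_a = [{"action": "pick(A)", "preconds": ["clear(A)", "on(A, table)"]}]
plan_b = [{"action": "pick(B)", "preconds": ["clear(B)"]}]
best = max([plan_a, plan_b], key=lambda p: plan_score(p, probs))
print(best[0]["action"])  # 'pick(A)': score 0.63 beats 'pick(B)' at 0.60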
Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration
Huang, De-An, Nair, Suraj, Xu, Danfei, Zhu, Yuke, Garg, Animesh, Fei-Fei, Li, Savarese, Silvio, Niebles, Juan Carlos
Our goal is for a robot to execute a previously unseen task based on a single video demonstration of the task. The success of our approach relies on the principle of transferring knowledge from seen tasks to unseen ones with similar semantics. More importantly, we hypothesize that to successfully execute a complex task from a single video demonstration, it is necessary to explicitly incorporate compositionality into the model. To test our hypothesis, we propose Neural Task Graph (NTG) Networks, which use a task graph as the intermediate representation to modularize the representations of both the video demonstration and the derived policy. We show that this formulation achieves strong inter-task generalization on two complex tasks: Block Stacking in BulletPhysics and Object Collection in AI2-THOR. We further show that the same principle is applicable to real-world videos, and that NTG can improve the data efficiency of few-shot activity understanding on the Breakfast Dataset.
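A minimal sketch of a task graph as the intermediate representation, using toy structures: nodes are actions, edges encode valid transitions, and a stand-in for the graph-conditioned policy simply follows outgoing edges from the last executed action.

task_graph = {
    "pick(block1)": ["place(block1, block2)"],
    "place(block1, block2)": ["pick(block3)"],
    "pick(block3)": ["place(block3, block1)"],
    "place(block3, block1)": [],
}

def next_action(last_action):
    # Stand-in for the NTG policy, which localizes the current state in the
    # graph and selects the next node; here we follow the first edge.
    successors = task_graph[last_action]
    return successors[0] if successors else None

action = "pick(block1)"
while action is not None:
    print(action)
    action = next_action(action)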