AITopics | Fang, Xiaolin

Collaborating Authors

Fang, Xiaolin


Keypoint Abstraction using Large Models for Object-Relative Imitation Learning

Fang, Xiaolin, Huang, Bo-Ruei, Mao, Jiayuan, Shone, Jasmine, Tenenbaum, Joshua B., Lozano-Pérez, Tomás, Kaelbling, Leslie Pack

arXiv.org Artificial Intelligence, Oct-30-2024

Generalization to novel object configurations and instances across diverse tasks and environments is a critical challenge in robotics. Keypoint-based representations have proven effective as a succinct representation for capturing essential object features and for establishing a reference frame in action prediction, enabling data-efficient learning of robot skills. However, their manual design and reliance on additional human labels limit their scalability. In this paper, we propose KALM, a framework that leverages large pre-trained vision-language models (LMs) to automatically generate task-relevant and cross-instance consistent keypoints. KALM distills robust and consistent keypoints across views and objects by generating proposals using LMs and verifying them against a small set of robot demonstration data. Based on the generated keypoints, we can train keypoint-conditioned policy models that predict actions in keypoint-centric frames, enabling robots to generalize effectively across varying object poses, camera views, and object instances with similar functional shapes. Our method demonstrates strong performance in the real world, adapting to different tasks and environments from only a handful of demonstrations while requiring no additional labels. Website: https://kalm-il.github.io/
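To illustrate what "predicting actions in keypoint-centric frames" buys, here is a minimal sketch under stated assumptions: the frame is built from three non-collinear keypoints (origin at the first, x-axis toward the second, y-axis by Gram-Schmidt on the third), and an action target is stored in that frame and replayed on a new object instance. The frame construction and the names `keypoint_frame`, `to_frame`, and `from_frame` are hypothetical and are not taken from KALM's implementation.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]
Frame = Tuple[Vec, Tuple[Vec, Vec, Vec]]  # (origin, (x_axis, y_axis, z_axis))


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec) -> Vec:
    n = math.sqrt(dot(a, a))
    if n < 1e-9:
        raise ValueError("degenerate keypoints: zero-length axis")
    return scale(a, 1.0 / n)


def keypoint_frame(k0: Vec, k1: Vec, k2: Vec) -> Frame:
    """Hypothetical frame from three non-collinear keypoints: origin at k0,
    x toward k1, y from Gram-Schmidt on k2, z = x cross y."""
    x = normalize(sub(k1, k0))
    v = sub(k2, k0)
    y = normalize(sub(v, scale(x, dot(v, x))))  # remove the x component of v
    z = cross(x, y)
    return k0, (x, y, z)


def to_frame(p: Vec, frame: Frame) -> Vec:
    """World point -> coordinates in the keypoint-centric frame."""
    origin, axes = frame
    d = sub(p, origin)
    return (dot(d, axes[0]), dot(d, axes[1]), dot(d, axes[2]))


def from_frame(q: Vec, frame: Frame) -> Vec:
    """Keypoint-centric coordinates -> world point."""
    origin, (x, y, z) = frame
    return add(origin, add(scale(x, q[0]), add(scale(y, q[1]), scale(z, q[2]))))


# Usage: a grasp point recorded relative to keypoints on a demo object is
# replayed on a new instance that is shifted and rotated 90 degrees about z.
demo = keypoint_frame((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
grasp_local = to_frame((0.5, 0.2, 0.1), demo)
new = keypoint_frame((2.0, 3.0, 0.0), (2.0, 4.0, 0.0), (1.0, 3.0, 0.0))
print(from_frame(grasp_local, new))  # approximately (1.8, 3.5, 0.1)
```

Because the stored target is expressed relative to the keypoints rather than the world, the same local coordinates transfer to any pose of an object on which the corresponding keypoints can be found.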

large language model, machine learning, natural language

arXiv:2410.23254

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.46)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)


DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability

Fang, Xiaolin, Garrett, Caelan Reed, Eppner, Clemens, Lozano-Pérez, Tomás, Kaelbling, Leslie Pack, Fox, Dieter

arXiv.org Artificial Intelligence, Oct-3-2023

Task and Motion Planning (TAMP) approaches are effective at planning long-horizon autonomous robot manipulation. However, it can be difficult to apply them to domains where the environment and its dynamics are not fully known. We propose to overcome these limitations by leveraging deep generative modeling, specifically diffusion models, to learn constraints and samplers that capture these difficult-to-engineer aspects of the planning model. These learned samplers are composed within a TAMP solver to jointly find action parameter values that satisfy the constraints along a plan. To tractably make predictions for unseen objects in the environment, we define these samplers on low-dimensional learned latent embeddings of changing object state. We evaluate our approach in an articulated object manipulation domain and show how the combination of classical TAMP, generative learning, and latent embeddings enables long-horizon constraint-based reasoning. We also apply the learned sampler in the real world. More details are available at https://sites.google.com/view/dimsam-tamp

artificial intelligence, constraint-based reasoning, task and motion planning

arXiv:2306.13196

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)


Long-Horizon Manipulation of Unknown Objects via Task and Motion Planning with Estimated Affordances

Curtis, Aidan, Fang, Xiaolin, Kaelbling, Leslie Pack, Lozano-Pérez, Tomás, Garrett, Caelan Reed

arXiv.org Artificial Intelligence, Aug-10-2021

We present a strategy for designing and building very general robot manipulation systems involving the integration of a general-purpose task-and-motion planner with engineered and learned perception modules that estimate properties and affordances of unknown objects. Such systems are closed-loop policies that map from RGB images, depth images, and robot joint encoder measurements to robot joint position commands. We show that, following this strategy, a task-and-motion planner can be used to plan intelligent behaviors even in the absence of a priori knowledge regarding the set of manipulable objects, their geometries, and their affordances. We explore several different ways of implementing such perceptual modules for segmentation, property detection, shape estimation, and grasp generation. We show how these modules are integrated within the PDDLStream task and motion planning framework.

Our objective is to design and build robot policies that can interact robustly and safely with large collections of objects that are only partially observable, where the objects have never been seen before and where achieving the goal may require many coordinated actions, as in putting away all the …

The operation of our system, called M0M (Manipulation with Zero Models), is illustrated in Figure 1. The goal is for all objects to be on a blue target region.

Figure 1: The goal is for all perceivable objects to be on a blue target region. The robot first finds and executes a plan that picks and places the cracker box on the blue target region.
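The closed-loop behavior described above (perceive, plan, act, then perceive again until every object is on the blue target region) can be sketched as a toy loop. This is only an illustration of the control structure under loud assumptions: `perceive`, `plan`, and `execute` are hypothetical stand-ins, the world is a dictionary of object-to-region labels, and nothing here models PDDLStream, grasp generation, shape estimation, or partial observability.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, str]  # (verb, object, region)


def perceive(world: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical stand-in for the perception modules: returns the observed
    object -> region assignment (here, a perfect copy of the toy world)."""
    return dict(world)


def plan(observed: Dict[str, str], target: str) -> List[Action]:
    """Hypothetical stand-in for the task-and-motion planner: one
    pick-and-place per observed object not yet on the target region."""
    return [("pick_place", obj, target)
            for obj, region in sorted(observed.items()) if region != target]


def execute(world: Dict[str, str], action: Action) -> None:
    """Apply a pick-and-place to the toy world."""
    _, obj, region = action
    world[obj] = region


def closed_loop(world: Dict[str, str], target: str, max_steps: int = 100) -> List[Action]:
    """Perceive, plan, execute only the first action, then re-perceive and replan."""
    executed: List[Action] = []
    for _ in range(max_steps):
        steps = plan(perceive(world), target)
        if not steps:  # goal reached: every observed object is on the target
            return executed
        execute(world, steps[0])
        executed.append(steps[0])
    raise RuntimeError("step budget exhausted before reaching the goal")


world = {"cracker_box": "table", "mug": "table", "bowl": "blue_target"}
print(closed_loop(world, "blue_target"))  # cracker_box is moved first, then mug
```

Executing only the first action before replanning is what makes the loop closed: if a real perception module reported a newly revealed object or a failed placement, the next plan would account for it.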

artificial intelligence, blue target region, target region

arXiv:2108.04145

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)


Multimodal Conditional Learning with Fast Thinking Policy-like Model and Slow Thinking Planner-like Model

Xie, Jianwen, Zheng, Zilong, Fang, Xiaolin, Zhu, Song-Chun, Wu, Ying Nian

arXiv.org Machine Learning, Feb-7-2019

This paper studies the supervised learning of the conditional distribution of a high-dimensional output given an input, where the output and input belong to two different modalities, e.g., the output is an image and the input is a sketch. We solve this problem by learning two models that bear similarities to those in reinforcement learning and optimal control. One model is policy-like. It generates the output directly by a non-linear transformation of the input and a noise vector. This amounts to fast thinking because the conditional generation is accomplished by direct sampling. The other model is planner-like. It learns an objective function in the form of a conditional energy function, so that the output can be generated by optimizing the objective function, or more rigorously by sampling from the conditional energy-based model. This amounts to slow thinking because the sampling process is accomplished by an iterative algorithm such as Langevin dynamics. We propose to learn the two models jointly, where the fast thinking policy-like model serves to initialize the sampling of the slow thinking planner-like model, and the planner-like model refines the initial output by an iterative algorithm. The planner-like model learns from the difference between the refined output and the observed output, while the policy-like model learns from how the planner-like model refines its initial output. We demonstrate the effectiveness of the proposed method on various image generation tasks.
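The interplay between the two models at sampling time can be illustrated with a one-dimensional toy. This is a sketch under stated assumptions, not the paper's method: the conditional energy is a hand-written quadratic E(y | x) = (y - 2x)^2 / 2 rather than a learned network, the "policy" is a fixed, deliberately biased generator, and the joint learning updates described in the abstract are not shown. It only demonstrates the fast direct sample being refined by Langevin dynamics on the energy.

```python
import random
from typing import List, Tuple

# Hypothetical toy problem, not the paper's learned networks:
# conditional energy E(y | x) = (y - 2x)^2 / 2, so p(y | x), proportional
# to exp(-E), is the Gaussian N(2x, 1).


def grad_energy(y: float, x: float) -> float:
    """dE/dy for the toy energy."""
    return y - 2.0 * x


def fast_policy(x: float, rng: random.Random) -> float:
    """Policy-like model ("fast thinking"): one direct, deliberately imperfect
    sample y = x + noise, produced without any iteration."""
    return x + rng.gauss(0.0, 1.0)


def slow_refine(y: float, x: float, rng: random.Random,
                step: float = 0.3, n_steps: int = 300) -> float:
    """Planner-like model ("slow thinking"): Langevin dynamics on E(y | x),
    y <- y - (step^2 / 2) * dE/dy + step * N(0, 1), started from the policy's output."""
    for _ in range(n_steps):
        y = y - 0.5 * step * step * grad_energy(y, x) + step * rng.gauss(0.0, 1.0)
    return y


def sample(x: float, n: int, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Draw n fast samples, then refine each one with the slow sampler."""
    rng = random.Random(seed)
    initial = [fast_policy(x, rng) for _ in range(n)]
    refined = [slow_refine(y0, x, rng) for y0 in initial]
    return initial, refined


initial, refined = sample(x=1.5, n=1000)
print(sum(initial) / len(initial))  # near 1.5: the fast model's biased guess
print(sum(refined) / len(refined))  # near 3.0 = 2x, the mean of the toy target
```

In the paper both models are learned; here the energy is fixed so that the effect of the refinement is easy to check, and a better initializer would simply let the Langevin chain run for fewer steps.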

artificial intelligence, neural network, policy-like model

arXiv:1902.02812

Country: North America > United States > California (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


Divergence Triangle for Joint Training of Generator Model, Energy-based Model, and Inference Model

Han, Tian, Nijkamp, Erik, Fang, Xiaolin, Hill, Mitch, Zhu, Song-Chun, Wu, Ying Nian

arXiv.org Machine Learning, Jan-31-2019

This paper proposes the divergence triangle as a framework for joint training of a generator model, an energy-based model, and an inference model. The divergence triangle is a compact and symmetric (anti-symmetric) objective function that seamlessly integrates variational learning, adversarial learning, the wake-sleep algorithm, and contrastive divergence in a unified probabilistic formulation. This unification makes sampling, inference, and energy evaluation readily available without the need for costly Markov chain Monte Carlo methods. Our experiments demonstrate that the divergence triangle is capable of learning (1) an energy-based model with a well-formed energy landscape, (2) direct sampling in the form of a generator network, and (3) feed-forward inference that faithfully reconstructs observed as well as synthesized data. The divergence triangle is a robust training method that can learn from incomplete data.

deep learning, energy-based model, neural network

arXiv:1812.10907

Country: North America > United States (0.14)

Genre: Research Report (0.50)
