
Collaborating Authors

 Wen, Chuan


FP3: A 3D Foundation Policy for Robotic Manipulation

arXiv.org Artificial Intelligence

FP3 supports data-efficient fine-tuning for downstream tasks, while demonstrating superior generalizability to unseen environments and novel objects.

Abstract -- Following its success in natural language processing and computer vision, foundation models that are pre-trained on large-scale multi-task datasets have also shown great potential in robotics. However, most existing robot foundation models rely solely on 2D image observations, ignoring 3D geometric information, which is essential for robots to perceive and reason about the 3D world. In this paper, we introduce FP3, a 3D foundation policy for robotic manipulation. FP3 builds on a scalable diffusion transformer architecture and is pre-trained on 60k trajectories with point cloud observations. With the model design and diverse pre-training data, FP3 can be efficiently fine-tuned for downstream tasks while exhibiting strong generalization capabilities. Experiments on real robots demonstrate that with only 80 demonstrations, FP3 is able to learn a new task with over 90% success rates in novel environments with unseen objects, significantly surpassing existing robot foundation models. Visualizations and code are available at: FP3.

INTRODUCTION: Learning-based policies have shown great effectiveness in robotic manipulation [6, 80, 12, 75, 36, 3]. However, these learned policies often show limited or even zero generalization capability to unseen scenarios, new objects, and distractors [66]. Additionally, most current methods are trained on single or few tasks [12, 75], requiring a relatively large amount of expert demonstrations (usually about 200 episodes) to learn a new task.
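
To make the described interface concrete, here is a rough sketch of a point-cloud-conditioned diffusion policy denoising step (an action chunk plus a pooled point-cloud token passed through a transformer). All module sizes, the noise schedule, and the architecture are illustrative assumptions, not FP3's actual design.

# Minimal sketch of a point-cloud-conditioned diffusion policy step.
# All sizes and the noise schedule are illustrative, not FP3's design.
import torch
import torch.nn as nn

class PointCloudDiffusionPolicy(nn.Module):
    def __init__(self, action_dim=7, horizon=16, d_model=256):
        super().__init__()
        self.point_encoder = nn.Sequential(          # per-point MLP + max pooling
            nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, d_model))
        self.action_proj = nn.Linear(action_dim, d_model)
        self.time_embed = nn.Embedding(1000, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, action_dim)

    def forward(self, points, noisy_actions, t):
        # points: (B, N, 3) point cloud; noisy_actions: (B, horizon, action_dim)
        obs_tok = self.point_encoder(points).max(dim=1, keepdim=True).values
        act_tok = self.action_proj(noisy_actions) + self.time_embed(t)[:, None, :]
        out = self.transformer(torch.cat([obs_tok, act_tok], dim=1))[:, 1:, :]
        return self.head(out)                        # predicted noise per action step

policy = PointCloudDiffusionPolicy()
eps = policy(torch.randn(2, 1024, 3), torch.randn(2, 16, 7),
             torch.randint(0, 1000, (2,)))
print(eps.shape)  # torch.Size([2, 16, 7])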


Predictive Inference With Fast Feature Conformal Prediction

arXiv.org Machine Learning

Conformal prediction is widely adopted in uncertainty quantification, due to its post-hoc, distribution-free, and model-agnostic properties. In the realm of modern deep learning, researchers have proposed Feature Conformal Prediction (FCP), which deploys conformal prediction in a feature space, yielding reduced band lengths. However, the practical utility of FCP is limited due to the time-consuming non-linear operations required to transform confidence bands from feature space to output space. In this paper, we introduce Fast Feature Conformal Prediction (FFCP), which features a novel non-conformity score and is convenient for practical applications. FFCP serves as a fast version of FCP, in that it equivalently employs a Taylor expansion to approximate the aforementioned non-linear operations in FCP. Empirical validations showcase that FFCP performs comparably with FCP (both outperforming the vanilla version) while achieving a significant reduction in computational time by approximately 50x. The code is available at https://github.com/ElvisWang1111/FastFeatureCP
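
One way to read the abstract: the head of the network is linearized with a first-order Taylor expansion, so the feature-space non-conformity score can be computed from a residual divided by a gradient norm, avoiding the expensive band transformation in FCP. The sketch below illustrates that idea only; it is not the authors' implementation (see the linked repository for that), and all names are placeholders.

# Hedged sketch of a Taylor-linearized, feature-space non-conformity score
# used inside split conformal prediction.
import torch

def _pred_and_sensitivity(g, h, x):
    v = g(x).detach().requires_grad_(True)           # features
    pred = h(v).squeeze(-1)                          # scalar prediction per sample
    grad = torch.autograd.grad(pred.sum(), v)[0]     # dh/dv for each sample
    return pred.detach(), grad.flatten(1).norm(dim=1).clamp_min(1e-8)

def ffcp_style_intervals(g, h, x_cal, y_cal, x_test, alpha=0.1):
    # Calibration score: |residual| / ||dh/dv||, i.e. the feature-space distance
    # that would explain the residual under a first-order Taylor expansion.
    pred_c, sens_c = _pred_and_sensitivity(g, h, x_cal)
    s = (y_cal - pred_c).abs() / sens_c
    n = s.numel()
    q = torch.quantile(s, min(1.0, (1 - alpha) * (n + 1) / n))
    # Map the feature-space quantile back to output space via the same sensitivity.
    pred_t, sens_t = _pred_and_sensitivity(g, h, x_test)
    return pred_t - q * sens_t, pred_t + q * sens_t

g, h = torch.nn.Linear(5, 8), torch.nn.Linear(8, 1)   # toy feature extractor / head
lo, hi = ffcp_style_intervals(g, h, torch.randn(200, 5), torch.randn(200),
                              torch.randn(10, 5))
print(lo.shape, hi.shape)  # torch.Size([10]) torch.Size([10])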


Data Scaling Laws in Imitation Learning for Robotic Manipulation

arXiv.org Artificial Intelligence

Data scaling has revolutionized fields like natural language processing and computer vision, providing models with remarkable generalization capabilities. In this paper, we investigate whether similar data scaling laws exist in robotics, particularly in robotic manipulation, and whether appropriate data scaling can yield single-task robot policies that can be deployed zero-shot for any object within the same category in any environment. To this end, we conduct a comprehensive empirical study on data scaling in imitation learning. By collecting data across numerous environments and objects, we study how a policy's generalization performance changes with the number of training environments, objects, and demonstrations. Throughout our research, we collect over 40,000 demonstrations and execute more than 15,000 real-world robot rollouts under a rigorous evaluation protocol. Our findings reveal several intriguing results: the generalization performance of the policy follows a roughly power-law relationship with the number of environments and objects. The diversity of environments and objects is far more important than the absolute number of demonstrations; once the number of demonstrations per environment or object reaches a certain threshold, additional demonstrations have minimal effect. Based on these insights, we propose an efficient data collection strategy. With four data collectors working for one afternoon, we collect sufficient data to enable the policies for two tasks to achieve approximately 90% success rates in novel environments with unseen objects.
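
The claimed roughly power-law relationship between generalization performance and the number of environments or objects can be checked with a simple log-log fit; the data points below are invented purely for illustration.

# Toy illustration of fitting a power law  perf ~ a * n^b  to success rate
# versus the number of training environments; the numbers are made up.
import numpy as np

n_envs  = np.array([1, 2, 4, 8, 16, 32])
success = np.array([0.18, 0.27, 0.40, 0.55, 0.72, 0.88])   # hypothetical

b, log_a = np.polyfit(np.log(n_envs), np.log(success), deg=1)
print(f"power-law fit: success ~ {np.exp(log_a):.2f} * n^{b:.2f}")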


Can Transformers Capture Spatial Relations between Objects?

arXiv.org Artificial Intelligence

Spatial relationships between objects represent key scene information for humans to understand and interact with the world. To study the capability of current computer vision systems to recognize physically grounded spatial relations, we start by proposing precise relation definitions that permit consistently annotating a benchmark dataset. Despite the apparent simplicity of this task relative to others in the recognition literature, we observe that existing approaches perform poorly on this benchmark. We propose new approaches exploiting the long-range attention capabilities of transformers for this task, and evaluate key design principles. We identify a simple "RelatiViT" architecture and demonstrate that it outperforms all current approaches. To our knowledge, this is the first method to convincingly outperform naive baselines on spatial relation prediction in in-the-wild settings. The code and datasets are available in \url{https://sites.google.com/view/spatial-relation}.
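
As a self-contained illustration of the task setup (image plus two object boxes in, relation class out) with a plain transformer encoder; this is not the RelatiViT architecture, and all dimensions and the number of relation classes are assumptions.

# Minimal sketch of a transformer-based spatial relation classifier: patch
# tokens from the image plus two box-embedding query tokens for subject and
# object. Sizes are illustrative, not RelatiViT itself.
import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    def __init__(self, num_relations=9, d_model=256, patch=16):
        super().__init__()
        self.patchify = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        self.box_embed = nn.Linear(4, d_model)       # (x1, y1, x2, y2), normalized
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(2 * d_model, num_relations)

    def forward(self, image, subj_box, obj_box):
        tokens = self.patchify(image).flatten(2).transpose(1, 2)      # (B, P, d)
        queries = torch.stack([self.box_embed(subj_box),
                               self.box_embed(obj_box)], dim=1)       # (B, 2, d)
        out = self.encoder(torch.cat([queries, tokens], dim=1))
        return self.head(out[:, :2].flatten(1))      # relation logits

model = RelationClassifier()
logits = model(torch.randn(2, 3, 224, 224), torch.rand(2, 4), torch.rand(2, 4))
print(logits.shape)  # torch.Size([2, 9])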


Imitation Learning from Observation with Automatic Discount Scheduling

arXiv.org Artificial Intelligence

Humans often acquire new skills through observation and imitation. For robotic agents, learning from the plethora of unlabeled video demonstration data available on the Internet necessitates imitating the expert without access to its action, presenting a challenge known as Imitation Learning from Observations (ILfO). A common approach to tackle ILfO problems is to convert them into inverse reinforcement learning problems, utilizing a proxy reward computed from the agent's and the expert's observations. Nonetheless, we identify that tasks characterized by a progress dependency property pose significant challenges for such approaches; in these tasks, the agent needs to initially learn the expert's preceding behaviors before mastering the subsequent ones. Our investigation reveals that the main cause is that the reward signals assigned to later steps hinder the learning of initial behaviors. To address this challenge, we present a novel ILfO framework that enables the agent to master earlier behaviors before advancing to later ones. We introduce an Automatic Discount Scheduling (ADS) mechanism that adaptively alters the discount factor in reinforcement learning during the training phase, prioritizing earlier rewards initially and gradually engaging later rewards only when the earlier behaviors have been mastered. Our experiments, conducted on nine Meta-World tasks, demonstrate that our method significantly outperforms state-of-the-art methods across all tasks, including those that are unsolvable by them.
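
A minimal sketch of the scheduling idea described here: keep the discount factor small so the agent focuses on early rewards, and raise it as a progress estimate improves. The linear schedule and the progress measure are assumptions for illustration, not the paper's exact mechanism.

# Hedged sketch of an automatic discount schedule for ILfO training.
def scheduled_gamma(progress, gamma_min=0.9, gamma_max=0.99):
    """progress in [0, 1]: how much of the earlier expert behavior is mastered."""
    progress = min(max(progress, 0.0), 1.0)
    return gamma_min + (gamma_max - gamma_min) * progress

def discounted_return(rewards, progress):
    gamma, weight, ret = scheduled_gamma(progress), 1.0, 0.0
    for r in rewards:           # early rewards dominate when progress is low
        ret += weight * r
        weight *= gamma
    return ret

print(scheduled_gamma(0.0), scheduled_gamma(1.0))   # 0.9 0.99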


General Flow as Foundation Affordance for Scalable Robot Learning

arXiv.org Artificial Intelligence

Figure 1: We propose General Flow as Foundation Affordance. Its properties and applications are analyzed to reveal its great power. We design a scale-aware algorithm for general flow prediction and achieve stable zero-shot cross-embodiment skill transfer in the real world.

Abstract -- We address the challenge of acquiring real-world manipulation skills with a scalable framework. Inspired by the success of large-scale auto-regressive prediction in Large Language Models (LLMs), we hold the belief that identifying an appropriate prediction target capable of leveraging large-scale data is crucial. Therefore, we propose to utilize flow, which represents the future trajectories of 3D points on objects of interest, as an ideal prediction target in robot learning. To exploit scalable data resources, we turn our attention to cross-embodiment datasets. We first develop pipelines to extract 3D flow labels directly from RGBD human video datasets, and develop, for the first time, a language-conditioned prediction model for general flow. We find prediction of dense flow in real-world scene point clouds remains a formidable challenge, primarily due to the variability of trajectory scales and the need to enhance robustness in zero-shot settings. To address these issues, we employ scale-aware strategies in both the data and model aspects, complemented by augmentation techniques that focus on embodiment occlusion (human hand and robot arm) and query point sampling (3D points on objects of interest), thereby boosting zero-shot stability. The resulting flow offers (1) scalability: leveraging cross-embodiment data resources; (2) universality: multiple object categories, including rigid, articulated, and soft bodies; (3) stable skill transfer: providing actionable guidance, thus facilitating stable zero-shot skill transfer in real-world manipulation. These lead to a new pathway towards scalable general robot learning. We deploy our method with a policy based on closed-loop flow prediction. Remarkably, without any additional training, our method achieves an impressive 81% success rate in human-to-robot skill transfer, covering 18 tasks in 6 scenes.

We aim to reveal a potential pathway for replicating the success of Large Language Models (LLMs) in the domain of robot learning. Specifically, we are interested in developing a new framework that enables scalable learning for robot manipulation. In the future, this framework has the potential to progressively enhance the capabilities of robots, i.e., the scaling law that has been observed in LLMs [82]. Inspired by the LLMs training paradigm [14], we believe that two key elements contribute to their strong generalization abilities: (1) a vast training dataset
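
To make the "flow as prediction target" idea concrete, here is a rough sketch of turning one predicted flow (future trajectories of query points on an object) into a relative end-effector translation for a closed-loop controller step. The averaging heuristic and the data layout are illustrative choices, not the paper's policy.

# Hedged sketch: convert predicted general flow, shape (T, K, 3), into a
# single 3D end-effector displacement for one control step.
import numpy as np

def flow_to_ee_delta(query_points, predicted_flow, lookahead=1):
    """query_points: (K, 3) current 3D points; predicted_flow: (T, K, 3)."""
    target = predicted_flow[min(lookahead, len(predicted_flow)) - 1]
    return (target - query_points).mean(axis=0)      # mean 3D displacement

# toy usage: points drifting 1 cm along +x per future step
pts = np.zeros((8, 3))
flow = np.stack([pts + [0.01 * (t + 1), 0.0, 0.0] for t in range(4)])
print(flow_to_ee_delta(pts, flow))                   # ~ [0.01, 0., 0.]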


Any-point Trajectory Modeling for Policy Learning

arXiv.org Artificial Intelligence

Learning from demonstration is a powerful method for teaching robots new skills, and more demonstration data often improves policy learning. However, the high cost of collecting demonstration data is a significant bottleneck. Videos, as a rich data source, contain knowledge of behaviors, physics, and semantics, but extracting control-specific information from them is challenging due to the lack of action labels. In this work, we introduce a novel framework, Any-point Trajectory Modeling (ATM), that utilizes video demonstrations by pre-training a trajectory model to predict future trajectories of arbitrary points within a video frame. Once trained, these trajectories provide detailed control guidance, enabling the learning of robust visuomotor policies with minimal action-labeled data. Our method's effectiveness is demonstrated across 130 simulation tasks, focusing on language-conditioned manipulation tasks. Visualizations and code are available at: \url{https://xingyu-lin.github.io/atm}.
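
As a sketch of what any-point trajectory modeling outputs (future 2D tracks for arbitrary query points in a frame), which can then condition a visuomotor policy; the model below is a generic transformer placeholder with assumed sizes, not the ATM architecture.

# Minimal sketch of a track predictor: given a frame and K query pixels,
# predict their next T 2D positions. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class TrackPredictor(nn.Module):
    def __init__(self, horizon=8, d_model=256):
        super().__init__()
        self.horizon = horizon
        self.frame_enc = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=8), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, d_model))
        self.point_enc = nn.Linear(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 2 * horizon)

    def forward(self, frame, query_points):
        # frame: (B, 3, H, W); query_points: (B, K, 2) normalized pixel coords
        ctx = self.frame_enc(frame)[:, None, :]           # (B, 1, d) frame context
        tokens = self.point_enc(query_points) + ctx       # (B, K, d)
        tracks = self.head(self.encoder(tokens))          # (B, K, 2 * horizon)
        return tracks.view(*tracks.shape[:2], self.horizon, 2)

model = TrackPredictor()
out = model(torch.randn(2, 3, 128, 128), torch.rand(2, 16, 2))
print(out.shape)  # torch.Size([2, 16, 8, 2])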


Predictive Inference with Feature Conformal Prediction

arXiv.org Artificial Intelligence

Conformal prediction is a distribution-free technique for establishing valid prediction intervals. Although conventionally people conduct conformal prediction in the output space, this is not the only possibility. In this paper, we propose feature conformal prediction, which extends the scope of conformal prediction to semantic feature spaces by leveraging the inductive bias of deep representation learning. From a theoretical perspective, we demonstrate that feature conformal prediction provably outperforms regular conformal prediction under mild assumptions. Our approach could be combined with not only vanilla conformal prediction, but also other adaptive conformal prediction methods. Apart from experiments on existing predictive inference benchmarks, we also demonstrate the state-of-the-art performance of the proposed methods on large-scale tasks such as ImageNet classification and Cityscapes image segmentation. The code is available at \url{https://github.com/AlvinWen428/FeatureCP}.
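
For contrast with the feature-space variant, a minimal sketch of vanilla split conformal regression, the baseline the paper extends: calibrate a residual quantile on held-out data, then form symmetric intervals around predictions. Feature conformal prediction applies the same calibration logic but measures non-conformity in the network's feature space. The helper below is a generic implementation, not the authors' code.

# Vanilla split conformal regression for reference.
import numpy as np

def split_conformal_intervals(predict, x_cal, y_cal, x_test, alpha=0.1):
    residuals = np.abs(y_cal - predict(x_cal))
    n = len(residuals)
    q = np.quantile(residuals, min(1.0, (1 - alpha) * (n + 1) / n))
    preds = predict(x_test)
    return preds - q, preds + q

# toy usage with a known regression function as the "model"
rng = np.random.default_rng(0)
x = rng.normal(size=200); y = 2 * x + rng.normal(scale=0.5, size=200)
lo, hi = split_conformal_intervals(lambda z: 2 * z, x[:100], y[:100], x[100:])
print(np.mean((y[100:] >= lo) & (y[100:] <= hi)))    # empirical coverage ~ 0.9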