Chang, Haonan
Motion Blender Gaussian Splatting for Dynamic Reconstruction
Zhang, Xinyu, Chang, Haonan, Liu, Yuhan, Boularias, Abdeslam
Gaussian splatting has emerged as a powerful tool for high-fidelity reconstruction of dynamic scenes. However, existing methods primarily rely on implicit motion representations, such as encoding motions into neural networks or per-Gaussian parameters, which makes it difficult to further manipulate the reconstructed motions. This lack of explicit controllability limits existing methods to replaying recorded motions only, which hinders wider applications. To address this, we propose Motion Blender Gaussian Splatting (MB-GS), a novel framework that uses motion graphs as an explicit and sparse motion representation. The motion of graph links is propagated to individual Gaussians via dual quaternion skinning, with learnable weight painting functions determining the influence of each link. The motion graphs and 3D Gaussians are jointly optimized from input videos via differentiable rendering. Experiments show that MB-GS achieves state-of-the-art performance on the iPhone dataset while being competitive on HyperNeRF. Additionally, we demonstrate the application potential of our method in generating novel object motions and robot demonstrations through motion editing. Video demonstrations can be found at https://mlzxy.github.io/mbgs.
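As a rough illustration of the dual quaternion skinning step described above, the sketch below blends per-link rigid motions into the motion of a single Gaussian center. It is a minimal NumPy example; the variable layout and helper names are assumptions for illustration, not the MB-GS implementation.

```python
# Minimal sketch of dual quaternion skinning (DQS): blend per-link dual
# quaternions with skinning weights and apply the result to one Gaussian center.
# The data layout here is an assumption, not the authors' code.
import numpy as np

def dq_normalize(dq):
    """Normalize a dual quaternion (divide both parts by the real part's norm)."""
    real, dual = dq[:4], dq[4:]
    n = np.linalg.norm(real)
    return np.concatenate([real / n, dual / n])

def blend_and_apply(point, link_dqs, weights):
    """point: (3,) Gaussian center; link_dqs: (L, 8) unit dual quaternions, one
    rigid motion per graph link; weights: (L,) weights from the learned
    weight-painting function."""
    blended = dq_normalize((weights[:, None] * link_dqs).sum(axis=0))
    r, d = blended[:4], blended[4:]          # real (rotation) and dual (translation) parts
    w, v = r[0], r[1:]
    # Rotate the point by the real quaternion: p' = p + 2 v x (v x p + w p)
    rotated = point + 2.0 * np.cross(v, np.cross(v, point) + w * point)
    # Translation is the vector part of 2 * d * conj(r)
    t = 2.0 * (r[0] * d[1:] - d[0] * r[1:] + np.cross(r[1:], d[1:]))
    return rotated + t
```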
Autoregressive Action Sequence Learning for Robotic Manipulation
Zhang, Xinyu, Liu, Yuhan, Chang, Haonan, Schramm, Liam, Boularias, Abdeslam
Designing a universal policy architecture that performs well across diverse robots and task configurations remains a key challenge. In this work, we address this by representing robot actions as sequential data and generating actions through autoregressive sequence modeling. Existing autoregressive architectures generate end-effector waypoints sequentially, like word tokens in language modeling, which limits them to low-frequency control tasks. Unlike language, robot actions are heterogeneous and often include continuous values -- such as joint positions, 2D pixel coordinates, and end-effector poses -- which do not fit naturally into language-style token modeling. Based on this insight, we introduce a straightforward enhancement: we extend causal transformers' single-token prediction to support predicting a variable number of tokens in a single step through our Chunking Causal Transformer (CCT). This enhancement enables robust performance across diverse tasks with various control frequencies, improves efficiency by reducing the number of autoregression steps, and allows a hybrid action sequence design that mixes different types of actions and uses a different chunk size for each action type. Based on CCT, we propose the Autoregressive Policy (ARP) architecture, which solves manipulation tasks by generating hybrid action sequences. We evaluate ARP across diverse robotic manipulation environments, including Push-T, ALOHA, and RLBench, and show that ARP, as a universal architecture, outperforms the environment-specific state of the art in all tested benchmarks while being more efficient in computation and parameter size. Videos of our real robot demonstrations, all source code, and the pretrained models of ARP can be found at http://github.com/mlzxy/arp.
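The chunk-wise prediction idea can be pictured with an attention mask in which tokens of the same chunk attend to each other and to all earlier chunks, so one decoding step can emit a whole chunk. The snippet below is an illustrative sketch of such a mask, not the released ARP/CCT code; the chunk-id convention is an assumption.

```python
# Illustrative chunk-wise causal attention mask (True = may attend).
import torch

def chunk_causal_mask(chunk_ids: torch.Tensor) -> torch.Tensor:
    """chunk_ids: (T,) non-decreasing chunk index per token, e.g. [0, 0, 1, 1, 1, 2].
    A token may attend to every token whose chunk index is <= its own, so all
    tokens of one chunk can be predicted together while causality across chunks
    is preserved."""
    return chunk_ids[:, None] >= chunk_ids[None, :]

# Example: three action chunks of sizes 2, 3, and 1.
ids = torch.tensor([0, 0, 1, 1, 1, 2])
mask = chunk_causal_mask(ids)  # (6, 6) boolean attention mask
```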
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
Yu, Qiaojun, Huang, Siyuan, Yuan, Xibin, Jiang, Zhengkai, Hao, Ce, Li, Xin, Chang, Haonan, Wang, Junbo, Liu, Liu, Li, Hongsheng, Gao, Peng, Lu, Cewu
Previous studies on robotic manipulation are based on a limited understanding of the underlying 3D motion constraints and affordances. To address these challenges, we propose a comprehensive paradigm, termed UniAff, that integrates 3D object-centric manipulation and task understanding in a unified formulation. Specifically, we constructed a dataset labeled with manipulation-related key attributes, comprising 900 articulated objects from 19 categories and 600 tools from 12 categories. Furthermore, we leverage MLLMs to infer object-centric representations for manipulation tasks, including affordance recognition and reasoning about 3D motion constraints. Comprehensive experiments in both simulation and real-world settings indicate that UniAff significantly improves the generalization of robotic manipulation for tools and articulated objects. We hope that UniAff will serve as a general baseline for unified robotic manipulation tasks in the future. Images, videos, dataset, and code are published on the project website at: https://sites.google.com/view/uni-aff/home
A3VLM: Actionable Articulation-Aware Vision Language Model
Huang, Siyuan, Chang, Haonan, Liu, Yuhan, Zhu, Yimeng, Dong, Hao, Gao, Peng, Boularias, Abdeslam, Li, Hongsheng
Vision Language Models (VLMs) have received significant attention in recent years in the robotics community. VLMs have been shown to perform complex visual reasoning and scene understanding tasks, which has led them to be regarded as a potential universal solution for general robotics problems such as manipulation and navigation. However, previous VLMs for robotics such as RT-1, RT-2, and ManipLLM have focused on directly learning robot-centric actions. Such approaches require collecting a significant amount of robot interaction data, which is extremely costly in the real world. Thus, we propose A3VLM, an object-centric, actionable, articulation-aware vision language model. A3VLM focuses on the articulation structure and action affordances of objects. Its representation is robot-agnostic and can be translated into robot actions using simple action primitives. Extensive experiments in both simulation benchmarks and real-world settings demonstrate the effectiveness and stability of A3VLM. We release our code and other materials at https://github.com/changhaonan/A3VLM.
Scaling Manipulation Learning with Visual Kinematic Chain Prediction
Zhang, Xinyu, Liu, Yuhan, Chang, Haonan, Boularias, Abdeslam
Learning general-purpose models from diverse datasets has achieved great success in machine learning. In robotics, however, existing methods in multi-task learning are typically constrained to a single robot and workspace, while recent work such as RT-X requires a non-trivial action normalization procedure to manually bridge the gap between different action spaces in diverse environments. In this paper, we propose the visual kinematic chain as a precise and universal representation of quasi-static actions for robot learning over diverse environments, which requires no manual adjustment since visual kinematic chains can be obtained automatically from the robot's model and camera parameters. We propose the Visual Kinematics Transformer (VKT), a convolution-free architecture that supports an arbitrary number of camera viewpoints and is trained with a single objective of forecasting kinematic structures through optimal point-set matching. We demonstrate the superior performance of VKT over BC transformers as a general agent on Calvin, RLBench, Open-X, and real robot manipulation tasks. Video demonstrations can be found at https://mlzxy.github.io/visual-kinetic-chain.
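To make the single training objective concrete, the sketch below shows one common form of optimal point-set matching: a Hungarian assignment between predicted and ground-truth chain keypoints followed by a mean matched distance. It is a hypothetical example; the exact VKT loss may differ.

```python
# Hypothetical optimal point-set matching loss between predicted and
# ground-truth kinematic-chain keypoints (Hungarian assignment).
import numpy as np
from scipy.optimize import linear_sum_assignment

def point_set_matching_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """pred, target: (N, D) keypoints of the kinematic chain (e.g. D=2 after
    projection into a camera view). Returns the mean distance under the
    optimal one-to-one assignment."""
    cost = np.linalg.norm(pred[:, None, :] - target[None, :, :], axis=-1)  # (N, N)
    rows, cols = linear_sum_assignment(cost)  # optimal matching
    return float(cost[rows, cols].mean())
```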
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Lu, Shiyang, Chang, Haonan, Jing, Eric Pu, Boularias, Abdeslam, Bekris, Kostas
This work presents OVIR-3D, a straightforward yet effective method for open-vocabulary 3D object instance retrieval without using any 3D data for training. Given a language query, the proposed method returns a ranked set of 3D object instance segments based on the feature similarity between each instance and the text query. This is achieved by a multi-view fusion of text-aligned 2D region proposals into 3D space, where the 2D region proposal network can leverage 2D datasets, which are more accessible and typically larger than 3D datasets. The proposed fusion process is efficient, as it can be performed in real time for most indoor 3D scenes and does not require additional training in 3D space. Experiments on public datasets and a real robot show the effectiveness of the method and its potential for applications in robot navigation and manipulation.
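The retrieval step described above can be pictured as ranking fused per-instance features by their similarity to the embedded text query. The sketch below is a minimal illustration under that assumption; the function names and feature shapes are not the OVIR-3D API.

```python
# Rank fused 3D instance features by cosine similarity to a text embedding.
import numpy as np

def retrieve_instances(instance_feats: np.ndarray, text_feat: np.ndarray, top_k: int = 5):
    """instance_feats: (M, D) per-instance features fused from text-aligned 2D
    region proposals; text_feat: (D,) embedding of the language query.
    Returns indices of the top-k instances and their similarity scores."""
    inst = instance_feats / np.linalg.norm(instance_feats, axis=1, keepdims=True)
    txt = text_feat / np.linalg.norm(text_feat)
    scores = inst @ txt                      # cosine similarity per instance
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]
```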
LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement
Chang, Haonan, Gao, Kai, Boyalakuntla, Kowndinya, Lee, Alex, Huang, Baichuan, Kumar, Harish Udhaya, Yu, Jingjin, Boularias, Abdeslam
We introduce a novel approach to the executable semantic object rearrangement problem. In this problem, a robot seeks to create an actionable plan that rearranges objects within a scene according to a pattern dictated by a natural language description. Unlike existing methods such as StructFormer and StructDiffusion, which tackle the issue in two steps by first generating poses and then leveraging a task planner for action plan formulation, our method addresses pose generation and action planning concurrently. We achieve this integration using a Language-Guided Monte-Carlo Tree Search (LGMCTS). Quantitative evaluations are provided on two simulation datasets and are complemented by qualitative tests with a real robot.
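For readers unfamiliar with the search loop that LGMCTS builds on, the sketch below is a compact, generic Monte-Carlo tree search skeleton (selection, expansion, rollout, backpropagation). The rearrangement-specific state, the language-conditioned expansion, and the reward are placeholders, not the paper's implementation.

```python
# Generic MCTS skeleton: the expand/rollout callables stand in for the
# language-conditioned pose sampler and plan feasibility check.
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def mcts(root, expand, rollout, iters=100):
    """expand(state) -> list of child states; rollout(state) -> scalar reward."""
    for _ in range(iters):
        node = root
        while node.children:                       # selection
            node = max(node.children, key=ucb)
        for s in expand(node.state):               # expansion
            node.children.append(Node(s, parent=node))
        leaf = random.choice(node.children) if node.children else node
        reward = rollout(leaf.state)               # simulation
        while leaf is not None:                    # backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits) if root.children else root
```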
Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs
Chang, Haonan, Boyalakuntla, Kowndinya, Lu, Shiyang, Cai, Siwei, Jing, Eric, Keskar, Shreesh, Geng, Shijie, Abbas, Adeeb, Zhou, Lifeng, Bekris, Kostas, Boularias, Abdeslam
We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as "pick up a cup on a kitchen table" or "navigate to a sofa on which someone is sitting". In contrast to existing research on 3D scene graphs, OVSG supports free-form text input and open-vocabulary querying. Through a series of comparative experiments using the ScanNet dataset and a self-collected dataset, we demonstrate that our proposed approach significantly surpasses the performance of previous semantic-based localization techniques. Moreover, we highlight the practical application of OVSG in real-world robot navigation and manipulation experiments.
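A minimal sketch of how context-aware grounding over such a graph might look is given below: candidates are scored by their similarity to the target phrase plus the similarity of a related entity to the context phrase (e.g., a cup that is "on" something resembling a kitchen table). The schema and scoring are illustrative assumptions, not the released OVSG code.

```python
# Toy open-vocabulary scene-graph grounding: combine entity similarity with
# the similarity of related entities to the query's context phrase.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Entity:
    name: str                                       # e.g. "cup", "kitchen table"
    feature: np.ndarray                             # open-vocabulary embedding
    relations: list = field(default_factory=list)   # list of (relation, Entity)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ground(entities, target_feat, context_feat, relation="on"):
    """Return the entity best matching the target phrase, conditioned on a
    related entity matching the context phrase via the given relation."""
    def score(e):
        ctx = max((cos(o.feature, context_feat) for r, o in e.relations if r == relation),
                  default=0.0)
        return cos(e.feature, target_feat) + ctx
    return max(entities, key=score)
```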
Mono-STAR: Mono-camera Scene-level Tracking and Reconstruction
Chang, Haonan, Ramesh, Dhruv Metha, Geng, Shijie, Gan, Yuqiu, Boularias, Abdeslam
We present Mono-STAR, the first real-time 3D reconstruction system that simultaneously supports semantic fusion, fast motion tracking, non-rigid object deformation, and topological change under a unified framework. The proposed system solves a new optimization problem incorporating optical-flow-based 2D constraints to deal with fast motion and a novel semantic-aware deformation graph (SAD-graph) for handling topology change. We test the proposed system under various challenging scenes and demonstrate that it significantly outperforms existing state-of-the-art methods.
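To give a sense of the optical-flow-based 2D constraint mentioned above, the sketch below penalizes the gap between where a deformed 3D point projects and where optical flow says its pixel moved. The camera model, variable names, and the way deformation is parameterized are assumptions, not the Mono-STAR energy.

```python
# Illustrative 2D optical-flow residual for a deformed 3D point.
import numpy as np

def flow_residual(point_3d, delta, K, flow_target_px):
    """point_3d: (3,) point in the camera frame; delta: (3,) estimated deformation;
    K: (3, 3) camera intrinsics; flow_target_px: (2,) pixel position predicted by
    optical flow. Returns the 2D reprojection residual used as an energy term."""
    p = point_3d + delta                 # deformed point
    uvw = K @ p
    proj = uvw[:2] / uvw[2]              # perspective projection to pixels
    return proj - flow_target_px
```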
Scene-level Tracking and Reconstruction without Object Priors
Chang, Haonan, Boularias, Abdeslam
We present the first real-time system capable of tracking and reconstructing, individually, every visible object in a given scene, without any form of prior on object rigidity, the presence of texture, or object category. In contrast with previous methods such as Co-Fusion and MaskFusion that first segment the scene into individual objects and then process each object independently, the proposed method dynamically segments the non-rigid scene as part of the tracking and reconstruction process. When new measurements indicate a topology change, reconstructed models are updated in real time to reflect that change. Our proposed system can provide the live geometry and deformation of all visible objects in a novel scene in real time, which allows it to be integrated seamlessly into numerous existing robotics applications that rely on object models for grasping and manipulation. The capabilities of the proposed system are demonstrated in challenging scenes that contain multiple rigid and non-rigid objects.