Huang, Haojie
Coarse-to-Fine 3D Keyframe Transporter
Zhu, Xupeng, Klee, David, Wang, Dian, Hu, Boce, Huang, Haojie, Tangri, Arsh, Walters, Robin, Platt, Robert
Recent advances in Keyframe Imitation Learning (IL) have enabled learning-based agents to solve a diverse range of manipulation tasks. However, most approaches ignore the rich symmetries in the problem setting and, as a consequence, are sample-inefficient. This work identifies and utilizes the bi-equivariant symmetry within Keyframe IL to design a policy that generalizes to transformations of both the workspace and the objects grasped by the gripper. We make two main contributions: First, we analyze the bi-equivariance properties of the keyframe action scheme and propose a Keyframe Transporter derived from Transporter Networks, which evaluates actions using cross-correlation between the features of the grasped object and the features of the scene. Second, we propose a computationally efficient coarse-to-fine SE(3) action evaluation scheme for reasoning about the intertwined translation and rotation actions. The resulting method outperforms strong Keyframe IL baselines by an average of >10% on a wide range of simulation tasks, and by an average of 55% in 4 physical experiments.
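The cross-correlation idea above can be illustrated with a minimal sketch (not the paper's implementation): a feature patch describing the grasped object is slid over the scene feature map, and the score at each offset ranks the corresponding placement. All shapes and names here are illustrative assumptions.

```python
import numpy as np

def cross_correlate(scene_feat, patch_feat):
    """Score every translation of a grasped-object feature patch over a
    scene feature map via valid-mode cross-correlation (Transporter-style
    action evaluation). scene_feat: (C, H, W), patch_feat: (C, h, w)."""
    C, H, W = scene_feat.shape
    _, h, w = patch_feat.shape
    scores = np.zeros((H - h + 1, W - w + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            # Inner product of the patch with the scene window at (i, j)
            scores[i, j] = np.sum(scene_feat[:, i:i + h, j:j + w] * patch_feat)
    return scores

# A distinctive block embedded in an otherwise empty scene is scored
# highest at its true location.
scene = np.zeros((2, 10, 12))
scene[:, 3:6, 5:8] = 1.0
patch = np.ones((2, 3, 3))
best = np.unravel_index(np.argmax(cross_correlate(scene, patch)), (8, 10))
```

In the actual method this evaluation extends over rotations as well, which is where the coarse-to-fine SE(3) scheme comes in.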
MATCH POLICY: A Simple Pipeline from Point Cloud Registration to Manipulation Policies
Huang, Haojie, Liu, Haotian, Wang, Dian, Walters, Robin, Platt, Robert
Many manipulation tasks require the robot to rearrange objects relative to one another. Such tasks can be described as a sequence of relative poses between parts of a set of rigid bodies. In this work, we propose MATCH POLICY, a simple but novel pipeline for solving high-precision pick and place tasks. Instead of predicting actions directly, our method registers the pick and place targets to the stored demonstrations. This transfers action inference into a point cloud registration task and enables us to realize nontrivial manipulation policies without any training. MATCH POLICY is designed to solve high-precision tasks in a key-frame setting. By leveraging the geometric interaction and the symmetries of the task, it achieves extremely high sample efficiency and generalizability to unseen configurations. We demonstrate its state-of-the-art performance across various tasks on the RLBench benchmark compared with several strong baselines and test it on a real robot with six tasks.
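The core primitive behind registration-based action inference is recovering a rigid transform between two point sets. A minimal sketch, assuming known correspondences, is the classic Kabsch/SVD solution (the paper's full pipeline also has to establish correspondences, which this sketch omits):

```python
import numpy as np

def register_rigid(src, dst):
    """Least-squares rigid transform (R, t) with R @ src_i + t ~= dst_i,
    assuming known point correspondences (Kabsch/SVD).
    src, dst: (N, 3) arrays of corresponding points."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centered point sets
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    # Sign correction to avoid returning a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t
```

Once such a transform is recovered between a demonstration and the current scene, a demonstrated gripper pose can be mapped into the new configuration without any learned action model.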
ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter
Qian, Yaoyao, Zhu, Xupeng, Biza, Ondrej, Jiang, Shuo, Zhao, Linfeng, Huang, Haojie, Qi, Yu, Platt, Robert
The field of robotic grasping has seen significant advancements in recent years, with deep learning and vision-language models driving progress towards more intelligent and adaptable grasping systems [1, 2, 3]. However, robotic grasping in highly cluttered environments remains a major challenge, as target objects are often severely occluded or completely hidden [4, 5, 6]. Even state-of-the-art methods struggle to accurately identify and grasp objects in such scenarios. To address this challenge, we propose ThinkGrasp, which combines the strengths of large-scale pretrained vision-language models with an occlusion handling system. ThinkGrasp leverages the advanced reasoning capabilities of models like GPT-4o [7] to gain a visual understanding of environmental and object properties such as sharpness and material composition. By integrating this knowledge through a structured prompt-based chain of thought, ThinkGrasp can significantly enhance success rates and ensure the safety of grasp poses by strategically eliminating obstructing objects. For instance, it prioritizes larger and centrally located objects to maximize visibility and access and focuses on grasping the safest and most advantageous parts, such as handles or flat surfaces. Unlike VL-Grasp [8], which relies on the RoboRefIt dataset for robotic perception and reasoning, ThinkGrasp benefits from GPT-4o's reasoning and generalization capabilities. This allows ThinkGrasp to intuitively select the right objects and achieve higher performance in complex environments, as demonstrated by our comparative experiments.
OrbitGrasp: $SE(3)$-Equivariant Grasp Learning
Hu, Boce, Zhu, Xupeng, Wang, Dian, Dong, Zihao, Huang, Haojie, Wang, Chenghao, Walters, Robin, Platt, Robert
While grasp detection is an important part of any robotic manipulation pipeline, reliable and accurate grasp detection in $SE(3)$ remains a research challenge. Many robotics applications in unstructured environments such as the home or warehouse would benefit greatly from improved grasp performance. This paper proposes a novel framework for detecting $SE(3)$ grasp poses based on point cloud input. Our main contribution is to propose an $SE(3)$-equivariant model that maps each point in the cloud to a continuous grasp quality function over the 2-sphere $S^2$ using a spherical harmonic basis. Compared with reasoning about a finite set of samples, this formulation improves the accuracy and efficiency of our model when a large number of samples would otherwise be needed. In order to accomplish this, we propose a novel variation on EquiFormerV2 that leverages a UNet-style backbone to enlarge the number of points the model can handle. Our resulting method, which we name $\textit{OrbitGrasp}$, significantly outperforms baselines in both simulation and physical experiments.
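The idea of a continuous function over $S^2$ expressed in a spherical harmonic basis can be sketched with a toy example. This is an illustration only, truncated at degree $l=1$ with hand-coded real spherical harmonics, not the paper's model (which learns the coefficients with an equivariant network and uses a much higher-degree basis):

```python
import numpy as np

def real_sh_basis(dirs):
    """Real spherical harmonics up to degree l=1 evaluated at unit
    vectors dirs (N, 3). Returns (N, 4): [Y_0^0, Y_1^-1, Y_1^0, Y_1^1]."""
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    c0 = 0.5 / np.sqrt(np.pi)           # constant term
    c1 = np.sqrt(3.0 / (4.0 * np.pi))   # degree-1 normalization
    return np.stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x], axis=1)

def grasp_quality(coeffs, dirs):
    """Continuous quality function on S^2: a linear combination of
    spherical harmonic basis functions, evaluated at any direction."""
    return real_sh_basis(dirs) @ coeffs

# Coefficients that put all weight on Y_1^0 favor the +z approach direction.
coeffs = np.array([0.0, 0.0, 1.0, 0.0])
candidates = np.array([[0, 0, 1], [0, 0, -1], [1, 0, 0], [0, 1, 0]], dtype=float)
q = grasp_quality(coeffs, candidates)
```

Because the function is defined everywhere on the sphere, candidate grasp directions can be scored continuously rather than restricted to a fixed finite sample set.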
Equivariant Diffusion Policy
Wang, Dian, Hart, Stephen, Surovik, David, Kelestemur, Tarik, Huang, Haojie, Zhao, Haibo, Yeatman, Mark, Wang, Jiuguang, Walters, Robin, Platt, Robert
Recent work has shown that diffusion models are an effective approach to learning the multimodal distributions arising from demonstration data in behavior cloning. However, a drawback of this approach is the need to learn a denoising function, which is significantly more complex than learning an explicit policy. In this work, we propose Equivariant Diffusion Policy, a novel diffusion policy learning method that leverages domain symmetries to obtain better sample efficiency and generalization in the denoising function. We theoretically analyze the $\mathrm{SO}(2)$ symmetry of full 6-DoF control and characterize when a diffusion model is $\mathrm{SO}(2)$-equivariant. We furthermore evaluate the method empirically on a set of 12 simulation tasks in MimicGen, and show that it obtains a success rate that is, on average, 21.9% higher than the baseline Diffusion Policy. We also evaluate the method on a real-world system to show that effective policies can be learned with relatively few training samples, whereas the baseline Diffusion Policy cannot.
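What $\mathrm{SO}(2)$-equivariance of a denoiser means can be checked numerically on a toy example. The sketch below is a deliberately trivial stand-in for a learned denoising function, not the paper's architecture: any linear map on $\mathbb{R}^2$ that commutes with all planar rotations has the form $aI + bJ$, where $J$ is the 90-degree rotation.

```python
import numpy as np

def rot(theta):
    """2D rotation matrix for angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Toy "denoiser": an SO(2)-equivariant linear map a*I + b*J, where
# J = rot(pi/2). Such maps commute with every planar rotation, so
# D(g x) = g D(x) for all g in SO(2).
a, b = 0.7, -0.3
J = rot(np.pi / 2)
D = a * np.eye(2) + b * J

x = np.array([1.0, 2.0])   # a noisy 2D "action"
g = rot(0.9)               # an arbitrary group element
# Equivariance check: denoising a rotated input equals rotating the
# denoised output.
lhs = D @ (g @ x)
rhs = g @ (D @ x)
```

The paper's contribution is characterizing when this commutation property holds for a full diffusion model over 6-DoF actions, where the group acts nontrivially on both poses and gripper state.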
Open-vocabulary Pick and Place via Patch-level Semantic Maps
Jia, Mingxi, Huang, Haojie, Zhang, Zhewen, Wang, Chenghao, Zhao, Linfeng, Wang, Dian, Liu, Jason Xinyu, Walters, Robin, Platt, Robert, Tellex, Stefanie
Controlling robots through natural language instructions in open-vocabulary scenarios is pivotal for enhancing human-robot collaboration and complex robot behavior synthesis. However, achieving this capability poses significant challenges due to the need for a system that can generalize from limited data to a wide range of tasks and environments. Existing methods rely on large, costly datasets and struggle with generalization. This paper introduces Grounded Equivariant Manipulation (GEM), a novel approach that leverages the generative capabilities of pre-trained vision-language models and geometric symmetries to facilitate few-shot and zero-shot learning for open-vocabulary robot manipulation tasks. Our experiments demonstrate GEM's high sample efficiency and superior generalization across diverse pick-and-place tasks in both simulation and real-world experiments, showcasing its ability to adapt to novel instructions and unseen objects with minimal data requirements. GEM marks a significant step forward in language-conditioned robot control, bridging the gap between semantic understanding and action generation in robotic systems.
Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies
Huang, Haojie, Schmeckpeper, Karl, Wang, Dian, Biza, Ondrej, Qian, Yaoyao, Liu, Haotian, Jia, Mingxi, Platt, Robert, Walters, Robin
Humans can imagine goal states during planning and perform actions to match those goals. In this work, we propose Imagination Policy, a novel multi-task key-frame policy network for solving high-precision pick and place tasks. Instead of learning actions directly, Imagination Policy generates point clouds to imagine desired states which are then translated to actions using rigid action estimation. This transforms action inference into a local generative task. We leverage pick and place symmetries underlying the tasks in the generation process and achieve extremely high sample efficiency and generalizability to unseen configurations. Finally, we demonstrate state-of-the-art performance across various tasks on the RLBench benchmark compared with several strong baselines.
Fourier Transporter: Bi-Equivariant Robotic Manipulation in 3D
Huang, Haojie, Howell, Owen, Zhu, Xupeng, Wang, Dian, Walters, Robin, Platt, Robert
Many complex robotic manipulation tasks can be decomposed into a sequence of pick and place actions. Training a robotic agent to learn this sequence over many different starting conditions typically requires many iterations or demonstrations, especially in 3D environments. In this work, we propose Fourier Transporter (FourTran), which leverages the two-fold $\mathrm{SE}(d)\times\mathrm{SE}(d)$ symmetry in the pick-place problem to achieve much higher sample efficiency. FourTran is an open-loop behavior cloning method trained using expert demonstrations to predict pick-place actions on new environments. FourTran is constrained to incorporate symmetries of the pick and place actions independently. Our method utilizes a fiber space Fourier transformation that allows for memory-efficient construction. We test our proposed network on the RLBench benchmark and achieve state-of-the-art results across various tasks.
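The memory-efficiency argument rests on a standard fact: cross-correlation over all translations can be computed as an elementwise product in Fourier space rather than by explicit sliding-window evaluation. A minimal 2D sketch (illustrative only; the paper's fiber-space transform additionally operates over rotation fibers):

```python
import numpy as np

def fft_cross_correlate(scene, kernel):
    """Cross-correlation of a kernel with a 2D scene map via the FFT,
    with circular (wrap-around) boundary conditions. Equivalent to
    scoring every translation of the kernel at once."""
    padded = np.zeros_like(scene)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    # correlation = IFFT( FFT(scene) * conj(FFT(kernel)) )
    spectrum = np.fft.fft2(scene) * np.conj(np.fft.fft2(padded))
    return np.real(np.fft.ifft2(spectrum))

# A block of ones at (4, 6) is found at exactly that offset.
scene = np.zeros((16, 16))
scene[4:7, 6:9] = 1.0
scores = fft_cross_correlate(scene, np.ones((3, 3)))
```

Computed this way, the cost is $O(HW \log HW)$ per channel regardless of kernel size, instead of $O(HW \cdot hw)$ for the explicit sliding-window sum.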
Leveraging Symmetries in Pick and Place
Huang, Haojie, Wang, Dian, Tangri, Arsh, Walters, Robin, Platt, Robert
Robotic pick and place tasks are symmetric under translations and rotations of both the object to be picked and the desired place pose. For example, if the pick object is rotated or translated, then the optimal pick action should also rotate or translate. The same is true for the place pose; if the desired place pose changes, then the place action should also transform accordingly. A recently proposed pick and place framework known as Transporter Net captures some of these symmetries, but not all. This paper analytically studies the symmetries present in planar robotic pick and place and proposes a method of incorporating equivariant neural models into Transporter Net in a way that captures all symmetries. The new model, which we call Equivariant Transporter Net, is equivariant to both pick and place symmetries and can immediately generalize pick and place knowledge to different pick and place poses. We evaluate the new model empirically and show that it is much more sample efficient than the non-symmetric version, resulting in a system that can imitate demonstrated pick and place behavior using very few human demonstrations on a variety of imitation learning tasks.
Edge Grasp Network: A Graph-Based SE(3)-invariant Approach to Grasp Detection
Huang, Haojie, Wang, Dian, Zhu, Xupeng, Walters, Robin, Platt, Robert
Given point cloud input, the problem of 6-DoF grasp pose detection is to identify a set of hand poses in SE(3) from which an object can be successfully grasped. This important problem has many practical applications. Here we propose a novel method and neural network model that achieves higher grasp success rates than prior methods reported in the literature. The method takes standard point cloud data as input and works well with single-view point clouds observed from arbitrary viewing directions.