AITopics | Hao, Peng

Collaborating Authors

Hao, Peng

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

TLA: Tactile-Language-Action Model for Contact-Rich Manipulation

Hao, Peng, Zhang, Chaofan, Li, Dingzhe, Cao, Xiaoge, Hao, Xiaoshuai, Cui, Shaowei, Wang, Shuo

arXiv.org Artificial IntelligenceMar-11-2025

Significant progress has been made in vision-language models. However, language-conditioned robotic manipulation for contact-rich tasks remains underexplored, particularly in terms of tactile sensing. To address this gap, we introduce the Tactile-Language-Action (TLA) model, which effectively processes sequential tactile feedback via cross-modal language grounding to enable robust policy generation in contact-intensive scenarios. In addition, we construct a comprehensive dataset that contains 24k pairs of tactile action instruction data, customized for fingertip peg-in-hole assembly, providing essential resources for TLA training and evaluation. Our results show that TLA significantly outperforms traditional imitation learning methods (e.g., diffusion policy) in terms of effective action generation and action accuracy, while demonstrating strong generalization capabilities by achieving over 85\% success rate on previously unseen assembly clearances and peg shapes. We publicly release all data and code in the hope of advancing research in language-conditioned tactile manipulation skill learning. Project website: https://sites.google.com/view/tactile-language-action/

artificial intelligence, machine learning, tla, (16 more...)

arXiv.org Artificial Intelligence

2503.08548

Country: Asia > China (0.15)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.49)

Add feedback

MapFusion: A Novel BEV Feature Fusion Network for Multi-modal Map Construction

Hao, Xiaoshuai, Diao, Yunfeng, Wei, Mengchuan, Yang, Yifan, Hao, Peng, Yin, Rong, Zhang, Hui, Li, Weiming, Zhao, Shu, Liu, Yu

arXiv.org Artificial IntelligenceFeb-5-2025

Map construction task plays a vital role in providing precise and comprehensive static environmental information essential for autonomous driving systems. Primary sensors include cameras and LiDAR, with configurations varying between camera-only, LiDAR-only, or camera-LiDAR fusion, based on cost-performance considerations. While fusion-based methods typically perform best, existing approaches often neglect modality interaction and rely on simple fusion strategies, which suffer from the problems of misalignment and information loss. To address these issues, we propose MapFusion, a novel multi-modal Bird's-Eye View (BEV) feature fusion method for map construction. Specifically, to solve the semantic misalignment problem between camera and LiDAR BEV features, we introduce the Cross-modal Interaction Transform (CIT) module, enabling interaction between two BEV feature spaces and enhancing feature representation through a self-attention mechanism. Additionally, we propose an effective Dual Dynamic Fusion (DDF) module to adaptively select valuable information from different modalities, which can take full advantage of the inherent information between different modalities. Moreover, MapFusion is designed to be simple and plug-and-play, easily integrated into existing pipelines. We evaluate MapFusion on two map construction tasks, including High-definition (HD) map and BEV map segmentation, to show its versatility and effectiveness. Compared with the state-of-the-art methods, MapFusion achieves 3.6% and 6.2% absolute improvements on the HD map construction and BEV map segmentation tasks on the nuScenes dataset, respectively, demonstrating the superiority of our approach.

artificial intelligence, information fusion, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2502.04377

Country: Asia > China (0.29)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry:

Transportation > Ground > Road (0.35)
Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.92)
Information Technology > Artificial Intelligence > Robots (0.88)

Add feedback

What Foundation Models can Bring for Robot Learning in Manipulation : A Survey

Li, Dingzhe, Jin, Yixiang, A, Yong, Yu, Hongze, Shi, Jun, Hao, Xiaoshuai, Hao, Peng, Liu, Huaping, Sun, Fuchun, Fang, Bin

arXiv.org Artificial IntelligenceApr-28-2024

The realization of universal robots is an ultimate goal of researchers. However, a key hurdle in achieving this goal lies in the robots' ability to manipulate objects in their unstructured surrounding environments according to different tasks. The learning-based approach is considered an effective way to address generalization. The impressive performance of foundation models in the fields of computer vision and natural language suggests the potential of embedding foundation models into manipulation tasks as a viable path toward achieving general manipulation capability. However, we believe achieving general manipulation capability requires an overarching framework akin to auto driving. This framework should encompass multiple functional modules, with different foundation models assuming distinct roles in facilitating general manipulation capability. This survey focuses on the contributions of foundation models to robot learning for manipulation. We propose a comprehensive framework and detail how foundation models can address challenges in each module of the framework. What's more, we examine current approaches, outline challenges, suggest future research directions, and identify potential risks associated with integrating foundation models into this domain.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2404.18201

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(2 more...)

Add feedback

RobotGPT: Robot Manipulation Learning from ChatGPT

Jin, Yixiang, Li, Dingzhe, A, Yong, Shi, Jun, Hao, Peng, Sun, Fuchun, Zhang, Jianwei, Fang, Bin

arXiv.org Artificial IntelligenceDec-3-2023

We present RobotGPT, an innovative decision framework for robotic manipulation that prioritizes stability and safety. The execution code generated by ChatGPT cannot guarantee the stability and safety of the system. ChatGPT may provide different answers for the same task, leading to unpredictability. This instability prevents the direct integration of ChatGPT into the robot manipulation loop. Although setting the temperature to 0 can generate more consistent outputs, it may cause ChatGPT to lose diversity and creativity. Our objective is to leverage ChatGPT's problem-solving capabilities in robot manipulation and train a reliable agent. The framework includes an effective prompt structure and a robust learning model. Additionally, we introduce a metric for measuring task difficulty to evaluate ChatGPT's performance in robot manipulation. Furthermore, we evaluate RobotGPT in both simulation and real-world environments. Compared to directly using ChatGPT to generate code, our framework significantly improves task success rates, with an average increase from 38.5% to 91.5%. Therefore, training a RobotGPT by utilizing ChatGPT as an expert is a more stable approach compared to directly using ChatGPT as a task planner.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2312.01421

Country:

Asia > China (0.14)
Europe > Germany (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Variational operator learning: A unified paradigm marrying training neural operators and solving partial differential equations

Xu, Tengfei, Liu, Dachuan, Hao, Peng, Wang, Bo

arXiv.org Artificial IntelligenceNov-9-2023

Neural operators as novel neural architectures for fast approximating solution operators of partial differential equations (PDEs), have shown considerable promise for future scientific computing. However, the mainstream of training neural operators is still data-driven, which needs an expensive ground-truth dataset from various sources (e.g., solving PDEs' samples with the conventional solvers, real-world experiments) in addition to training stage costs. From a computational perspective, marrying operator learning and specific domain knowledge to solve PDEs is an essential step in reducing dataset costs and label-free learning. We propose a novel paradigm that provides a unified framework of training neural operators and solving PDEs with the variational form, which we refer to as the variational operator learning (VOL). Ritz and Galerkin approach with finite element discretization are developed for VOL to achieve matrix-free approximation of system functional and residual, then direct minimization and iterative update are proposed as two optimization strategies for VOL. Various types of experiments based on reasonable benchmarks about variable heat source, Darcy flow, and variable stiffness elasticity are conducted to demonstrate the effectiveness of VOL. With a label-free training set and a 5-label-only shift set, VOL learns solution operators with its test errors decreasing in a power law with respect to the amount of unlabeled data. To the best of the authors' knowledge, this is the first study that integrates the perspectives of the weak form and efficient iterative methods for solving sparse linear systems into the end-to-end operator learning task.

artificial intelligence, machine learning, operator, (17 more...)

arXiv.org Artificial Intelligence

2304.04234

Country:

North America > United States > Wisconsin (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback