AITopics | Gao, Bingzhao

Collaborating Authors

Gao, Bingzhao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

RAD: Retrieval-Augmented Decision-Making of Meta-Actions with Vision-Language Models in Autonomous Driving

Wang, Yujin, Liu, Quanfeng, Jiang, Zhengxin, Wang, Tianyi, Jiao, Junfeng, Chu, Hongqing, Gao, Bingzhao, Chen, Hong

arXiv.org Artificial IntelligenceMar-17-2025

Accurately understanding and deciding high-level meta-actions is essential for ensuring reliable and safe autonomous driving systems. While vision-language models (VLMs) have shown significant potential in various autonomous driving tasks, they often suffer from limitations such as inadequate spatial perception and hallucination, reducing their effectiveness in complex autonomous driving scenarios. To address these challenges, we propose a retrieval-augmented decision-making (RAD) framework, a novel architecture designed to enhance VLMs' capabilities to reliably generate meta-actions in autonomous driving scenes. RAD leverages a retrieval-augmented generation (RAG) pipeline to dynamically improve decision accuracy through a three-stage process consisting of the embedding flow, retrieving flow, and generating flow. Additionally, we fine-tune VLMs on a specifically curated dataset derived from the NuScenes dataset to enhance their spatial perception and bird's-eye view image comprehension capabilities. Extensive experimental evaluations on the curated NuScenes-based dataset demonstrate that RAD outperforms baseline methods across key evaluation metrics, including match accuracy, and F1 score, and self-defined overall score, highlighting its effectiveness in improving meta-action decision-making for autonomous driving tasks.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.13861

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Topology-Driven Trajectory Optimization for Modelling Controllable Interactions Within Multi-Vehicle Scenario

Ma, Changjia, Zhao, Yi, Gan, Zhongxue, Gao, Bingzhao, Ding, Wenchao

arXiv.org Artificial IntelligenceMar-7-2025

Abstract-- Trajectory optimization in multi-vehicle scenarios faces challenges due to its non-linear, non-convex properties and sensitivity to initial values, making interactions between vehicles difficult to control. In this paper, inspired by topological planning, we propose a differentiable local homotopy invariant metric to model the interactions. By incorporating this topological metric as a constraint into multi-vehicle trajectory optimization, our framework is capable of generating multiple interactive trajectories from the same initial values, achieving controllable interactions as well as supporting user-designed interaction patterns. Extensive experiments demonstrate its superior optimality and efficiency over existing methods. The colored curves are trajectories of each vehicle.

artificial intelligence, interaction, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2503.05471

Country: Asia > China (0.15)

Genre: Research Report (0.50)

Industry: Transportation (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.96)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.95)

Add feedback

RAC3: Retrieval-Augmented Corner Case Comprehension for Autonomous Driving with Vision-Language Models

Wang, Yujin, Liu, Quanfeng, Fan, Jiaqi, Hong, Jinlong, Chu, Hongqing, Tian, Mengjian, Gao, Bingzhao, Chen, Hong

arXiv.org Artificial IntelligenceDec-14-2024

Understanding and addressing corner cases is essential for ensuring the safety and reliability of autonomous driving systems. Vision-Language Models (VLMs) play a crucial role in enhancing scenario comprehension, yet they face significant challenges, such as hallucination and insufficient real-world grounding, which compromise their performance in critical driving scenarios. In this work, we propose RAC3, a novel framework designed to improve VLMs' ability to handle corner cases effectively. The framework integrates Retrieval-Augmented Generation (RAG) to mitigate hallucination by dynamically incorporating context-specific external knowledge. A cornerstone of RAC3 is its cross-modal alignment fine-tuning, which utilizes contrastive learning to embed image-text pairs into a unified semantic space, enabling robust retrieval of similar scenarios. We evaluate RAC3 through extensive experiments using a curated dataset of corner case scenarios, demonstrating its ability to enhance semantic alignment, improve hallucination mitigation, and achieve superior performance metrics, such as Cosine Similarity and ROUGE-L scores. For example, for the LLaVA-v1.6-34B VLM, the cosine similarity between the generated text and the reference text has increased by 5.22\%. The F1-score in ROUGE-L has increased by 39.91\%, the Precision has increased by 55.80\%, and the Recall has increased by 13.74\%. This work underscores the potential of retrieval-augmented VLMs to advance the robustness and safety of autonomous driving in complex environments.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.1105

Country:

Asia > China (0.29)
Europe > Switzerland (0.28)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (0.91)
Automobiles & Trucks (0.91)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Prospective Role of Foundation Models in Advancing Autonomous Vehicles

Wu, Jianhua, Gao, Bingzhao, Gao, Jincheng, Yu, Jianhao, Chu, Hongqing, Yu, Qiankun, Gong, Xun, Chang, Yi, Tseng, H. Eric, Chen, Hong, Chen, Jie

arXiv.org Artificial IntelligenceMay-17-2024

With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs), such as GPT, Sora, etc., have achieved remarkable results in many fields including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can contribute to enhancing scene understanding and reasoning. By pre-training on rich linguistic and visual data, FMs can understand and interpret various elements in a driving scene, and provide cognitive reasoning to give linguistic and action instructions for driving decisions and planning. Furthermore, FMs can augment data based on the understanding of driving scenarios to provide feasible scenes of those rare occurrences in the long tail distribution that are unlikely to be encountered during routine driving and data collection. The enhancement can subsequently lead to improvement in the accuracy and reliability of autonomous driving systems. Another testament to the potential of FMs' applications lies in World Models, exemplified by the DREAMER series, which showcases the ability to comprehend physical laws and dynamics. Learning from massive data under the paradigm of self-supervised learning, World Model can generate unseen yet plausible driving environments, facilitating the enhancement in the prediction of road users' behaviors and the off-line training of driving strategies. In this paper, we synthesize the applications and future trends of FMs in autonomous driving. By utilizing the powerful capabilities of FMs, we strive to tackle the potential issues stemming from the long-tail distribution in autonomous driving, consequently advancing overall safety in this domain.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2405.02288

Country:

Asia > China (0.14)
Europe > Netherlands (0.14)
North America > United States (0.14)
Europe > Germany (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Formal Synthesis of Controllers for Safety-Critical Autonomous Systems: Developments and Challenges

Yin, Xiang, Gao, Bingzhao, Yu, Xiao

arXiv.org Artificial IntelligenceFeb-20-2024

In recent years, formal methods have been extensively used in the design of autonomous systems. By employing mathematically rigorous techniques, formal methods can provide fully automated reasoning processes with provable safety guarantees for complex dynamic systems with intricate interactions between continuous dynamics and discrete logics. This paper provides a comprehensive review of formal controller synthesis techniques for safety-critical autonomous systems. Specifically, we categorize the formal control synthesis problem based on diverse system models, encompassing deterministic, non-deterministic, and stochastic, and various formal safety-critical specifications involving logic, real-time, and real-valued domains. The review covers fundamental formal control synthesis techniques, including abstraction-based approaches and abstraction-free methods. We explore the integration of data-driven synthesis approaches in formal control synthesis. Furthermore, we review formal techniques tailored for multi-agent systems (MAS), with a specific focus on various approaches to address the scalability challenges in large-scale systems. Finally, we discuss some recent trends and highlight research challenges in this area.

logic & formal reasoning, machine learning, specification, (18 more...)

arXiv.org Artificial Intelligence

2402.13075

Country:

Europe (0.67)
Asia > China (0.28)
North America > United States (0.28)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Overview (1.00)

Industry:

Transportation (1.00)
Energy (1.00)
Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback