Ma, Yukai
LeapVAD: A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking
Ma, Yukai, Wei, Tiantian, Zhong, Naiting, Mei, Jianbiao, Hu, Tao, Wen, Licheng, Yang, Xuemeng, Shi, Botian, Liu, Yong
While autonomous driving technology has made remarkable strides, data-driven approaches still struggle with complex scenarios due to their limited reasoning capabilities. Meanwhile, knowledge-driven autonomous driving systems have evolved considerably with the popularization of visual language models. In this paper, we propose LeapVAD, a novel method based on cognitive perception and dual-process thinking. Our approach implements a human-like attention mechanism to identify and focus on the critical traffic elements that influence driving decisions. By characterizing these objects through comprehensive attributes, including appearance, motion patterns, and associated risks, LeapVAD achieves more effective environmental representation and streamlines the decision-making process. Furthermore, LeapVAD incorporates an innovative dual-process decision-making module that mimics the human process of learning to drive. The system consists of an Analytic Process (System-II) that accumulates driving experience through logical reasoning and a Heuristic Process (System-I) that refines this knowledge via fine-tuning and few-shot learning. LeapVAD also includes reflective mechanisms and a growing memory bank, enabling it to learn from past mistakes and continuously improve its performance in a closed-loop environment. To enhance efficiency, we develop a scene encoder network that generates compact scene representations for rapid retrieval of relevant driving experiences. Extensive evaluations conducted on two leading autonomous driving simulators, CARLA and DriveArena, demonstrate that LeapVAD achieves superior performance compared to camera-only approaches despite limited training data. Comprehensive ablation studies further emphasize its effectiveness in continuous learning and domain adaptation. Project page: https://pjlab-adg.github.io/LeapVAD/.
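As a rough illustration of how the scene encoder and memory bank described above can work together, the following Python sketch stores (scene embedding, driving experience) pairs and retrieves the most similar past experiences by cosine similarity. The embedding dimension, similarity measure, and all names are illustrative assumptions rather than LeapVAD's actual implementation.

import numpy as np

class ExperienceMemory:
    """Toy memory bank: stores (scene embedding, experience text) pairs
    and retrieves the most similar past experiences by cosine similarity."""

    def __init__(self, dim=256):
        self.dim = dim
        self.embeddings = np.empty((0, dim), dtype=np.float32)
        self.experiences = []

    def add(self, embedding, experience):
        # Normalize so that a dot product equals cosine similarity.
        e = embedding / (np.linalg.norm(embedding) + 1e-8)
        self.embeddings = np.vstack([self.embeddings, e[None, :]])
        self.experiences.append(experience)

    def retrieve(self, query_embedding, k=3):
        if not self.experiences:
            return []
        q = query_embedding / (np.linalg.norm(query_embedding) + 1e-8)
        scores = self.embeddings @ q            # cosine similarities
        top_k = np.argsort(-scores)[:k]         # most similar first
        return [(self.experiences[i], float(scores[i])) for i in top_k]

# Usage: in LeapVAD the embeddings would come from the scene encoder network;
# random vectors stand in for them in this toy example.
memory = ExperienceMemory(dim=256)
memory.add(np.random.randn(256), "Unprotected left turn: yield to oncoming traffic.")
memory.add(np.random.randn(256), "Pedestrian near crosswalk: reduce speed.")
print(memory.retrieve(np.random.randn(256), k=2))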
Monocular Event-Inertial Odometry with Adaptive decay-based Time Surface and Polarity-aware Tracking
Tang, Kai, Lang, Xiaolei, Ma, Yukai, Huang, Yuehao, Li, Laijian, Liu, Yong, Lv, Jiajun
Event cameras have garnered considerable attention for their advantages over traditional cameras, including low power consumption, high dynamic range, and freedom from motion blur. This paper proposes a monocular event-inertial odometry method that incorporates an adaptive decay kernel-based time surface with polarity-aware tracking. We utilize the adaptive decay-based time surface to extract texture information from asynchronous events; it adapts to the dynamic characteristics of the event stream and enhances the representation of environmental textures. However, polarity-weighted time surfaces suffer from event polarity shifts when the motion direction changes. To mitigate the adverse effects of these shifts on feature tracking, we incorporate an additional polarity-inverted time surface, which enhances tracking robustness. Comparative analysis with visual-inertial and event-inertial odometry methods shows that our approach outperforms state-of-the-art techniques, with competitive results across various datasets.
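As a rough illustration of the time-surface representation discussed above, the sketch below builds an exponential-decay time surface whose decay constant adapts to the event rate, together with a polarity-inverted counterpart of the kind used for robust tracking. The adaptation rule and all constants are assumptions for illustration, not the paper's exact formulation.

import numpy as np

def time_surface(events, t_now, shape, tau, invert_polarity=False):
    """Exponential-decay time surface.
    events: array of rows (x, y, t, p) with polarity p in {-1, +1}; t in seconds."""
    ts = np.zeros(shape, dtype=np.float32)
    for x, y, t, p in events:
        if invert_polarity:
            p = -p
        # Later events overwrite earlier ones at the same pixel.
        ts[int(y), int(x)] = p * np.exp(-(t_now - t) / tau)
    return ts

def adaptive_tau(events, window=0.05, base_tau=0.03, reference_rate=50.0):
    """Illustrative adaptation: shorten the decay constant when the event
    rate exceeds a reference rate (fast motion), so the surface stays sharp."""
    rate = len(events) / max(window, 1e-6)              # events per second
    return base_tau * reference_rate / max(rate, reference_rate)

# Synthetic events: (x, y, t, p)
events = np.array([[10, 12, 0.010, +1],
                   [11, 12, 0.030, -1],
                   [10, 13, 0.045, +1]])
tau = adaptive_tau(events)
ts_pos = time_surface(events, t_now=0.05, shape=(32, 32), tau=tau)
ts_neg = time_surface(events, t_now=0.05, shape=(32, 32), tau=tau, invert_polarity=True)
print(tau, ts_pos[12, 10], ts_neg[12, 10])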
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving
Mei, Jianbiao, Ma, Yukai, Yang, Xuemeng, Wen, Licheng, Cai, Xinyu, Li, Xin, Fu, Daocheng, Zhang, Bo, Cai, Pinlong, Dou, Min, Shi, Botian, He, Liang, Liu, Yong, Qiao, Yu
Autonomous driving has advanced significantly thanks to improvements in sensors, machine learning, and artificial intelligence. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretability in varied environments. To address these problems, we introduce LeapAD, a novel paradigm for autonomous driving inspired by the human cognitive process. Specifically, LeapAD emulates human attention by selecting critical objects relevant to driving decisions, simplifying environmental interpretation and mitigating decision-making complexity. Additionally, LeapAD incorporates an innovative dual-process decision-making module, which consists of an Analytic Process (System-II) for thorough analysis and reasoning, along with a Heuristic Process (System-I) for swift and empirical processing. The Analytic Process leverages its logical reasoning to accumulate linguistic driving experience, which is then transferred to the Heuristic Process by supervised fine-tuning. Through reflection mechanisms and a growing memory bank, LeapAD continuously improves itself from past mistakes in a closed-loop environment. Closed-loop testing in CARLA shows that LeapAD outperforms all methods relying solely on camera input while requiring 1-2 orders of magnitude less labeled data. Experiments also demonstrate that, as the memory bank expands, the Heuristic Process with only 1.8B parameters can inherit the knowledge from a GPT-4-powered Analytic Process and achieve continuous performance improvement. Code will be released at https://github.com/PJLab-ADG/LeapAD.
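One concrete way to picture the System-II to System-I knowledge transfer mentioned above is to convert the accumulated linguistic driving experience into a supervised fine-tuning dataset. The sketch below writes such records to a JSONL file; the record contents, prompt/response schema, and file name are assumptions for illustration, not LeapAD's released data format.

import json

# Illustrative records of the kind the Analytic Process (System-II) might
# accumulate: a scene description, its reasoning, and the final decision.
analytic_records = [
    {"scene": "Ego at 30 km/h; pedestrian stepping onto crosswalk 15 m ahead.",
     "reasoning": "Pedestrian has right of way and the gap is closing; braking is required.",
     "decision": "Decelerate and stop before the crosswalk."},
    {"scene": "Green light; lead vehicle 40 m ahead cruising at a similar speed.",
     "reasoning": "No conflicting objects; keeping lane and speed is safe.",
     "decision": "Maintain speed and lane."},
]

# Convert them into a simple instruction-tuning file for fine-tuning the
# Heuristic Process (System-I).
with open("sft_data.jsonl", "w") as f:
    for r in analytic_records:
        sample = {
            "instruction": "Given the driving scene, give a driving decision with a brief rationale.",
            "input": r["scene"],
            "output": f'{r["reasoning"]} Decision: {r["decision"]}',
        }
        f.write(json.dumps(sample) + "\n")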
SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration
Xiang, Jingyang, Li, Siqi, Chen, Jun, Bai, Shipeng, Ma, Yukai, Dai, Guang, Liu, Yong
The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread as a means of compressing and accelerating models in resource-limited environments. By constraining N consecutive weights along the output channel to be group-wise non-zero, networks with 1$\times$N sparsity have gained tremendous popularity for three outstanding advantages: 1) substantial storage savings via the \emph{Block Sparse Row} matrix format; 2) excellent performance at high sparsity; 3) significant speedups on CPUs with Advanced Vector Extensions. Recent work requires selecting and fine-tuning 1$\times$N sparse weights based on dense pre-trained weights, leading to problems such as expensive training and memory access costs, sub-optimal model quality, and unbalanced workload across threads (different sparsity across output channels). To overcome these issues, this paper proposes a novel \emph{\textbf{S}oft \textbf{U}niform \textbf{B}lock \textbf{P}runing} (SUBP) approach that trains a uniform 1$\times$N sparse structured network from scratch. Specifically, our approach repeatedly allows pruned blocks to regrow into the network, based on block angular redundancy and importance sampling, in a uniform manner throughout the training process. This not only makes the model less dependent on pre-training and reduces both model redundancy and the risk of permanently pruning important blocks, but also achieves a balanced workload. Empirically, comprehensive experiments on ImageNet across various CNN architectures show that SUBP consistently outperforms existing 1$\times$N and structured sparsity methods based on pre-trained models or training from scratch. Source code and models are available at \url{https://github.com/JingyangXiang/SUBP}.
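For intuition about the 1$\times$N pattern and the uniform (balanced-workload) constraint, the following sketch prunes a convolution weight tensor so that every group of N consecutive output channels keeps the same number of 1$\times$N blocks, scored here by a simple block L1 norm. This only illustrates the sparsity pattern; SUBP's angular-redundancy criterion, importance sampling, and soft regrowth schedule are not reproduced.

import numpy as np

def uniform_1xN_prune(weight, N=4, sparsity=0.5):
    """Keep the same number of 1xN blocks in every group of N consecutive
    output channels, scored by block L1 norm (illustrative criterion only)."""
    c_out, c_in, kh, kw = weight.shape
    assert c_out % N == 0
    w = weight.reshape(c_out, -1)                       # (C_out, C_in*kh*kw)
    groups = w.reshape(c_out // N, N, -1)               # blocks: (group, N, position)
    scores = np.abs(groups).sum(axis=1)                 # block L1 norms: (group, position)
    keep_per_group = int(round(scores.shape[1] * (1.0 - sparsity)))
    mask = np.zeros_like(scores, dtype=bool)
    for g in range(scores.shape[0]):
        keep_idx = np.argsort(-scores[g])[:keep_per_group]
        mask[g, keep_idx] = True                        # same count kept in every group
    pruned = groups * mask[:, None, :]                  # zero out whole 1xN blocks
    return pruned.reshape(weight.shape), mask

w = np.random.randn(16, 8, 3, 3).astype(np.float32)
pruned, mask = uniform_1xN_prune(w, N=4, sparsity=0.5)
print(mask.sum(axis=1))   # identical number of retained blocks per group -> balanced threads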
Coco-LIC: Continuous-Time Tightly-Coupled LiDAR-Inertial-Camera Odometry using Non-Uniform B-spline
Lang, Xiaolei, Chen, Chao, Tang, Kai, Ma, Yukai, Lv, Jiajun, Liu, Yong, Zuo, Xingxing
In this paper, we propose an efficient continuous-time LiDAR-Inertial-Camera odometry that utilizes non-uniform B-splines to tightly couple measurements from the LiDAR, IMU, and camera. In contrast to uniform B-spline-based continuous-time methods, our non-uniform B-spline approach offers significant advantages in both real-time efficiency and accuracy, achieved by dynamically and adaptively placing control points according to the varying dynamics of the motion. To enable efficient fusion of heterogeneous LiDAR-Inertial-Camera data within a short sliding-window optimization, we assign depth to visual pixels using corresponding map points from a global LiDAR map and formulate frame-to-map reprojection factors for the associated pixels in the current image frame. This circumvents the need to optimize the depth of visual pixels, which would otherwise entail a long sliding window with numerous control points for continuous-time trajectory estimation. We conduct dedicated experiments on real-world datasets to demonstrate the advantage and efficacy of the non-uniform continuous-time trajectory representation. Our LiDAR-Inertial-Camera odometry system is also extensively evaluated on challenging scenarios with sensor degeneration as well as large-scale scenarios, and it shows accuracy comparable to or higher than state-of-the-art methods. The codebase will be open-sourced at https://github.com/APRIL-ZJU/Coco-LIC.
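To give a feel for the adaptive control-point placement described above, the sketch below densifies a knot sequence in segments where the IMU reports stronger rotation, so the trajectory gains flexibility exactly where the motion is dynamic. The segment length, thresholds, and per-segment counts are illustrative assumptions, not Coco-LIC's actual schedule.

import numpy as np

def control_points_per_segment(gyro_norms, thresholds=(0.5, 1.5, 3.0), counts=(1, 2, 3, 4)):
    """Illustrative rule: allot more control points to a segment when its
    average angular rate is higher (faster motion needs a more flexible spline)."""
    n = np.searchsorted(thresholds, gyro_norms.mean())
    return counts[n]

def place_knots(imu_times, gyro, segment=0.1):
    """Build a non-uniform knot sequence over the IMU time span."""
    t0, t1 = imu_times[0], imu_times[-1]
    knots = [t0]
    t = t0
    while t < t1:
        seg_mask = (imu_times >= t) & (imu_times < t + segment)
        norms = np.linalg.norm(gyro[seg_mask], axis=1) if seg_mask.any() else np.zeros(1)
        k = control_points_per_segment(norms)
        # k control points in this segment -> k evenly spaced knots inside it
        knots.extend(np.linspace(t, t + segment, k + 1)[1:])
        t += segment
    return np.array(knots)

# Synthetic IMU stream at 200 Hz: slow motion first, aggressive rotation later.
times = np.arange(0.0, 1.0, 0.005)
gyro = np.where(times[:, None] < 0.5, 0.2, 4.0) * np.ones((len(times), 3))
knots = place_knots(times, gyro)
print(len(knots), "knots; denser in the second half:", np.diff(knots)[:3], np.diff(knots)[-3:])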
OverlapNetVLAD: A Coarse-to-Fine Framework for LiDAR-based Place Recognition
Fu, Chencan, Li, Lin, Peng, Linpeng, Ma, Yukai, Zhao, Xiangrui, Liu, Yong
Place recognition is a challenging yet crucial task in robotics. Existing 3D LiDAR place recognition methods suffer from limited feature representation capability and long search times. To address these challenges, we propose a novel coarse-to-fine framework for 3D LiDAR place recognition that combines Bird's Eye View (BEV) feature extraction, coarse-grained matching, and fine-grained verification. In the coarse stage, our framework leverages the rich contextual information contained in BEV features to produce global descriptors, and the top-\textit{K} most similar candidates are identified via descriptor matching, which is fast but coarse-grained. In the fine stage, our overlap estimation network reuses the corresponding BEV features to predict the overlap region, enabling meticulous and precise matching. Experimental results on the KITTI odometry benchmark demonstrate that our framework achieves leading performance compared to state-of-the-art methods. Our code is available at \url{https://github.com/fcchit/OverlapNetVLAD}.
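The coarse-to-fine pipeline above can be sketched as follows: a coarse ranking by global-descriptor similarity keeps the top-K candidates, which are then re-ranked by an overlap score computed from BEV features. The normalized-correlation overlap used here is a stand-in for the learned overlap estimation network, and all names and dimensions are illustrative assumptions.

import numpy as np

def coarse_to_fine_retrieval(query_desc, db_descs, query_bev, db_bevs, k=5):
    """Coarse stage: rank the database by global-descriptor similarity and
    keep the top-K candidates. Fine stage: re-rank them by an overlap score."""
    # Coarse: cosine similarity between normalized global descriptors.
    q = query_desc / np.linalg.norm(query_desc)
    d = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    top_k = np.argsort(-(d @ q))[:k]
    # Fine: overlap estimation; a normalized correlation of BEV features
    # replaces the learned overlap network in this toy example.
    overlaps = [float(np.sum(query_bev * db_bevs[i]) /
                      (np.linalg.norm(query_bev) * np.linalg.norm(db_bevs[i]) + 1e-8))
                for i in top_k]
    return top_k[np.argsort(overlaps)[::-1]]

rng = np.random.default_rng(0)
db_descs = rng.standard_normal((100, 256))
db_bevs = rng.standard_normal((100, 64, 64))
best = coarse_to_fine_retrieval(db_descs[42] + 0.1 * rng.standard_normal(256),
                                db_descs, db_bevs[42], db_bevs, k=5)
print("best match index:", best[0])   # expected to recover frame 42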
Ctrl-VIO: Continuous-Time Visual-Inertial Odometry for Rolling Shutter Cameras
Lang, Xiaolei, Lv, Jiajun, Huang, Jianxin, Ma, Yukai, Liu, Yong, Zuo, Xingxing
A wide range of sensors can be applied for accurate 6-DoF motion estimation, among which the camera has become a popular choice due to its low cost, light weight, and intuitive perception of appearance information. While visual odometry (VO) is able to estimate up-to-scale camera poses, it is prone to failure when facing challenges such as deficient texture, light variations, and violent motion. By additionally fusing Inertial Measurement Unit (IMU) data, visual-inertial odometry (VIO) can estimate camera poses with absolute scale and is more robust against the aforementioned challenges than VO.

In fact, a global shutter (GS) image corresponds to only one camera pose, while every row of a rolling shutter (RS) image corresponds to its own camera pose, inevitably leading to a sharp increase in the dimension of the states to be estimated. Therefore, it is computationally intractable to directly estimate the poses of the different rows of RS images. A common way to cope with this problem is to introduce a constant velocity model, assuming the camera moves at a constant speed between two keyframes [7-9]. Another way is to parameterize the continuous-time trajectory by B-splines [10-13], which is a more elegant approach than the constant velocity model.
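To make the rolling-shutter issue concrete, the constant velocity model assigns each image row its own pose by propagating the frame-start pose with fixed linear and angular velocities over the row's readout offset. The sketch below implements that approximation with a simple axis-angle exponential; it illustrates the constant-velocity baseline, not the B-spline formulation used in Ctrl-VIO.

import numpy as np

def so3_exp(phi):
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        return np.eye(3)
    a = phi / theta
    K = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def row_pose(R0, p0, v, w, row, num_rows, readout_time):
    """Constant-velocity model: propagate the frame-start pose (R0, p0)
    by linear velocity v and angular velocity w to the row's capture time."""
    dt = readout_time * row / num_rows          # row readout offset in seconds
    R = R0 @ so3_exp(w * dt)
    p = p0 + v * dt
    return R, p

R0, p0 = np.eye(3), np.zeros(3)
v = np.array([1.0, 0.0, 0.0])                   # 1 m/s forward
w = np.array([0.0, 0.0, 0.5])                   # 0.5 rad/s yaw
R_mid, p_mid = row_pose(R0, p0, v, w, row=240, num_rows=480, readout_time=0.03)
print(p_mid, R_mid[0, :2])                      # pose of the middle image row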