AITopics | bev feature map

Collaborating Authors

bev feature map

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

c49a28241640407b23bba8f2495f4bc9-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 00:41:09 GMT

artificial intelligence, information, machine learning, (19 more...)

Neural Information Processing Systems

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.68)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

c49a28241640407b23bba8f2495f4bc9-Paper-Conference.pdf

Neural Information Processing Systems, Oct-10-2025, 15:56:51 GMT

crt-fusion, detection, information, (16 more...)

Neural Information Processing Systems

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.68)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

BEV-VLM: Trajectory Planning via Unified BEV Abstraction

Chen, Guancheng, Yang, Sheng, Zhan, Tong, Wang, Jian

arXiv.org Artificial Intelligence, Oct-1-2025

ABSTRACT This paper introduces BEV-VLM, a novel framework for trajectory planning in autonomous driving that leverages Vision-Language Models (VLMs) with Bird's-Eye View (BEV) feature maps as visual inputs. Unlike conventional approaches that rely solely on raw visual data such as camera images, our method uses highly compressed and informative BEV representations, which are generated by fusing multi-modal sensor data (e.g., camera and LiDAR) and aligning them with HD maps. This unified BEV-HD map format provides a geometrically consistent and rich scene description, enabling VLMs to perform accurate trajectory planning. Experimental results on the nuScenes dataset demonstrate a 44.8% improvement in planning accuracy and complete collision avoidance. Our work shows that VLMs can effectively interpret processed visual representations such as BEV features, expanding their applicability in trajectory planning beyond raw images.

Index Terms-- Autonomous Driving, Vision-Language Model, Multi-Modal Learning

1. INTRODUCTION In recent years, the pursuit of advanced autonomous driving (AD) has attracted extensive attention, with Vision-Language Models (VLMs) emerging as a promising pathway owing to the cognitive capabilities they acquire during pre-training, which enable effective application in real-world scenarios. While existing research has demonstrated the feasibility and reliability of using VLMs for path planning from camera images, these approaches have two key limitations: they rely solely on camera data and thus lack integration with other modalities such as LiDAR point clouds, and they do not explore VLMs' potential for planning based on Bird's-Eye View (BEV) features. To address these gaps, this work avoids using raw visual signals (e.g., camera images) directly as VLM inputs.
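The abstract describes a "unified BEV-HD map format" but not its exact layout. As a minimal sketch, assuming that "aligning" means the fused BEV feature channels and the rasterised HD-map layers share one ego-centred metric grid, an ego-frame point lands in the same cell of every layer, so the layers can simply be stacked channel-wise. The grid extent (±51.2 m, 0.8 m cells) and all function names are hypothetical choices for illustration, not BEV-VLM's actual settings.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical grid settings (assumptions, not taken from BEV-VLM): a square
# ego-centred grid covering +/-51.2 m with 0.8 m cells, i.e. 128 x 128 cells.
X_RANGE = (-51.2, 51.2)
Y_RANGE = (-51.2, 51.2)
CELL_SIZE = 0.8

Layer = List[List[float]]


def grid_shape() -> Tuple[int, int]:
    """(rows, cols) of the shared BEV grid."""
    rows = round((X_RANGE[1] - X_RANGE[0]) / CELL_SIZE)
    cols = round((Y_RANGE[1] - Y_RANGE[0]) / CELL_SIZE)
    return rows, cols


def ego_to_cell(x: float, y: float) -> Optional[Tuple[int, int]]:
    """Map an ego-frame point in metres to its (row, col) cell, or None if off-grid."""
    if not (X_RANGE[0] <= x < X_RANGE[1] and Y_RANGE[0] <= y < Y_RANGE[1]):
        return None
    rows, cols = grid_shape()
    # Clamp guards against floating-point round-up right at the upper edge.
    row = min(int((x - X_RANGE[0]) // CELL_SIZE), rows - 1)
    col = min(int((y - Y_RANGE[0]) // CELL_SIZE), cols - 1)
    return row, col


def rasterize_map_points(points: Sequence[Tuple[float, float]]) -> Layer:
    """Rasterise HD-map sample points (e.g. lane-boundary vertices) into a 0/1 layer."""
    rows, cols = grid_shape()
    layer = [[0.0] * cols for _ in range(rows)]
    for x, y in points:
        cell = ego_to_cell(x, y)
        if cell is not None:
            layer[cell[0]][cell[1]] = 1.0
    return layer


def stack_layers(feature_channels: List[Layer], map_layers: List[Layer]) -> List[Layer]:
    """Concatenate BEV feature channels with HD-map layers along the channel axis."""
    rows, cols = grid_shape()
    for layer in feature_channels + map_layers:
        if len(layer) != rows or any(len(r) != cols for r in layer):
            raise ValueError("every layer must use the shared BEV grid")
    return feature_channels + map_layers
```

The shape check is the point of the design: stacking is only meaningful when every layer indexes the same metric cells.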

artificial intelligence, bev feature, information, (14 more...)

arXiv.org Artificial Intelligence

2509.25249

Country: Asia > China (0.14)

Genre: Research Report (0.70)

Industry: Transportation (0.99)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.99)

3D Siamese Voxel-to-BEV Tracker for Sparse Point Clouds

Hui, Le

Neural Information Processing Systems, Aug-18-2025, 18:57:28 GMT

To this end, we first perform template feature embedding to embed the template's feature into

artificial intelligence, machine learning, point cloud, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

BEV-DWPVO: BEV-based Differentiable Weighted Procrustes for Low Scale-drift Monocular Visual Odometry on Ground

Wei, Yufei, Lu, Sha, Lu, Wangtao, Xiong, Rong, Wang, Yue

arXiv.org Artificial Intelligence, Feb-27-2025

Monocular Visual Odometry (MVO) provides a cost-effective, real-time positioning solution for autonomous vehicles. However, MVO systems face the common issue of lacking inherent scale information from monocular cameras. Traditional methods have good interpretability but can only obtain relative scale and suffer from severe scale drift in long-distance tasks. Learning-based methods under perspective view leverage large amounts of training data to acquire prior knowledge and estimate absolute scale by predicting depth values. However, their generalization ability is limited due to the need to accurately estimate the depth of each point. In contrast, we propose a novel MVO system called BEV-DWPVO. Our approach leverages the common assumption of a ground plane, using Bird's-Eye View (BEV) feature maps to represent the environment in a grid-based structure with a unified scale. This enables us to reduce the complexity of pose estimation from 6 Degrees of Freedom (DoF) to 3-DoF. Keypoints are extracted and matched within the BEV space, followed by pose estimation through a differentiable weighted Procrustes solver. The entire system is fully differentiable, supporting end-to-end training with only pose supervision and no auxiliary tasks. We validate BEV-DWPVO on the challenging long-sequence datasets NCLT, Oxford, and KITTI, achieving superior results over existing MVO methods on most evaluation metrics.
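Restricting motion to the ground plane leaves three unknowns (x, y, yaw), and for fixed weights the weighted Procrustes problem over matched BEV keypoints then has a closed form. The sketch below solves min over θ, t of Σᵢ wᵢ‖R(θ)pᵢ + t − qᵢ‖² in plain Python. It is the textbook 2D weighted Procrustes (Kabsch) solution, assuming the caller supplies the weights; in BEV-DWPVO the weights come from the network and gradients flow through the solver, which this version does not attempt. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def weighted_procrustes_2d(src: Sequence[Point], dst: Sequence[Point],
                           weights: Sequence[float]) -> Tuple[float, float, float]:
    """Return (theta, tx, ty) minimising sum_i w_i * ||R(theta) src_i + t - dst_i||^2."""
    if not (len(src) == len(dst) == len(weights)) or len(src) < 2:
        raise ValueError("need at least two weighted correspondences")
    w_sum = sum(weights)
    if w_sum <= 0:
        raise ValueError("weights must have a positive sum")
    # Weighted centroids of both point sets.
    sx = sum(w * p[0] for w, p in zip(weights, src)) / w_sum
    sy = sum(w * p[1] for w, p in zip(weights, src)) / w_sum
    dx = sum(w * q[0] for w, q in zip(weights, dst)) / w_sum
    dy = sum(w * q[1] for w, q in zip(weights, dst)) / w_sum
    # Weighted dot and cross terms of the centred points.
    c = s = 0.0
    for w, (px, py), (qx, qy) in zip(weights, src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        c += w * (ax * bx + ay * by)  # coefficient of cos(theta)
        s += w * (ax * by - ay * bx)  # coefficient of sin(theta)
    theta = math.atan2(s, c)  # maximises c*cos(theta) + s*sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = dx - (math.cos(theta) * sx - math.sin(theta) * sy)
    ty = dy - (math.sin(theta) * sx + math.cos(theta) * sy)
    return theta, tx, ty
```

A zero weight removes a correspondence entirely, which is how a learned weighting can suppress bad keypoint matches.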

estimation, keypoint, pose estimation, (14 more...)

arXiv.org Artificial Intelligence

2502.20078

Country:

North America > United States > Michigan (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance

Ye, Xin, Yaman, Burhaneddin, Cheng, Sheng, Tao, Feng, Mallik, Abhirup, Ren, Liu

arXiv.org Artificial Intelligence, Feb-26-2025

Bird's-eye-view (BEV) representations play a crucial role in autonomous driving tasks. Despite recent advancements in BEV generation, inherent noise, stemming from sensor limitations and the learning process, remains largely unaddressed, resulting in suboptimal BEV representations that adversely impact the performance of downstream tasks. To address this, we propose BEVDiffuser, a novel diffusion model that effectively denoises BEV feature maps using the ground-truth object layout as guidance. BEVDiffuser can be operated in a plug-and-play manner during training time to enhance existing BEV models without requiring any architectural modifications. Extensive experiments on the challenging nuScenes dataset demonstrate BEVDiffuser's exceptional denoising and generation capabilities, which enable significant enhancement to existing BEV models, as evidenced by notable improvements of 12.3% in mAP and 10.1% in NDS achieved for 3D object detection without introducing additional computational complexity. Moreover, substantial improvements in long-tail object detection and under challenging weather and lighting conditions further validate BEVDiffuser's effectiveness in denoising and enhancing BEV representations.
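BEVDiffuser's denoising network and its ground-truth layout conditioning are not described here in enough detail to reproduce. As background only, and assuming a standard Gaussian (DDPM-style) formulation, a diffusion model learns to invert a forward process that noises a clean sample as x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε ~ N(0, I). The sketch applies that forward step to a BEV feature map stored as nested lists indexed [channel][row][col]. The linear β schedule from 1e-4 to 0.02 over 1000 steps is the common DDPM default and an assumption; the function names are hypothetical.

```python
import math
import random
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linearly increasing beta schedule."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0: FeatureMap, t: int, alpha_bars: List[float],
             rng: random.Random) -> FeatureMap:
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I), element-wise."""
    a_bar = alpha_bars[t]
    signal, noise = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [[[signal * v + noise * rng.gauss(0.0, 1.0) for v in row]
             for row in channel]
            for channel in x0]
```

At small t the sample stays close to the clean feature map; near the last step ᾱ_t is tiny and x_t is almost pure noise, which is the range a denoiser must learn to cover.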

bev feature map, bev model, bevdiffuser, (13 more...)

arXiv.org Artificial Intelligence

2502.19694

Country: North America (0.04)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Road (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.48)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.36)

RG-Attn: Radian Glue Attention for Multi-modality Multi-agent Cooperative Perception

Li, Lantao, Yang, Kang, Zhang, Wenqi, Wang, Xiaoxue, Sun, Chen

arXiv.org Artificial Intelligence, Jan-28-2025

Cooperative perception offers an optimal solution to overcome the perception limitations of single-agent systems by leveraging Vehicle-to-Everything (V2X) communication for data sharing and fusion across multiple agents. However, most existing approaches focus on single-modality data exchange, limiting the potential of both homogeneous and heterogeneous fusion across agents. This overlooks the opportunity to use multi-modality data per agent, restricting the system's performance. In the automotive industry, manufacturers adopt diverse sensor configurations, resulting in heterogeneous combinations of sensor modalities across agents. To harness every possible data source for optimal performance, we design a robust LiDAR-camera cross-modality fusion module, Radian-Glue-Attention (RG-Attn), which applies to both intra-agent and inter-agent cross-modality fusion thanks to convenient coordinate conversion via transformation matrices and a unified sampling/inversion mechanism. We also propose two architectures for cooperative perception, Paint-To-Puzzle (PTP) and Co-Sketching-Co-Coloring (CoS-CoCo). PTP aims for maximum precision and achieves a smaller data packet size by limiting cross-agent fusion to a single instance, but it requires all participants to be equipped with LiDAR. In contrast, CoS-CoCo supports agents with any configuration (LiDAR-only, camera-only, or both LiDAR and camera), offering greater generalization ability. Our approach achieves state-of-the-art (SOTA) performance on both real and simulated cooperative perception datasets. The code will be released on GitHub in early 2025.
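The abstract credits RG-Attn's cross-agent fusion to coordinate conversion by transformation matrix, without giving the matrices. As a generic illustration, assuming each agent reports a planar world-from-agent pose (x, y, z, yaw), a point observed by a cooperating agent is expressed in the ego frame with ego_from_other = inverse(world_from_ego) · world_from_other. This is a textbook rigid-transform sketch using 4x4 homogeneous matrices; the helper names are hypothetical, and RG-Attn's sampling/inversion mechanism is not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Point3 = Tuple[float, float, float]


def pose_matrix(x: float, y: float, z: float, yaw: float) -> Matrix:
    """4x4 world-from-agent transform for a pose with heading `yaw` (radians)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rigid_inverse(t: Matrix) -> Matrix:
    """Inverse of a rigid transform [R | p], which is [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    p = [t[i][3] for i in range(3)]
    inv_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[0] + [inv_p[0]], r_t[1] + [inv_p[1]], r_t[2] + [inv_p[2]],
            [0.0, 0.0, 0.0, 1.0]]


def to_ego_frame(points: Sequence[Point3], world_from_ego: Matrix,
                 world_from_other: Matrix) -> List[Point3]:
    """Express points given in another agent's frame in the ego agent's frame."""
    ego_from_other = matmul(rigid_inverse(world_from_ego), world_from_other)
    out: List[Point3] = []
    for px, py, pz in points:
        v = (px, py, pz, 1.0)  # homogeneous coordinates
        x, y, z = (sum(ego_from_other[i][k] * v[k] for k in range(4)) for i in range(3))
        out.append((x, y, z))
    return out
```

Using the closed-form rigid inverse rather than a general matrix inverse keeps the conversion cheap and numerically exact for rotation-plus-translation poses.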

agent, artificial intelligence, fusion, (13 more...)

arXiv.org Artificial Intelligence

2501.16803

Country:

Asia > South Korea (0.14)
Europe > France > Île-de-France > Paris > Paris (0.05)
North America > United States > Washington > King County > Seattle (0.04)
(9 more...)

Genre: Research Report (0.64)

Industry:

Information Technology (0.93)
Automobiles & Trucks (0.87)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

BEV-ODOM: Reducing Scale Drift in Monocular Visual Odometry with BEV Representation

Wei, Yufei, Lu, Sha, Han, Fuzhang, Xiong, Rong, Wang, Yue

arXiv.org Artificial Intelligence, Nov-15-2024

Abstract-- Monocular visual odometry (MVO) is vital in autonomous navigation and robotics, providing a cost-effective and flexible motion-tracking solution, but the inherent scale ambiguity of monocular setups often leads to cumulative errors over time. In this paper, we present BEV-ODOM, a novel MVO framework that leverages the Bird's Eye View (BEV) representation to address scale drift. Unlike existing approaches, BEV-ODOM integrates a depth-based perspective-view (PV) to BEV encoder, a correlation feature extraction neck, and a CNN-MLP-based decoder, enabling it to estimate motion across three degrees of freedom without depth supervision or complex optimization techniques. Our framework reduces scale drift in long-term sequences and achieves accurate motion estimation across various datasets, including NCLT, Oxford, and KITTI. In contrast, our method achieves low scale drift using only pose supervision with the BEV representation.
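BEV-ODOM outputs motion with three degrees of freedom (planar x, y and yaw) between frames. To turn such per-frame estimates into a trajectory, they are chained as SE(2) transforms; the sketch below shows that composition. It is a generic illustration, not the paper's decoder, and the names are hypothetical. It also makes the stakes of scale concrete: a constant 5% overestimate of each 1 m step leaves a 0.5 m position error after 10 m, and the error keeps growing with distance travelled.

```python
import math
from typing import List, Sequence, Tuple

Pose2D = Tuple[float, float, float]  # (x, y, yaw) in a fixed world frame


def compose(pose: Pose2D, delta: Pose2D) -> Pose2D:
    """Apply a body-frame motion (dx, dy, dyaw) to a world-frame pose (SE(2) product)."""
    x, y, yaw = pose
    dx, dy, dyaw = delta
    c, s = math.cos(yaw), math.sin(yaw)
    # Wrap the new heading to [-pi, pi].
    new_yaw = math.atan2(math.sin(yaw + dyaw), math.cos(yaw + dyaw))
    return x + c * dx - s * dy, y + s * dx + c * dy, new_yaw


def integrate(deltas: Sequence[Pose2D], start: Pose2D = (0.0, 0.0, 0.0)) -> List[Pose2D]:
    """Chain relative 3-DoF motions into a trajectory of absolute poses."""
    traj = [start]
    for d in deltas:
        traj.append(compose(traj[-1], d))
    return traj
```

For example, four steps of (1 m forward, 90° left turn) trace a unit square and return to the start pose.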

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2411.10195

Country:

North America > United States > Michigan (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

KAN-RCBEVDepth: A multi-modal fusion algorithm in object detection for autonomous driving

Lai, Zhihao, Liu, Chuanhao, Sheng, Shihui, Zhang, Zhiqiang

arXiv.org Artificial Intelligence, Aug-27-2024

Abstract-- Accurate 3D object detection in autonomous driving is critical yet challenging due to occlusions, varying object sizes, and complex urban environments. This paper introduces the KAN-RCBEVDepth method, an innovative approach aimed at enhancing 3D object detection by fusing multimodal sensor data from cameras, LiDAR, and millimeter-wave radar. Our unique Bird's Eye View-based approach significantly improves detection accuracy and efficiency by seamlessly integrating diverse sensor inputs, refining spatial relationship understanding, and optimizing computational procedures. Experimental results show that the proposed method outperforms existing techniques across multiple detection metrics, achieving a higher Mean Distance AP (0.389, 23% improvement), a better ND Score (0.485, 17.1% improvement), and a faster Evaluation …

Accurate 3D object detection is a critical component of autonomous driving systems, enabling vehicles to perceive their environment in three dimensions and precisely identify and localize surrounding objects such as vehicles, including …

As illustrated in Figure 1, these sensors' complementary … LiDAR delivers high-precision 3D point cloud data crucial for accurate depth perception. By leveraging the strengths of each sensor type, sensor fusion mitigates their weaknesses, thereby enhancing the overall performance of 3D object detection systems.
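The abstract reports a "Mean Distance AP" and an "ND Score". Assuming these are the nuScenes mAP (which matches predictions to ground truth by centre distance) and the nuScenes Detection Score, the score combines mAP with five true-positive error metrics as NDS = (1/10)·[5·mAP + Σ (1 − min(1, mTP))]. The sketch implements that formula. The function name is hypothetical, and because the paper's individual error metrics are not listed here, it does not reproduce the reported 0.485.

```python
from typing import Mapping

# The five nuScenes true-positive error metrics (translation, scale,
# orientation, velocity, attribute); lower is better for each.
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_detection_score(mean_ap: float, tp_errors: Mapping[str, float]) -> float:
    """NDS = (5 * mAP + sum over TP metrics of (1 - min(1, error))) / 10."""
    missing = [m for m in TP_METRICS if m not in tp_errors]
    if missing:
        raise KeyError(f"missing TP metrics: {missing}")
    # Errors above 1 are clipped so a single bad metric cannot go negative.
    tp_scores = sum(1.0 - min(1.0, tp_errors[m]) for m in TP_METRICS)
    return (5.0 * mean_ap + tp_scores) / 10.0
```

Because mAP carries half the total weight, an mAP gain moves NDS more than an equal improvement in any single error metric.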

detection, kan-rcbevdepth 0, voxel, (14 more...)

arXiv.org Artificial Intelligence

2408.02088

Country:

Asia > China > Shaanxi Province > Xi'an (0.05)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Texas (0.04)
(4 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Ground > Road (0.91)
Information Technology > Robotics & Automation (0.81)
Automobiles & Trucks (0.81)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
