BEV
Dino-Diffusion Modular Designs Bridge the Cross-Domain Gap in Autonomous Parking
Wu, Zixuan, Zhang, Hengyuan, Chen, Ting-Hsuan, Guo, Yuliang, Paz, David, Huang, Xinyu, Ren, Liu
Parking is a critical pillar of driving safety. While recent end-to-end (E2E) approaches have achieved promising in-domain results, robustness under domain shifts (e.g., weather and lighting changes) remains a key challenge. Rather than relying on additional data, we propose Dino-Diffusion Parking (DDP), a domain-agnostic autonomous parking pipeline that integrates visual foundation models with diffusion-based planning to enable generalized perception and robust motion planning under distribution shifts. We train our pipeline in CARLA under a regular setting and transfer it to more adversarial settings in a zero-shot fashion. Our model consistently achieves a parking success rate above 90% across all tested out-of-distribution (OOD) scenarios, and ablation studies confirm that both the network architecture and the algorithmic design significantly enhance cross-domain performance over existing baselines. Furthermore, testing in a 3D Gaussian splatting (3DGS) environment reconstructed from a real-world parking lot demonstrates promising sim-to-real transfer.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > California > Santa Clara County > Sunnyvale (0.04)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
RadarSFD: Single-Frame Diffusion with Pretrained Priors for Radar Point Clouds
Millimeter-wave radar provides perception robust to fog, smoke, dust, and low light, making it attractive for size-, weight-, and power-constrained robotic platforms. Current radar imaging methods, however, rely on synthetic aperture or multi-frame aggregation to improve resolution, which is impractical for small aerial, inspection, or wearable systems. We present RadarSFD, a conditional latent diffusion framework that reconstructs dense LiDAR-like point clouds from a single radar frame, without platform motion or synthetic aperture radar (SAR) processing. Our approach transfers geometric priors from a pretrained monocular depth estimator into the diffusion backbone, anchors them to radar inputs via channel-wise latent concatenation, and regularizes outputs with a dual-space objective combining latent- and pixel-space losses. On the RadarHD benchmark, RadarSFD achieves 35 cm Chamfer Distance and 28 cm Modified Hausdorff Distance, improving over the single-frame RadarHD baseline (56 cm, 45 cm) and remaining competitive with multi-frame methods using 5-41 frames. Qualitative results show recovery of fine walls and narrow gaps, and experiments across new environments confirm strong generalization. Ablation studies highlight the importance of pretrained initialization, radar BEV conditioning, and the dual-space loss. Together, these results establish the first practical single-frame, no-SAR mmWave radar pipeline for dense point cloud perception in compact robotic systems.
- Europe > Netherlands > South Holland > Delft (0.04)
- North America > United States > Texas > Harris County > Houston (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > California > Santa Clara County > San Jose (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Information Technology (0.68)
- Health & Medicine > Therapeutic Area (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
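Two of the RadarSFD design choices are concrete enough to illustrate: conditioning by channel-wise concatenation of the radar BEV latent with the noisy target latent, and a dual-space objective mixing a latent-space noise loss with a pixel-space reconstruction term. The sketch below is a guess at those mechanisms with toy shapes and modules (`LatentDenoiser`, 4-channel latents, a stand-in decoder); the paper's actual architecture and loss weighting may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentDenoiser(nn.Module):
    """Hypothetical latent-diffusion denoiser: the noisy target latent is
    concatenated channel-wise with the radar BEV latent (the conditioning
    mechanism named in the abstract)."""
    def __init__(self, lat_ch=4, cond_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(lat_ch + cond_ch, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, lat_ch, 3, padding=1),
        )

    def forward(self, z_noisy, z_radar):
        return self.net(torch.cat([z_noisy, z_radar], dim=1))  # channel-wise concat

def dual_space_loss(eps_pred, eps, z0_pred, decoder, target_img, w_pix=0.1):
    """Latent-space noise regression plus a pixel-space term on the decoded
    estimate of the clean latent."""
    return F.mse_loss(eps_pred, eps) + w_pix * F.l1_loss(decoder(z0_pred), target_img)

# Toy usage with stand-in shapes: 4-channel 32x32 latents, a 1-channel decoder.
decoder = nn.Sequential(nn.Conv2d(4, 1, 3, padding=1), nn.Upsample(scale_factor=8))
denoiser = LatentDenoiser()
z_noisy = torch.randn(2, 4, 32, 32)
z_radar = torch.randn(2, 4, 32, 32)
eps = torch.randn(2, 4, 32, 32)
eps_pred = denoiser(z_noisy, z_radar)
z0_pred = z_noisy - eps_pred        # crude clean-latent estimate, for illustration
loss = dual_space_loss(eps_pred, eps, z0_pred, decoder, torch.randn(2, 1, 256, 256))
```

The pixel-space term is what ties the latent diffusion process back to metric point-cloud geometry, which is plausibly why the ablations single out the dual-space loss.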
CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance
Zhang, Arthur, Sikchi, Harshit, Zhang, Amy, Biswas, Joydeep
We address the long-horizon mapless navigation problem: enabling robots to traverse novel environments without relying on high-definition maps or precise waypoints that specify exactly where to navigate. Achieving this requires overcoming two major challenges: learning robust, generalizable perceptual representations of the environment without pre-enumerating all possible navigation factors and forms of perceptual aliasing, and utilizing these learned representations to plan human-aligned navigation paths. Existing solutions struggle to generalize due to their reliance on hand-curated object lists that overlook unforeseen factors, end-to-end learning of navigation features from scarce large-scale robot datasets, and handcrafted reward functions that scale poorly to diverse scenarios. To overcome these limitations, we propose CREStE, the first method that learns representations and rewards for addressing the full mapless navigation problem without relying on large-scale robot datasets or manually curated features. CREStE leverages visual foundation models trained on internet-scale data to learn continuous bird's-eye-view representations capturing elevation, semantics, and instance-level features. To utilize learned representations for planning, we propose a counterfactual-based loss and an active learning procedure that focuses on the most salient perceptual cues by querying humans for counterfactual trajectory annotations in challenging scenes. We evaluate CREStE in kilometer-scale navigation tasks across six distinct urban environments. CREStE significantly outperforms all state-of-the-art approaches with 70% fewer human interventions per mission, including a 2-kilometer mission in an unseen environment with just one intervention, showcasing its robustness and effectiveness for long-horizon mapless navigation. For videos and additional materials, see https://amrl.cs.utexas.edu/creste .
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Research Report > Promising Solution (0.48)
- Instructional Material > Course Syllabus & Notes (0.46)
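The CREStE abstract describes a counterfactual-based loss in which human-annotated alternative trajectories supervise the learned reward. One plausible reading, sketched below, is a softmax preference loss that scores the demonstrated path above each annotated counterfactual on a per-cell BEV reward map; the function name, shapes, and the summed path-reward form are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def counterfactual_loss(reward_map, expert_traj, counterfactual_trajs):
    """reward_map: (H, W) per-cell reward decoded from the BEV representation.
    expert_traj: (N, 2) integer BEV cells along the demonstrated path.
    counterfactual_trajs: (K, N, 2) cells of human-annotated bad alternatives.
    The demonstrated path should out-score every counterfactual."""
    def path_reward(traj):
        return reward_map[traj[..., 0], traj[..., 1]].sum(dim=-1)

    logits = torch.cat([path_reward(expert_traj).view(1),
                        path_reward(counterfactual_trajs)])
    # softmax preference with the expert path as the "correct class"
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

reward_map = torch.randn(64, 64, requires_grad=True)   # toy reward map
expert = torch.randint(0, 64, (16, 2))                 # 16-waypoint demonstrated path
counterfactuals = torch.randint(0, 64, (3, 16, 2))     # 3 annotated alternatives
counterfactual_loss(reward_map, expert, counterfactuals).backward()
```

Because annotators only supply counterfactuals in scenes where the reward is ambiguous, the active-learning loop concentrates human effort exactly where the preference signal is most informative.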
AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data
Zhu, Haoran, Dong, Zhenyuan, Topollai, Kristi, Choromanska, Anna
As opposed to human drivers, current autonomous driving systems still require vast amounts of labeled data to train. Recently, world models have been proposed to simultaneously enhance autonomous driving capabilities by improving the way these systems understand complex real-world environments and reduce their data demands via self-supervised pre-training. In this paper, we present AD-L-JEPA (Autonomous Driving with LiDAR data via a Joint Embedding Predictive Architecture), a novel self-supervised pre-training framework for autonomous driving with LiDAR data that, unlike existing methods, is neither generative nor contrastive. Our method learns spatial world models with a joint embedding predictive architecture. Instead of explicitly generating masked unknown regions, our self-supervised world models predict Bird's Eye View (BEV) embeddings to represent the diverse nature of autonomous driving scenes. Our approach also eliminates the need to manually create positive and negative pairs, as required in contrastive learning. AD-L-JEPA leads to a simpler implementation and enhanced learned representations. We qualitatively and quantitatively demonstrate the high quality of embeddings learned with AD-L-JEPA. We further evaluate its accuracy and label efficiency on popular downstream tasks such as LiDAR 3D object detection and associated transfer learning. Our experimental evaluation demonstrates that AD-L-JEPA is a plausible approach for self-supervised pre-training in autonomous driving applications and outperforms state-of-the-art (SOTA) methods, including the recently proposed Occupancy-MAE [1] and ALSO [2]. The source code of AD-L-JEPA is available at https://github.com/HaoranZhuExplorer/AD-L-JEPA-Release.
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (1.00)
- Automobiles & Trucks (1.00)
Fast and Efficient Transformer-based Method for Bird's Eye View Instance Prediction
Antunes-García, Miguel, Bergasa, Luis M., Montiel-Marín, Santiago, Barea, Rafael, Sánchez-García, Fabio, Llamazares, Ángel
Accurate object detection and prediction are critical to ensure the safety and efficiency of self-driving architectures. Predicting object trajectories and occupancy enables autonomous vehicles to anticipate movements and make decisions with future information, increasing their adaptability and reducing the risk of accidents. Current State-Of-The-Art (SOTA) approaches often isolate the detection, tracking, and prediction stages, which can lead to significant prediction errors due to inaccuracies accumulated between stages. Recent advances have improved the feature representation of multi-camera perception systems through Bird's-Eye View (BEV) transformations, boosting the development of end-to-end systems capable of predicting environmental elements directly from vehicle sensor data. These systems, however, often suffer from high processing times and large parameter counts, creating challenges for real-world deployment. To address these issues, this paper introduces a novel BEV instance prediction architecture based on a simplified paradigm that relies only on instance segmentation and flow prediction. The proposed system prioritizes speed, achieving reduced parameter counts and inference times compared to existing SOTA architectures thanks to an efficient transformer-based architecture. Furthermore, the implementation is optimized to exploit performance improvements in PyTorch 2.1. Code and trained models are available at https://github.com/miguelag99/Efficient-Instance-Prediction
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Spain > Community of Madrid > Madrid (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
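The simplified paradigm the abstract names, instance segmentation plus flow prediction, amounts to a shared trunk with two dense BEV heads. Below is a hedged sketch with a plain convolutional trunk standing in for the paper's efficient transformer backbone; all shapes and module names are assumptions, and the actual model lives in the linked repository.

```python
import torch
import torch.nn as nn

class SegFlowHeads(nn.Module):
    """Hypothetical two-head model: per-cell instance segmentation logits and
    a per-cell 2D flow field over BEV features."""
    def __init__(self, in_ch=64, n_classes=2):
        super().__init__()
        self.trunk = nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1), nn.SiLU())
        self.seg_head = nn.Conv2d(64, n_classes, 1)   # instance vs. background
        self.flow_head = nn.Conv2d(64, 2, 1)          # (dx, dy) to the next frame

    def forward(self, bev_feats):
        h = self.trunk(bev_feats)
        return self.seg_head(h), self.flow_head(h)

model = SegFlowHeads()
seg_logits, flow = model(torch.randn(1, 64, 200, 200))  # toy 200x200 BEV grid
# Future instance occupancy can then be propagated by warping the segmentation
# with the predicted flow, e.g., via F.grid_sample on a flow-shifted grid.
```

Dropping the explicit tracking stage is what removes the error accumulation the abstract criticizes: identity over time falls out of the flow field rather than a separate association module.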