Kiran, B Ravi
CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation
Govindarajan, Hariprasath, Wozniak, Maciej K., Klingner, Marvin, Maurice, Camille, Kiran, B Ravi, Yogamani, Senthil
Vision foundation models (VFMs) such as DINO have led to a paradigm shift in 2D camera-based perception towards extracting generalized features that support many downstream tasks. Recent works introduce self-supervised cross-modal knowledge distillation (KD) as a way to transfer these powerful generalization capabilities into 3D LiDAR-based models. However, they either rely on highly complex distillation losses, on pseudo-semantic maps, or limit KD to features useful for semantic segmentation only. In this work, we propose CleverDistiller, a self-supervised, cross-modal 2D-to-3D KD framework introducing a set of simple yet effective design choices: unlike contrastive approaches that depend on intricate loss designs, our method employs a direct feature similarity loss in combination with a multi-layer perceptron (MLP) projection head, allowing the 3D network to learn complex semantic dependencies throughout the projection. Crucially, our approach does not depend on pseudo-semantic maps, enabling direct knowledge transfer from a VFM without explicit semantic supervision. Additionally, we introduce the auxiliary self-supervised spatial task of occupancy prediction to enhance the semantic knowledge obtained from the VFM through KD with 3D spatial reasoning capabilities. Experiments on standard autonomous driving benchmarks for 2D-to-3D KD demonstrate that CleverDistiller achieves state-of-the-art performance in both semantic segmentation and 3D object detection (3DOD), improving by up to 10% mIoU, especially when fine-tuning with very small amounts of labeled data, showing the effectiveness of our simple yet powerful KD strategy.
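A minimal PyTorch sketch of the two objectives described above: a direct cosine-similarity distillation term through an MLP projection head, plus an auxiliary binary occupancy term. All names (`DistillHead`, `distill_loss`, the loss weight) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistillHead(nn.Module):
    """MLP projection head mapping 3D features into the 2D VFM feature space."""
    def __init__(self, dim_3d: int, dim_2d: int, hidden: int = 2048):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim_3d, hidden), nn.GELU(), nn.Linear(hidden, dim_2d)
        )

    def forward(self, x):
        return self.mlp(x)

def distill_loss(feat_3d, feat_2d, head, occ_logits=None, occ_target=None, w_occ=1.0):
    # Direct feature similarity: maximize cosine similarity between projected
    # 3D features and frozen VFM features at matched point/pixel pairs.
    z = F.normalize(head(feat_3d), dim=-1)
    t = F.normalize(feat_2d.detach(), dim=-1)
    loss = (1.0 - (z * t).sum(dim=-1)).mean()
    # Auxiliary self-supervised occupancy prediction (binary voxel occupancy,
    # occ_target as float 0/1); the weighting is an assumed hyperparameter.
    if occ_logits is not None:
        loss = loss + w_occ * F.binary_cross_entropy_with_logits(occ_logits, occ_target)
    return loss
```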
S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving
Wozniak, Maciej K., Govindarajan, Hariprasath, Klingner, Marvin, Maurice, Camille, Kiran, B Ravi, Yogamani, Senthil
Recent self-supervised clustering-based pre-training techniques like DINO and CrIBo have shown impressive results for downstream detection and segmentation tasks. However, real-world applications such as autonomous driving face challenges with imbalanced object class and size distributions and complex scene geometries. In this paper, we propose S3PT, a novel scene semantics and structure guided clustering approach that provides more scene-consistent objectives for self-supervised training. Specifically, our contributions are threefold: First, we incorporate semantic distribution consistent clustering to encourage better representation of rare classes such as motorcycles or animals. Second, we introduce object diversity consistent spatial clustering to handle imbalanced and diverse object sizes, ranging from large background areas to small objects such as pedestrians and traffic signs. Third, we propose a depth-guided spatial clustering to regularize learning based on geometric information of the scene, thus further refining region separation at the feature level. Our learned representations significantly improve performance in downstream semantic segmentation and 3D object detection tasks on the nuScenes, nuImages, and Cityscapes datasets and show promising domain translation properties.
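Semantic-distribution-consistent clustering can be pictured as a balanced assignment problem with a non-uniform prior over clusters; below is a generic Sinkhorn-Knopp sketch in that spirit. The prior and all names are assumptions for illustration, not the S3PT formulation.

```python
import torch

def sinkhorn_assign(scores, cluster_prior=None, n_iters=3, eps=0.05):
    """Soft cluster assignment via Sinkhorn-Knopp iterations.

    scores: (N, K) similarities between N features and K prototypes.
    cluster_prior: target marginal over clusters; a non-uniform prior can
    reserve assignment capacity for rare semantic classes (assumed choice).
    """
    q = torch.exp(scores / eps)                       # (N, K)
    n, k = q.shape
    target = cluster_prior if cluster_prior is not None else torch.full((k,), 1.0 / k)
    for _ in range(n_iters):
        q = q / q.sum(dim=0, keepdim=True) * target   # match cluster marginals
        q = q / q.sum(dim=1, keepdim=True) / n        # each sample sums to 1/N
    return q * n                                      # rows sum to 1
```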
LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping
Gosala, Nikhil, Petek, Kürsat, Kiran, B Ravi, Yogamani, Senthil, Drews-Jr, Paulo, Burgard, Wolfram, Valada, Abhinav
Semantic Bird's Eye View (BEV) maps offer a rich representation with strong occlusion reasoning for various decision-making tasks in autonomous driving. However, most BEV mapping approaches employ a fully supervised learning paradigm that relies on large amounts of human-annotated BEV ground truth data. In this work, we address this limitation by proposing the first unsupervised representation learning approach to generate semantic BEV maps from a monocular frontal view (FV) image in a label-efficient manner. Our approach pretrains the network to independently reason about scene geometry and scene semantics using two disjoint neural pathways in an unsupervised manner and then finetunes it for the task of semantic BEV mapping using only a small fraction of labels in the BEV. We achieve label-free pretraining by exploiting spatial and temporal consistency of FV images to learn scene geometry while relying on a novel temporal masked autoencoder formulation to encode the scene representation. Extensive evaluations on the KITTI-360 and nuScenes datasets demonstrate that our approach performs on par with the existing state-of-the-art approaches while using only 1% of BEV labels and no additional labeled data.
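As a toy illustration of masked autoencoding over a short frontal-view sequence, the following sketch masks patch tokens across frames and reconstructs them from the visible ones; every detail (masking, decoder, loss) is a generic assumption rather than the LetsMap formulation.

```python
import torch
import torch.nn as nn

class TemporalMAE(nn.Module):
    """Toy temporal masked autoencoder over patch tokens from T frames."""
    def __init__(self, dim=256, depth=4, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.Linear(dim, dim)

    def forward(self, tokens, mask):
        # tokens: (B, T*P, dim) patch tokens; mask: (B, T*P) bool, True = masked.
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)
        recon = self.decoder(self.encoder(x))
        # Reconstruct only the masked tokens (standard MAE-style target).
        return ((recon - tokens) ** 2)[mask].mean()
```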
BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation
Schramm, Jonas, Vödisch, Niclas, Petek, Kürsat, Kiran, B Ravi, Yogamani, Senthil, Burgard, Wolfram, Valada, Abhinav
Semantic scene segmentation from a bird's-eye-view (BEV) perspective plays a crucial role in facilitating planning and decision-making for mobile robots. Although recent vision-only methods have demonstrated notable advancements in performance, they often struggle under adverse illumination conditions such as rain or nighttime. While active sensors offer a solution to this challenge, the prohibitively high cost of LiDARs remains a limiting factor. Fusing camera data with automotive radars offers a less expensive alternative but has received little attention in prior research. In this work, we aim to advance this promising avenue by introducing BEVCar, a novel approach for joint BEV object and map segmentation. The core novelty of our approach lies in first learning a point-based encoding of raw radar data, which is then leveraged to efficiently initialize the lifting of image features into the BEV space. We perform extensive experiments on the nuScenes dataset and demonstrate that BEVCar outperforms the current state of the art. Moreover, we show that incorporating radar information significantly enhances robustness in challenging environmental conditions and improves segmentation performance for distant objects. To foster future research, we provide the weather split of the nuScenes dataset used in our experiments, along with our code and trained models at http://bevcar.cs.uni-freiburg.de.
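The core idea, encoding raw radar points with a small point-wise network and scattering them into a BEV grid that seeds the image-feature lifting, can be sketched as follows. Grid size, feature dimensions, and all names are illustrative, not the BEVCar implementation.

```python
import torch
import torch.nn as nn

class RadarBEVEncoder(nn.Module):
    """Point-based encoding of raw radar returns scattered into a BEV grid."""
    def __init__(self, in_dim=6, feat_dim=64, grid=(200, 200)):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU(),
                                       nn.Linear(feat_dim, feat_dim))
        self.grid = grid

    def forward(self, points, cell_idx):
        # points: (N, in_dim) radar returns; cell_idx: (N,) flat BEV cell index.
        feats = self.point_mlp(points)              # (N, feat_dim)
        h, w = self.grid
        bev = feats.new_zeros(h * w, feats.shape[1])
        bev.index_add_(0, cell_idx, feats)          # scatter-sum per BEV cell
        return bev.view(h, w, -1)                   # seeds the image-feature lifting
```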
Navya3DSeg -- Navya 3D Semantic Segmentation Dataset & split generation for autonomous vehicles
Almin, Alexandre, Lemarié, Léo, Duong, Anh, Kiran, B Ravi
Autonomous driving (AD) perception today relies heavily on deep-learning-based architectures requiring large-scale annotated datasets, with the associated costs of curation and annotation. 3D semantic data are useful for core perception tasks such as obstacle detection and ego-vehicle localization. We propose a new dataset, Navya 3D Segmentation (Navya3DSeg), with a diverse label space corresponding to a large-scale production-grade operational domain, including rural, urban, industrial sites and universities from 13 countries. It contains 23 labeled sequences and 25 supplementary sequences without labels, designed to explore self-supervised and semi-supervised semantic segmentation benchmarks on point clouds. We also propose a novel method for sequential dataset split generation based on iterative multi-label stratification, which we demonstrate to achieve a +1.2% mIoU improvement over the original split proposed by the SemanticKITTI dataset. A complete benchmark of the semantic segmentation task was performed with state-of-the-art methods. Finally, we demonstrate an Active Learning (AL) based dataset distillation framework and introduce a novel heuristic-free sampling method, ego-pose distance based sampling, in the context of AL. A detailed presentation of the dataset is available at https://www.youtube.com/watch?v=5m6ALIs-s20.
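As an illustration of the split-generation idea, here is a simplified greedy stand-in for iterative multi-label stratification that assigns whole sequences while tracking per-class label proportions; the method in the paper is more principled, and all names here are hypothetical.

```python
import numpy as np

def stratified_sequence_split(label_hist, train_frac=0.8, seed=0):
    """Greedy sequential split keeping per-class proportions near the target.

    label_hist: (S, C) per-sequence label counts for S sequences, C classes.
    """
    rng = np.random.default_rng(seed)
    target = label_hist.sum(axis=0) * train_frac
    train, val = [], []
    acc = np.zeros(label_hist.shape[1])
    for s in rng.permutation(len(label_hist)):
        # Assign the sequence to train only if it moves the accumulated
        # class counts closer to the target distribution.
        if np.linalg.norm(acc + label_hist[s] - target) <= np.linalg.norm(acc - target):
            train.append(s)
            acc += label_hist[s]
        else:
            val.append(s)
    return train, val
```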
Evaluating the effect of data augmentation and BALD heuristics on distillation of Semantic-KITTI dataset
Duong, Anh, Almin, Alexandre, Lemarié, Léo, Kiran, B Ravi
Active Learning (AL) has remained relatively unexplored for LiDAR perception tasks in autonomous driving datasets. In this study we evaluate Bayesian active learning methods applied to the task of dataset distillation, or core subset selection (finding a subset with near-equivalent performance to the full dataset). We also study the effect of applying data augmentation (DA) within Bayesian AL based dataset distillation. We perform these experiments on the full Semantic-KITTI dataset, extending our existing work, which covered only a quarter of the same dataset. The addition of DA and BALD has a negative impact on labeling efficiency and thus on the capacity to distill datasets. We demonstrate key issues in designing a functional AL framework and conclude with a review of challenges in real-world active learning.
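For reference, the BALD acquisition score used in such Bayesian AL pipelines is the mutual information between predictions and model parameters, typically estimated from MC-dropout samples; a standard sketch (names are illustrative):

```python
import torch

def bald_score(mc_probs):
    """BALD acquisition from MC-dropout samples.

    mc_probs: (T, N, C) softmax outputs over T stochastic forward passes.
    Returns (N,) scores: entropy of the mean prediction minus mean entropy.
    """
    mean_p = mc_probs.mean(dim=0)                                   # (N, C)
    entropy_of_mean = -(mean_p * (mean_p + 1e-12).log()).sum(dim=-1)
    mean_entropy = -(mc_probs * (mc_probs + 1e-12).log()).sum(dim=-1).mean(dim=0)
    return entropy_of_mean - mean_entropy
```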
Road Segmentation on low resolution Lidar point clouds for autonomous vehicles
Gigli, Leonardo, Kiran, B Ravi, Paul, Thomas, Serna, Andres, Vemuri, Nagarjuna, Marcotegui, Beatriz, Velasco-Forero, Santiago
Point cloud datasets for perception tasks in the context of autonomous driving often rely on high-resolution 64-layer Light Detection and Ranging (LIDAR) scanners, which are expensive to deploy on real-world autonomous driving sensor architectures that usually employ 16/32-layer LIDARs. We evaluate the effect of subsampling image-based representations of dense point clouds on the accuracy of the road segmentation task. In our experiments, the low-resolution 16/32-layer LIDAR point clouds are simulated by subsampling the original 64-layer data, for subsequent transformation into a feature map in the Bird's-Eye-View (BEV) and Spherical-View (SV) representations of the point cloud. We introduce the usage of the local normal vector with the LIDAR's spherical coordinates as an input channel to existing LoDNN architectures. We demonstrate that this local normal feature, in conjunction with classical features, not only improves performance for binary road segmentation on full-resolution point clouds, but also reduces the negative impact on accuracy when subsampling dense point clouds, as compared to the usage of classical features alone. We assess our method with several experiments on two datasets: the KITTI Road-segmentation benchmark and the recently released Semantic KITTI dataset.
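Two ingredients of this pipeline lend themselves to a short sketch: simulating a lower-resolution scanner by ring subsampling, and estimating local normals on a spherical-view range image. The exact normal computation in the paper may differ; names are illustrative.

```python
import numpy as np

def subsample_rings(points, ring, keep_every=4):
    """Simulate a 16-layer scanner from 64-layer data by keeping every 4th ring."""
    return points[ring % keep_every == 0]

def local_normals(xyz_image):
    """Per-pixel surface normals on a spherical-view image via cross products
    of neighboring 3D points (an assumed, common formulation).

    xyz_image: (H, W, 3) Cartesian coordinates per pixel.
    """
    dx = np.roll(xyz_image, -1, axis=1) - xyz_image
    dy = np.roll(xyz_image, -1, axis=0) - xyz_image
    n = np.cross(dx, dy)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    return n  # used as an extra input channel alongside classical features
```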
Exploring applications of deep reinforcement learning for real-world autonomous driving systems
Talpaert, Victor, Sobh, Ibrahim, Kiran, B Ravi, Mannion, Patrick, Yogamani, Senthil, El-Sallab, Ahmad, Perez, Patrick
Deep Reinforcement Learning (DRL) has become increasingly powerful in recent years, with notable achievements such as DeepMind's AlphaGo, and has been successfully deployed in commercial vehicles, for example in Mobileye's path planning system. However, the vast majority of work on DRL focuses on toy examples in controlled synthetic car simulator environments such as TORCS and CARLA; in general, DRL is still in its infancy in terms of usability in real-world applications. Our goal in this paper is to encourage real-world deployment of DRL in various autonomous driving (AD) applications. We first provide an overview of the tasks in autonomous driving systems, reinforcement learning algorithms, and applications of DRL to AD systems. We then discuss the challenges which must be addressed to enable further progress towards real-world deployment.
Regression and Classification by Zonal Kriging
Serra, Jean, Angulo, Jesus, Kiran, B Ravi
Consider a family $Z=\{(\boldsymbol{x}_{i},y_{i}),\,1\leq i\leq N\}$ of $N$ pairs of vectors $\boldsymbol{x}_{i}\in\mathbb{R}^{d}$ and scalars $y_{i}$, and suppose we aim to predict $y$ for a new sample vector $\boldsymbol{x}_{0}$. Kriging models $y$ as the sum of a deterministic function $m$, a drift which depends on the point $\boldsymbol{x}$, and a random function $z$ with zero mean. The zonality hypothesis interprets $y$ as a weighted sum of $d$ random functions of a single independent variable, each of which is a kriging with a quadratic form for the drift of the variograms. We can therefore construct an unbiased estimator $y^{*}(\boldsymbol{x}_{0})=\sum_{i}\lambda^{i}z(\boldsymbol{x}_{i})$ of $y(\boldsymbol{x}_{0})$ with minimal variance $E[(y^{*}(\boldsymbol{x}_{0})-y(\boldsymbol{x}_{0}))^{2}]$, with the help of the known training set points. We give an explicit closed form for the $\lambda^{i}$ without having to compute any matrix inverse.
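For orientation, the textbook ordinary-kriging conditions behind such an estimator read as follows; this is the generic variogram form, not the paper's zonal closed form:
\[
y^{*}(\boldsymbol{x}_{0})=\sum_{i=1}^{N}\lambda^{i}z(\boldsymbol{x}_{i}),
\qquad
\sum_{i=1}^{N}\lambda^{i}=1 \quad \text{(unbiasedness)},
\]
\[
E\big[(y^{*}(\boldsymbol{x}_{0})-y(\boldsymbol{x}_{0}))^{2}\big]
= 2\sum_{i}\lambda^{i}\,\gamma(\boldsymbol{x}_{i},\boldsymbol{x}_{0})
- \sum_{i,j}\lambda^{i}\lambda^{j}\,\gamma(\boldsymbol{x}_{i},\boldsymbol{x}_{j}),
\]
where $\gamma$ denotes the variogram; minimizing this variance subject to the unbiasedness constraint determines the $\lambda^{i}$.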
Real-Time Background Subtraction Using Adaptive Sampling and Cascade of Gaussians
Kiran, B Ravi, Yogamani, Senthil
Background-foreground classification is a fundamental, well-studied problem in computer vision. Due to the pixel-wise nature of modeling and processing, it is usually difficult to satisfy real-time constraints: there is a trade-off between speed (driven by model complexity) and accuracy. Inspired by the rejection cascade of the Viola-Jones classifier, we decompose the Gaussian Mixture Model (GMM) into an adaptive cascade of classifiers, achieving a substantial improvement in speed without compromising accuracy. In the training phase, we learn multiple KDEs over different durations to serve as strong prior distributions and to detect probable oscillating pixels, which usually cause misclassifications. We propose a confidence measure for the classifier based on temporal consistency and the prior distribution. This confidence measure is used to adapt the learning rate and the thresholds of the model to improve accuracy, and to perform temporal and spatial sampling in a principled way. We demonstrate a speed-up factor of 5x to 10x and a 17 percent average improvement in accuracy over several standard videos.
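The rejection-cascade idea can be sketched in a few lines: cheap single-Gaussian stages absorb the easy majority of pixels, and only ambiguous pixels fall through to costlier stages. The stage design below is illustrative, not the paper's exact cascade.

```python
import numpy as np

def stage_score(pixel, mean, var):
    """Squared Mahalanobis distance to a stage's dominant Gaussian."""
    return np.sum((pixel - mean) ** 2 / (var + 1e-6))

def cascade_classify(pixel, stages):
    """Rejection cascade over background models, cheapest stage first.

    stages: list of (mean, var, threshold) tuples. A pixel matching any stage
    is accepted as background immediately; only pixels that fail every stage
    are flagged as foreground.
    """
    for mean, var, thr in stages:
        if stage_score(pixel, mean, var) < thr:
            return 0   # confidently background: stop early
    return 1           # survived all rejections: foreground
```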