AITopics | Bursuc, Andrei

Collaborating Authors

Bursuc, Andrei

VaViM and VaVAM: Autonomous Driving through Video Generative Modeling

Bartoccioni, Florent, Ramzi, Elias, Besnier, Victor, Venkataramanan, Shashanka, Vu, Tuan-Hung, Xu, Yihong, Chambon, Loick, Gidaris, Spyros, Odabas, Serkan, Hurych, David, Marlet, Renaud, Boulch, Alexandre, Chen, Mickael, Zablocki, Éloi, Bursuc, Andrei, Valle, Eduardo, Cord, Matthieu

arXiv.org Artificial Intelligence, Feb-21-2025

We explore the potential of large-scale generative video models for autonomous driving, introducing an open-source auto-regressive video model (VaViM) and its companion video-action model (VaVAM) to investigate how video pre-training transfers to real-world driving. VaViM is a simple auto-regressive video model that predicts frames using spatio-temporal token sequences. We show that it captures the semantics and dynamics of driving scenes. VaVAM, the video-action model, leverages the learned representations of VaViM to generate driving trajectories through imitation learning. Together, the models form a complete perception-to-action pipeline. We evaluate our models in open- and closed-loop driving scenarios, revealing that video-based pre-training holds promise for autonomous driving. Key insights include the semantic richness of the learned representations, the benefits of scaling for video synthesis, and the complex relationship between model size, data, and safety metrics in closed-loop evaluations. We release code and model weights at https://github.com/valeoai/VideoActionModel
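
The auto-regressive formulation described in this abstract can be illustrated with a toy rollout loop. This is a minimal sketch, not the VaViM implementation: `toy_next_token_logits` is a hypothetical stand-in for a trained model, and the vocabulary size, the number of tokens per frame and the greedy decoding rule are all assumptions made for illustration.

```python
import math

VOCAB_SIZE = 8          # hypothetical codebook size of a visual tokenizer
TOKENS_PER_FRAME = 4    # hypothetical number of tokens per frame


def toy_next_token_logits(context):
    """Stand-in for a trained model: scores every token given the context."""
    last = context[-1] if context else 0
    # Favor the token that follows the last one, cyclically (illustrative only).
    target = (last + 1) % VOCAB_SIZE
    return [-abs(target - t) for t in range(VOCAB_SIZE)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate_frames(context_tokens, num_new_frames):
    """Greedy auto-regressive rollout: predict and append one token at a time."""
    tokens = list(context_tokens)
    for _ in range(num_new_frames * TOKENS_PER_FRAME):
        probs = softmax(toy_next_token_logits(tokens))
        tokens.append(max(range(VOCAB_SIZE), key=probs.__getitem__))
    new = tokens[len(context_tokens):]
    # Regroup the flat token stream into per-frame token lists.
    return [new[i:i + TOKENS_PER_FRAME] for i in range(0, len(new), TOKENS_PER_FRAME)]


frames = generate_frames([0, 1, 2, 3], num_new_frames=2)
```

In a real system the generated tokens would be decoded back to pixels by the tokenizer; that step is omitted here.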

large language model, machine learning, trajectory, (20 more...)

2502.15672

Country: Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.91)
Information Technology > Robotics & Automation (0.82)
Energy (0.69)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

DOC-Depth: A novel approach for dense depth ground truth generation

de Moreau, Simon, Corsia, Mathias, Bouchiba, Hassan, Almehio, Yasser, Bursuc, Andrei, El-Idrissi, Hafid, Moutarde, Fabien

arXiv.org Artificial Intelligence, Feb-4-2025

Accurate depth information is essential for many computer vision applications. Yet, no available dataset recording method allows for fully dense, accurate depth estimation in a large-scale dynamic environment. In this paper, we introduce DOC-Depth, a novel, efficient and easy-to-deploy approach for dense depth generation from any LiDAR sensor. After reconstructing a consistent dense 3D environment using LiDAR odometry, we address dynamic object occlusions automatically thanks to DOC, our state-of-the-art dynamic object classification method. Additionally, DOC-Depth is fast and scalable, allowing for the creation of datasets unbounded in size and time. We demonstrate the effectiveness of our approach on the KITTI dataset, improving its density from 16.1% to 71.2%, and release this new fully dense depth annotation to facilitate future research in the domain. We also showcase results using various LiDAR sensors and in multiple environments. All software components are publicly available for the research community.
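
To make the density figures concrete, the sketch below computes the density of a depth map as the fraction of pixels that carry a valid depth value. The definition (valid means different from a sentinel value of 0.0) and the toy depth maps are assumptions for illustration; the abstract does not specify how density is measured.

```python
def depth_density(depth_map, invalid_value=0.0):
    """Fraction of pixels holding a valid depth (assumed: != invalid_value)."""
    total = sum(len(row) for row in depth_map)
    valid = sum(1 for row in depth_map for d in row if d != invalid_value)
    return valid / total if total else 0.0


# Toy 2x4 depth maps in metres; 0.0 marks a pixel without a depth value.
sparse = [[0.0, 5.2, 0.0, 0.0],
          [0.0, 0.0, 0.0, 7.9]]
dense = [[4.9, 5.2, 5.4, 0.0],
         [7.1, 7.4, 0.0, 7.9]]

sparse_density = depth_density(sparse)   # 2 valid pixels out of 8
dense_density = depth_density(dense)     # 6 valid pixels out of 8

# Ratio of the two densities reported in the abstract for KITTI.
reported_gain = 71.2 / 16.1
```

With the reported numbers, the densified annotation covers roughly 4.4 times as many pixels as the original one.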

artificial intelligence, machine learning, survey article, (18 more...)

2502.02144

Genre:

Research Report > Promising Solution (0.50)
Overview > Innovation (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.35)

Domain Adaptation with a Single Vision-Language Embedding

Fahes, Mohammad, Vu, Tuan-Hung, Bursuc, Andrei, Pérez, Patrick, de Charette, Raoul

arXiv.org Artificial Intelligence, Oct-28-2024

Domain adaptation has been extensively investigated in computer vision but still requires access to target data at training time, which might be difficult to obtain in some uncommon conditions. In this paper, we present a new framework for domain adaptation relying on a single Vision-Language (VL) latent embedding instead of full target data. First, leveraging a contrastive language-image pre-training model (CLIP), we propose prompt/photo-driven instance normalization (PIN). PIN is a feature augmentation method that mines multiple visual styles using a single target VL latent embedding, by optimizing affine transformations of low-level source features. The VL embedding can come from a language prompt describing the target domain, a partially optimized language prompt, or a single unlabeled target image. Second, we show that these mined styles (i.e., augmentations) can be used for zero-shot (i.e., target-free) and one-shot unsupervised domain adaptation. Experiments on semantic segmentation demonstrate the effectiveness of the proposed method, which outperforms relevant baselines in the zero-shot and one-shot settings.
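
The affine transformation of feature statistics mentioned in this abstract can be sketched as follows. This is a minimal illustration assuming an AdaIN-style formulation (normalize a feature channel, then rescale and shift it with style parameters); the function names are hypothetical, and the step that optimizes the style parameters against the VL embedding is omitted entirely.

```python
import math


def channel_stats(channel, eps=1e-5):
    """Mean and (eps-stabilized) standard deviation of one feature channel."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((x - mean) ** 2 for x in channel) / n
    return mean, math.sqrt(var + eps)


def stylize(channel, target_mean, target_std, eps=1e-5):
    """Normalize the channel, then apply the affine style (scale and shift)."""
    mean, std = channel_stats(channel, eps)
    return [target_std * (x - mean) / std + target_mean for x in channel]


# Toy source feature channel, re-styled to a hypothetical target style.
source_channel = [1.0, 2.0, 3.0, 4.0]
styled = stylize(source_channel, target_mean=0.0, target_std=2.0)
styled_mean, styled_std = channel_stats(styled)
```

In the setting described above, `target_mean` and `target_std` would be the quantities being optimized, one pair per channel and per mined style.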

large language model, machine learning, natural language, (22 more...)

2410.21361

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

The BRAVO Semantic Segmentation Challenge Results in UNCV2024

Vu, Tuan-Hung, Valle, Eduardo, Bursuc, Andrei, Kerssies, Tommie, de Geus, Daan, Dubbelman, Gijs, Qian, Long, Zhu, Bingke, Chen, Yingying, Tang, Ming, Wang, Jinqiao, Vojíř, Tomáš, Šochman, Jan, Matas, Jiří, Smith, Michael, Ferrie, Frank, Basu, Shamik, Sakaridis, Christos, Van Gool, Luc

arXiv.org Artificial Intelligence, Oct-9-2024

We propose the unified BRAVO challenge to benchmark the reliability of semantic segmentation models under realistic perturbations and unknown out-of-distribution (OOD) scenarios. We define two categories of reliability: (1) semantic reliability, which reflects the model's accuracy and calibration when exposed to various perturbations; and (2) OOD reliability, which measures the model's ability to detect object classes that are unknown during training. The challenge attracted nearly 100 submissions from international teams representing notable research institutions. The results reveal interesting insights into the importance of large-scale pre-training and minimal architectural design in developing robust and reliable semantic segmentation models.
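
Calibration, one of the two ingredients of semantic reliability named above, is often summarized with the expected calibration error (ECE). The sketch below is a generic ECE computation with equal-width confidence bins; it is an assumption that this matches the spirit of the challenge metrics, since the abstract does not state which calibration measure is used.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """Weighted average gap between confidence and accuracy over bins."""
    n = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        index = min(int(conf * num_bins), num_bins - 1)
        bins[index].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Toy per-pixel predictions: confidence of the predicted class, and whether
# the prediction was correct (1) or not (0).
confidences = [0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 1, 0]
ece = expected_calibration_error(confidences, correct)
```

A perfectly calibrated model has an ECE of zero: among all predictions made with confidence 0.8, exactly 80% are correct.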

artificial intelligence, bravo semantic segmentation challenge result, uncv2024

2409.15107

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence (0.53)

OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks

Sirko-Galouchenko, Sophia, Boulch, Alexandre, Gidaris, Spyros, Bursuc, Andrei, Vobecky, Antonin, Pérez, Patrick, Marlet, Renaud

arXiv.org Artificial Intelligence, Jun-12-2024

We introduce a self-supervised pretraining method, called OccFeat, for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks. Occupancy prediction provides a 3D geometric understanding of the scene to the model. However, the geometry learned is class-agnostic. Hence, we add semantic information to the model in the 3D space through distillation from a self-supervised pretrained image foundation model. Models pretrained with our method exhibit improved BEV semantic segmentation performance, particularly in low-data scenarios. Moreover, empirical results affirm the efficacy of integrating feature distillation with 3D occupancy prediction in our pretraining approach. Repository: https://github.com/valeoai/Occfeat
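
The two pretraining tasks named in this abstract can be combined into a single objective, sketched below on a flattened list of voxels. The specific choices here are assumptions for illustration and are not taken from the abstract: binary cross-entropy for occupancy, a cosine distance for feature distillation, distillation restricted to occupied voxels, and a simple weighted sum.

```python
import math


def bce(pred_prob, target):
    """Binary cross-entropy for one voxel, with clamping for stability."""
    p = min(max(pred_prob, 1e-7), 1.0 - 1e-7)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / max(norm, 1e-12)


def pretraining_loss(occ_pred, occ_target, feat_pred, feat_teacher,
                     distill_weight=1.0):
    """Occupancy loss over all voxels plus distillation over occupied ones."""
    occ_loss = sum(bce(p, t) for p, t in zip(occ_pred, occ_target)) / len(occ_pred)
    occupied = [i for i, t in enumerate(occ_target) if t == 1]
    if occupied:
        distill = sum(cosine_distance(feat_pred[i], feat_teacher[i])
                      for i in occupied) / len(occupied)
    else:
        distill = 0.0
    return occ_loss + distill_weight * distill


# Two voxels: the first occupied, the second empty.
loss = pretraining_loss(
    occ_pred=[0.5, 0.5],
    occ_target=[1, 0],
    feat_pred=[[1.0, 0.0], [0.0, 1.0]],
    feat_teacher=[[1.0, 0.0], [1.0, 0.0]],   # hypothetical teacher features
)
```

In this toy case the predicted feature of the occupied voxel matches the teacher, so only the occupancy term contributes.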

artificial intelligence, machine learning, natural language, (18 more...)

2404.14027

Country: Europe > Czechia (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

Valeo4Cast: A Modular Approach to End-to-End Forecasting

Xu, Yihong, Zablocki, Éloi, Boulch, Alexandre, Puy, Gilles, Chen, Mickael, Bartoccioni, Florent, Samet, Nermin, Siméoni, Oriane, Gidaris, Spyros, Vu, Tuan-Hung, Bursuc, Andrei, Valle, Eduardo, Marlet, Renaud, Cord, Matthieu

arXiv.org Artificial Intelligence, Jun-12-2024

Motion forecasting is crucial in autonomous driving systems to anticipate the future trajectories of surrounding agents such as pedestrians, vehicles, and traffic signals. In end-to-end forecasting, the model must jointly detect from sensor data (cameras or LiDARs) the position and past trajectories of the different elements of the scene and predict their future location. We depart from the current trend of tackling this task via end-to-end training from perception to forecasting and we use a modular approach instead. Following a recent study, we individually build and train detection, tracking, and forecasting modules. We then only use consecutive finetuning steps to integrate the modules better and alleviate compounding errors. Our study reveals that this simple yet effective approach significantly improves performance on the end-to-end forecasting benchmark. Consequently, our solution ranks first in the Argoverse 2 end-to-end Forecasting Challenge held at the CVPR 2024 Workshop on Autonomous Driving (WAD), with 63.82 mAPf. In forecasting, we surpass last year's winner by 17.1 points and this year's runner-up by 13.3 points. This remarkable performance in forecasting can be explained by our modular paradigm, which integrates finetuning strategies and significantly outperforms the end-to-end-trained counterparts.
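
The modular structure described in this abstract (separate detection, tracking and forecasting stages chained together) can be illustrated with a deliberately simple pipeline. Everything below is a hypothetical stand-in rather than the Valeo4Cast system: detections are given directly as 2D points, tracking is a greedy nearest-neighbour association, and forecasting is a constant-velocity extrapolation. The finetuning steps mentioned in the abstract are not represented.

```python
import math


def track(frames):
    """Greedy nearest-neighbour association of per-frame detections."""
    tracks = [[point] for point in frames[0]]
    for detections in frames[1:]:
        unused = list(detections)
        for tr in tracks:
            if not unused:
                break
            nearest = min(unused, key=lambda p: math.dist(p, tr[-1]))
            tr.append(nearest)
            unused.remove(nearest)
    return tracks


def forecast(track_points, horizon):
    """Constant-velocity extrapolation; needs at least two observed points."""
    (x0, y0), (x1, y1) = track_points[-2], track_points[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]


# Output of a (stubbed) detector: two agents observed over three frames.
frames = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(1.0, 0.0), (10.0, 1.0)],
    [(2.0, 0.0), (10.0, 2.0)],
]
tracks = track(frames)
predictions = [forecast(t, horizon=2) for t in tracks]
```

Because each stage only consumes the previous stage's output, any one of them can be replaced or retrained independently, which is the property the modular approach relies on.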

artificial intelligence, forecasting, machine learning, (18 more...)

2406.08113

Country: Europe > France (0.15)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Road (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.55)

Learning to Generate Training Datasets for Robust Semantic Segmentation

Hariat, Marwane, Laurent, Olivier, Kazmierczak, Rémi, Zhang, Shihao, Bursuc, Andrei, Yao, Angela, Franchi, Gianni

arXiv.org Artificial Intelligence, Jan-4-2024

Semantic segmentation methods have advanced significantly. Still, their robustness to real-world perturbations and object types not seen during training remains a challenge, particularly in safety-critical applications. We propose a novel approach to improve the robustness of semantic segmentation techniques by leveraging the synergy between label-to-image generators and image-to-label segmentation models. Specifically, we design Robusta, a novel robust conditional generative adversarial network to generate realistic and plausible perturbed images that can be used to train reliable segmentation models. We conduct in-depth studies of the proposed generative model, assess the performance and robustness of the downstream segmentation network, and demonstrate that our approach can significantly enhance the robustness in the face of real-world perturbations, distribution shifts, and out-of-distribution samples. Our results suggest that this approach could be valuable in safety-critical applications, where the reliability of perception modules such as semantic segmentation is of utmost importance and comes with a limited computational budget at inference. We release our code at https://github.com/ENSTA-U2IS/robusta.

machine learning, natural language, robusta, (19 more...)

2308.02535

Genre: Research Report > New Finding (1.00)

Industry: Transportation > Ground (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models

Franchi, Gianni, Laurent, Olivier, Leguéry, Maxence, Bursuc, Andrei, Pilzer, Andrea, Yao, Angela

arXiv.org Machine Learning, Dec-23-2023

Deep Neural Networks (DNNs) are powerful tools for various computer vision tasks, yet they often struggle with reliable uncertainty quantification, a critical requirement for real-world applications. Bayesian Neural Networks (BNNs) are equipped for uncertainty estimation but cannot scale to large DNNs that are highly unstable to train. To address this challenge, we introduce the Adaptable Bayesian Neural Network (ABNN), a simple and scalable strategy to seamlessly transform DNNs into BNNs in a post-hoc manner with minimal computational and training overheads. ABNN preserves the main predictive properties of DNNs while enhancing their uncertainty quantification abilities through simple BNN adaptation layers (attached to normalization layers) and a few fine-tuning steps on pre-trained models. We conduct extensive experiments across multiple datasets for image classification and semantic segmentation tasks, and our results demonstrate that ABNN achieves state-of-the-art performance without the computational budget typically associated with ensemble methods.
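
The idea of attaching a stochastic adaptation to a normalization layer can be sketched as below. This is an illustration under stated assumptions, not the ABNN layer itself: it assumes a multiplicative Gaussian perturbation of the normalization scale, a fixed noise level, and uncertainty estimated from repeated stochastic forward passes. All function and parameter names are hypothetical.

```python
import math
import random


def bayesian_norm(x, gamma, beta, rng, noise_std=0.1, eps=1e-5):
    """Normalize x, then scale by a randomly perturbed gamma and shift by beta."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    noise = rng.gauss(0.0, noise_std)        # one noise sample per forward pass
    scale = gamma * (1.0 + noise)
    return [scale * (v - mean) / math.sqrt(var + eps) + beta for v in x]


def predictive_mean_and_std(x, gamma, beta, num_samples=100, seed=0):
    """Monte Carlo estimate of the output mean and spread over noise samples."""
    rng = random.Random(seed)
    samples = [bayesian_norm(x, gamma, beta, rng) for _ in range(num_samples)]
    dims = range(len(x))
    means = [sum(s[i] for s in samples) / num_samples for i in dims]
    stds = [math.sqrt(sum((s[i] - means[i]) ** 2 for s in samples) / num_samples)
            for i in dims]
    return means, stds


means, stds = predictive_mean_and_std([1.0, 2.0, 3.0], gamma=1.0, beta=0.0)
```

The spread across forward passes serves as the uncertainty signal; activations far from the layer mean vary the most under this perturbation, while an activation exactly at the mean is unaffected.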

artificial intelligence, bayesian inference, machine learning, (16 more...)

2312.15297

Country:

North America > United States > California (0.14)
Asia (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

The Robust Semantic Segmentation UNCV2023 Challenge Results

Yu, Xuanlong, Zuo, Yi, Wang, Zitao, Zhang, Xiaowen, Zhao, Jiaxuan, Yang, Yuting, Jiao, Licheng, Peng, Rui, Wang, Xinyi, Zhang, Junpei, Zhang, Kexin, Liu, Fang, Alcover-Couso, Roberto, SanMiguel, Juan C., Escudero-Viñolo, Marcos, Tian, Hanlin, Matsui, Kenta, Wang, Tianhao, Adan, Fahmy, Gao, Zhitong, He, Xuming, Bouniot, Quentin, Moghaddam, Hossein, Rai, Shyam Nandan, Cermelli, Fabio, Masone, Carlo, Pilzer, Andrea, Ricci, Elisa, Bursuc, Andrei, Solin, Arno, Trapp, Martin, Li, Rui, Yao, Angela, Chen, Wenlong, Simpson, Ivor, Campbell, Neill D. F., Franchi, Gianni

arXiv.org Artificial Intelligence, Sep-27-2023

This paper outlines the winning solutions employed in addressing the MUAD uncertainty quantification challenge held at ICCV 2023. The challenge was centered around semantic segmentation in urban environments, with a particular focus on natural adversarial scenarios. The report presents the results of 19 submitted entries, with numerous techniques drawing inspiration from cutting-edge uncertainty quantification methodologies presented at prominent conferences in the fields of computer vision and machine learning and journals over the past few years. Within this document, the challenge is introduced, shedding light on its purpose and objectives, which primarily revolved around enhancing the robustness of semantic segmentation in urban scenes under varying natural adversarial conditions. The report then delves into the top-performing solutions. Moreover, the document aims to provide a comprehensive overview of the diverse solutions deployed by all participants. By doing so, it seeks to offer readers a deeper insight into the array of strategies that can be leveraged to effectively handle the inherent uncertainties associated with autonomous driving and semantic segmentation, especially within urban environments.

artificial intelligence, semantic segmentation uncv2023 challenge result, survey article

2309.15478

Genre: Overview (0.73)

Technology: Information Technology > Artificial Intelligence (0.87)

Improving CLIP Robustness with Knowledge Distillation and Self-Training

Laroudie, Clement, Bursuc, Andrei, Ha, Mai Lan, Franchi, Gianni

arXiv.org Artificial Intelligence, Sep-19-2023

This paper examines the robustness of a multi-modal computer vision model, CLIP (Contrastive Language-Image Pretraining), in the context of unsupervised learning. The main objective is twofold: first, to evaluate the robustness of CLIP, and second, to explore strategies for augmenting its robustness. To achieve this, we introduce a novel approach named LP-CLIP. This technique involves the distillation of CLIP features through the incorporation of a linear probing layer positioned atop its encoding structure. This newly added layer is trained utilizing pseudo-labels produced by CLIP, coupled with a self-training strategy. The LP-CLIP technique offers a promising approach to enhance the robustness of CLIP without the need for annotations. By leveraging a simple linear probing layer, we aim to improve the model's ability to withstand various uncertainties and challenges commonly encountered in real-world scenarios. Importantly, our approach does not rely on annotated data, which makes it particularly valuable in situations where labeled data might be scarce or costly to obtain. Our proposed approach increases the robustness of CLIP, with state-of-the-art results compared to supervised techniques on various datasets.
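
The core recipe in this abstract, a linear layer trained on pseudo-labels produced by the frozen model, can be sketched on toy data. This is a minimal sketch and not the LP-CLIP code: the 2D "embeddings" are made up, the teacher is a plain dot-product zero-shot classifier, the probe is a softmax layer trained with stochastic gradient descent, and the self-training strategy mentioned above is left out.

```python
import math


def zero_shot_pseudo_labels(image_feats, class_feats):
    """Teacher step: label each image with its most similar class embedding."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return [max(range(len(class_feats)), key=lambda c: dot(f, class_feats[c]))
            for f in image_feats]


def logits_for(weights, biases, feat):
    return [sum(w * x for w, x in zip(row, feat)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_linear_probe(feats, labels, num_classes, lr=0.5, epochs=200):
    """Softmax regression on frozen features, supervised by pseudo-labels."""
    dim = len(feats[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for feat, label in zip(feats, labels):
            probs = softmax(logits_for(weights, biases, feat))
            for c in range(num_classes):
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * feat[j]
                biases[c] -= lr * grad
    return weights, biases


# Hypothetical frozen image embeddings and class (text) embeddings.
image_feats = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
class_feats = [[1.0, 0.0], [0.0, 1.0]]

pseudo_labels = zero_shot_pseudo_labels(image_feats, class_feats)
weights, biases = train_linear_probe(image_feats, pseudo_labels, num_classes=2)
predictions = [max(range(2), key=logits_for(weights, biases, f).__getitem__)
               for f in image_feats]
```

No ground-truth annotation appears anywhere in the loop: the only supervision the probe receives comes from the teacher's own predictions.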

large language model, machine learning, natural language, (18 more...)

2309.10361

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > Promising Solution (0.86)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
