AITopics | Marlet, Renaud

VaViM and VaVAM: Autonomous Driving through Video Generative Modeling

Bartoccioni, Florent, Ramzi, Elias, Besnier, Victor, Venkataramanan, Shashanka, Vu, Tuan-Hung, Xu, Yihong, Chambon, Loick, Gidaris, Spyros, Odabas, Serkan, Hurych, David, Marlet, Renaud, Boulch, Alexandre, Chen, Mickael, Zablocki, Éloi, Bursuc, Andrei, Valle, Eduardo, Cord, Matthieu

arXiv.org Artificial Intelligence, Feb-21-2025

We explore the potential of large-scale generative video models for autonomous driving, introducing an open-source auto-regressive video model (VaViM) and its companion video-action model (VaVAM) to investigate how video pre-training transfers to real-world driving. VaViM is a simple auto-regressive video model that predicts frames using spatio-temporal token sequences. We show that it captures the semantics and dynamics of driving scenes. VaVAM, the video-action model, leverages the learned representations of VaViM to generate driving trajectories through imitation learning. Together, the models form a complete perception-to-action pipeline. We evaluate our models in open- and closed-loop driving scenarios, revealing that video-based pre-training holds promise for autonomous driving. Key insights include the semantic richness of the learned representations, the benefits of scaling for video synthesis, and the complex relationship between model size, data, and safety metrics in closed-loop evaluations. We release code and model weights at https://github.com/valeoai/VideoActionModel

large language model, machine learning, trajectory, (20 more...)

arXiv:2502.15672

Country: Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.91)
Information Technology > Robotics & Automation (0.82)
Energy (0.69)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks

Sirko-Galouchenko, Sophia, Boulch, Alexandre, Gidaris, Spyros, Bursuc, Andrei, Vobecky, Antonin, Pérez, Patrick, Marlet, Renaud

arXiv.org Artificial Intelligence, Jun-12-2024

We introduce a self-supervised pretraining method, called OccFeat, for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks. Occupancy prediction provides a 3D geometric understanding of the scene to the model. However, the geometry learned is class-agnostic. Hence, we add semantic information to the model in the 3D space through distillation from a self-supervised pretrained image foundation model. Models pretrained with our method exhibit improved BEV semantic segmentation performance, particularly in low-data scenarios. Moreover, empirical results affirm the efficacy of integrating feature distillation with 3D occupancy prediction in our pretraining approach. Repository: https://github.com/valeoai/Occfeat

artificial intelligence, machine learning, natural language, (18 more...)

arXiv:2404.14027

Country: Europe > Czechia (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

Valeo4Cast: A Modular Approach to End-to-End Forecasting

Xu, Yihong, Zablocki, Éloi, Boulch, Alexandre, Puy, Gilles, Chen, Mickael, Bartoccioni, Florent, Samet, Nermin, Siméoni, Oriane, Gidaris, Spyros, Vu, Tuan-Hung, Bursuc, Andrei, Valle, Eduardo, Marlet, Renaud, Cord, Matthieu

arXiv.org Artificial Intelligence, Jun-12-2024

Motion forecasting is crucial in autonomous driving systems to anticipate the future trajectories of surrounding agents such as pedestrians, vehicles, and traffic signals. In end-to-end forecasting, the model must jointly detect from sensor data (cameras or LiDARs) the position and past trajectories of the different elements of the scene and predict their future location. We depart from the current trend of tackling this task via end-to-end training from perception to forecasting and we use a modular approach instead. Following a recent study, we individually build and train detection, tracking, and forecasting modules. We then only use consecutive finetuning steps to integrate the modules better and alleviate compounding errors. Our study reveals that this simple yet effective approach significantly improves performance on the end-to-end forecasting benchmark. Consequently, our solution ranks first in the Argoverse 2 end-to-end Forecasting Challenge held at CVPR 2024 Workshop on Autonomous Driving (WAD), with 63.82 mAPf. We surpass forecasting results by +17.1 points over last year's winner and by +13.3 points over this year's runner-up. This remarkable performance in forecasting can be explained by our modular paradigm, which integrates finetuning strategies and significantly outperforms the end-to-end-trained counterparts.

artificial intelligence, forecasting, machine learning, (18 more...)

arXiv:2406.08113

Country: Europe > France (0.15)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Road (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.55)

ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation

Rommel, Cédric, Letzelter, Victor, Samet, Nermin, Marlet, Renaud, Cord, Matthieu, Pérez, Patrick, Valle, Eduardo

arXiv.org Artificial Intelligence, Dec-11-2023

Monocular 3D human pose estimation (3D-HPE) is an inherently ambiguous task, as a 2D pose in an image might originate from different possible 3D poses. Yet, most 3D-HPE methods rely on regression models, which assume a one-to-one mapping between inputs and outputs. In this work, we provide theoretical and empirical evidence that, because of this ambiguity, common regression models are bound to predict topologically inconsistent poses, and that traditional evaluation metrics, such as the MPJPE, P-MPJPE and PCK, are insufficient to assess this aspect. As a solution, we propose ManiPose, a novel manifold-constrained multi-hypothesis model capable of proposing multiple candidate 3D poses for each 2D input, together with their corresponding plausibility. Unlike previous multi-hypothesis approaches, our solution is completely supervised and does not rely on complex generative models, thus greatly facilitating its training and usage. Furthermore, by constraining our model to lie within the human pose manifold, we can guarantee the consistency of all hypothetical poses predicted with our approach, which was not possible in previous works. We illustrate the usefulness of ManiPose in a synthetic 1D-to-2D lifting setting and demonstrate on real-world datasets that it outperforms state-of-the-art models in pose consistency by a large margin, while still reaching competitive MPJPE performance.
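The ambiguity argument can be illustrated with a toy lifting problem. The sketch below is only an illustration under stated assumptions, not the paper's experimental setup: the "pose" is a single point on the unit circle, only its x coordinate is observed, and the helper `mpjpe` is a plain implementation of the mean per-joint position error named in the abstract. Averaging the two valid hypotheses, which is what a squared-error regressor tends toward, yields a point that is off the circle.

```python
import math

def mpjpe(pred, target):
    """Mean per-joint position error: average Euclidean distance over joints."""
    assert len(pred) == len(target)
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)

# Toy lifting problem (illustrative assumption): observe x, recover (x, y)
# constrained to the unit circle. Two answers fit the same observation.
x = 0.6
hypotheses = [(x, 0.8), (x, -0.8)]

# A single-output regressor trained with squared error tends toward the mean
# of the valid answers.
mean_pred = tuple(sum(coords) / len(hypotheses) for coords in zip(*hypotheses))

# Distance of the averaged prediction from the origin: it is no longer 1,
# so the prediction has left the manifold of valid poses.
radius = math.hypot(*mean_pred)

# Error of the averaged prediction against either valid hypothesis.
error_vs_first = mpjpe([mean_pred], [hypotheses[0]])
```

Here the averaged prediction has the same error against both hypotheses, yet it violates the constraint that every valid answer satisfies, which is the kind of inconsistency a position-error metric alone does not expose.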

artificial intelligence, machine learning, manipose, (14 more...)

arXiv:2312.06386

Country:

Europe > France (0.14)
South America > Brazil (0.14)
North America > Canada (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

BEVContrast: Self-Supervision in BEV Space for Automotive Lidar Point Clouds

Sautier, Corentin, Puy, Gilles, Boulch, Alexandre, Marlet, Renaud, Lepetit, Vincent

arXiv.org Artificial Intelligence, Oct-26-2023

We present a surprisingly simple and efficient method for self-supervision of 3D backbones on automotive Lidar point clouds. We design a contrastive loss between features of Lidar scans captured in the same scene. Several such approaches have been proposed in the literature, from PointContrast, which uses a contrast at the level of points, to the state-of-the-art TARL, which uses a contrast at the level of segments, roughly corresponding to objects. While the former enjoys great simplicity of implementation, it is surpassed by the latter, which however requires costly pre-processing. In BEVContrast, we define our contrast at the level of 2D cells in the Bird's Eye View plane. The resulting cell-level representations offer a good trade-off between the point-level representations exploited in PointContrast and the segment-level representations exploited in TARL: we retain the simplicity of PointContrast (cell representations are cheap to compute) while surpassing the performance of TARL in downstream semantic segmentation.
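As a rough illustration of why cell-level representations are cheap to compute, the sketch below pools per-point features into 2D Bird's Eye View cells. It is a minimal sketch under assumptions: the function name `bev_pool`, the `cell_size` parameter, and the use of a plain average are hypothetical choices for illustration, since the abstract does not specify the pooling operator.

```python
import math
from collections import defaultdict

def bev_pool(points, features, cell_size=1.0):
    """Average per-point feature vectors inside each 2D BEV cell.

    points:   list of (x, y, z) coordinates
    features: list of equal-length feature vectors, one per point
    Returns a dict mapping (ix, iy) cell indices to the mean feature.
    Averaging is an illustrative assumption, not the paper's stated choice.
    """
    sums = {}
    counts = defaultdict(int)
    for (x, y, _z), feat in zip(points, features):
        # The height coordinate is ignored: cells live in the ground plane.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell not in sums:
            sums[cell] = [0.0] * len(feat)
        for i, value in enumerate(feat):
            sums[cell][i] += value
        counts[cell] += 1
    return {cell: [s / counts[cell] for s in total] for cell, total in sums.items()}

# Usage: two points fall into cell (0, 0), one into cell (1, 0).
cells = bev_pool(
    [(0.2, 0.3, 1.0), (0.8, 0.9, -1.0), (1.5, 0.1, 0.0)],
    [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]],
)
```

A contrastive loss would then compare the pooled features of matching cells across two scans; that step is omitted here because its details are not given in the abstract.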

automotive lidar point cloud, bevcontrast, self-supervision, (1 more...)

arXiv:2310.17281

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Vision (0.60)

DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion

Rommel, Cédric, Valle, Eduardo, Chen, Mickaël, Khalfaoui, Souhaiel, Marlet, Renaud, Cord, Matthieu, Pérez, Patrick

arXiv.org Artificial Intelligence, Sep-4-2023

We present an innovative approach to 3D Human Pose Estimation (3D-HPE) by integrating cutting-edge diffusion models, which have revolutionized diverse fields, but are relatively unexplored in 3D-HPE. We show that diffusion models enhance the accuracy, robustness, and coherence of human pose estimations. We introduce DiffHPE, a novel strategy for harnessing diffusion models in 3D-HPE, and demonstrate its ability to refine standard supervised 3D-HPE. We also show how diffusion models lead to more robust estimations in the face of occlusions, and improve the time-coherence and the sagittal symmetry of predictions. Using the Human 3.6M dataset, we illustrate the effectiveness of our approach and its superiority over existing models, even under adverse situations where the occlusion patterns in training do not match those in inference. Our findings indicate that while standalone diffusion models provide commendable performance, their accuracy is even better in combination with supervised models, opening exciting new avenues for 3D-HPE research.

artificial intelligence, diffusion model, machine learning, (15 more...)

arXiv:2309.01575

Country:

Europe > France (0.14)
South America > Brazil (0.14)
North America > Canada (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.64)

RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving

Ando, Angelika, Gidaris, Spyros, Bursuc, Andrei, Puy, Gilles, Boulch, Alexandre, Marlet, Renaud

arXiv.org Artificial Intelligence, Apr-25-2023

Casting semantic segmentation of outdoor LiDAR point clouds as a 2D problem, e.g., via range projection, is an effective and popular approach. These projection-based methods usually benefit from fast computations and, when combined with techniques which use other point cloud representations, achieve state-of-the-art results. Today, projection-based methods leverage 2D CNNs, but recent advances in computer vision show that vision transformers (ViTs) have achieved state-of-the-art results in many image-based benchmarks. In this work, we question if projection-based methods for 3D semantic segmentation can benefit from these latest improvements on ViTs. We answer positively but only after combining them with three key ingredients: (a) ViTs are notoriously hard to train and require a lot of training data to learn powerful representations. By preserving the same backbone architecture as for RGB images, we can exploit the knowledge from long training on large image collections that are much cheaper to acquire and annotate than point clouds. We reach our best results with pre-trained ViTs on large image datasets. (b) We compensate for ViTs' lack of inductive bias by substituting a tailored convolutional stem for the classical linear embedding layer. (c) We refine pixel-wise predictions with a convolutional decoder and a skip connection from the convolutional stem to combine low-level but fine-grained features of the convolutional stem with the high-level but coarse predictions of the ViT encoder. With these ingredients, we show that our method, called RangeViT, outperforms existing projection-based methods on nuScenes and SemanticKITTI. The code is available at https://github.com/valeoai/rangevit.
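For readers unfamiliar with range projection, the sketch below shows the spherical projection commonly used to turn a LiDAR point cloud into a 2D range image. It is a minimal sketch, not the RangeViT implementation: the function name, the image size, and the vertical field-of-view defaults are illustrative assumptions that depend on the sensor.

```python
import math

def range_projection(points, height=32, width=1024,
                     fov_up_deg=10.0, fov_down_deg=-30.0):
    """Project (x, y, z) LiDAR points onto a height x width range image.

    Each pixel keeps the range of the closest point that falls into it;
    pixels that receive no point stay at None. All defaults are
    illustrative assumptions, not values taken from the paper.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[None] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no direction
        yaw = math.atan2(y, x)    # azimuth in [-pi, pi]
        pitch = math.asin(z / r)  # elevation angle
        u = 0.5 * (1.0 - yaw / math.pi) * width        # column coordinate
        v = (1.0 - (pitch - fov_down) / fov) * height  # row coordinate, top = fov_up
        col = min(width - 1, max(0, math.floor(u)))
        row = min(height - 1, max(0, math.floor(v)))
        if image[row][col] is None or r < image[row][col]:
            image[row][col] = r
    return image

# Usage: two points along the same ray land in one pixel; the closer one wins.
img = range_projection([(2.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                       height=4, width=8, fov_up_deg=20.0, fov_down_deg=-20.0)
```

The resulting image can be fed to any 2D backbone, and per-pixel predictions are mapped back to the 3D points that produced them.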

artificial intelligence, machine learning, point cloud, (15 more...)

arXiv:2301.10222

Country: Europe (0.28)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.65)
Automobiles & Trucks (0.50)
Information Technology > Robotics & Automation (0.41)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

ALSO: Automotive Lidar Self-supervision by Occupancy estimation

Boulch, Alexandre, Sautier, Corentin, Michele, Björn, Puy, Gilles, Marlet, Renaud

arXiv.org Artificial Intelligence, Apr-4-2023

We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds. The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled, and to use the underlying latent vectors as input to the perception head. The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information, that can be used to boost an actual perception task. This principle has a very simple formulation, which makes it both easy to implement and widely applicable to a large range of 3D sensors and deep networks performing semantic segmentation or object detection. In fact, it supports a single-stream pipeline, as opposed to most contrastive learning approaches, allowing training on limited resources. We conducted extensive experiments on various autonomous driving datasets, involving very different kinds of lidars, for both semantic segmentation and object detection. The results show the effectiveness of our method to learn useful representations without any annotation, compared to existing approaches. Code is available at https://github.com/valeoai/ALSO

artificial intelligence, machine learning, representation, (17 more...)

arXiv:2212.05867

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology (0.34)
Transportation > Ground (0.34)
Automobiles & Trucks (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Take One Gram of Neural Features, Get Enhanced Group Robustness

Roburin, Simon, Corbière, Charles, Puy, Gilles, Thome, Nicolas, Aubry, Mathieu, Marlet, Renaud, Pérez, Patrick

arXiv.org Artificial Intelligence, Feb-7-2023

Predictive performance of machine learning models trained with empirical risk minimization (ERM) can degrade considerably under distribution shifts. The presence of spurious correlations in training datasets leads ERM-trained models to display high loss when evaluated on minority groups not presenting such correlations. Extensive attempts have been made to develop methods improving worst-group robustness. However, they require group information for each training input or, at least, a validation set with group labels to tune their hyperparameters, which may be expensive to get or unknown a priori. In this paper, we address the challenge of improving group robustness without group annotation during training or validation. To this end, we propose to partition the training dataset into groups based on Gram matrices of features extracted by an "identification" model and to apply robust optimization based on these pseudo-groups. In the realistic context where no group labels are available, our experiments show that our approach not only improves group robustness over ERM but also outperforms all recent baselines.
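The Gram matrix mentioned above can be computed as in the sketch below. This is a minimal sketch under assumptions: the channel-by-position layout of the feature map and the normalization by the number of positions follow a common convention and are not confirmed by the abstract, and the subsequent grouping step is not shown because its details are not given here.

```python
def gram_matrix(feature_map):
    """Gram matrix of a feature map given as C rows (channels) of N activations.

    Entry (i, j) is the inner product of channels i and j, divided by the
    number of positions N. Layout and normalization are assumptions.
    """
    n = len(feature_map[0])
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / n for row_j in feature_map]
        for row_i in feature_map
    ]

# Usage: a 2-channel feature map with 2 positions.
G = gram_matrix([[1.0, 2.0], [3.0, 4.0]])
```

Because the matrix sums over positions, it summarizes which channels activate together while discarding where they activate.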

artificial intelligence, dataset, machine learning, (16 more...)

arXiv:2208.12625

Country: Europe (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

A spherical analysis of Adam with Batch Normalization

Roburin, Simon, de Mont-Marin, Yann, Bursuc, Andrei, Marlet, Renaud, Pérez, Patrick, Aubry, Mathieu

arXiv.org Machine Learning, Oct-21-2020

Batch Normalization (BN) is a prominent deep learning technique. In spite of its apparent simplicity, its implications over optimization are yet to be fully understood. While previous studies mostly focus on the interaction between BN and stochastic gradient descent (SGD), we develop a geometric perspective which allows us to precisely characterize the relation between BN and Adam. This formulation and the associated geometric interpretation shed new light on the training dynamics. Firstly, we use it to derive the first effective learning rate expression of Adam. Then we show that, in the presence of BN layers, performing SGD alone is actually equivalent to a variant of Adam constrained to the unit hypersphere. Finally, our analysis outlines phenomena that previous variants of Adam act on and we experimentally validate their importance in the optimization process.
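The hypersphere view rests on a well-known property that the abstract does not spell out: weights that feed a BN layer can be rescaled without changing the output, so the loss depends only on their direction. The sketch below checks two consequences numerically on a toy scale-invariant function. The function `loss` is a hypothetical stand-in chosen for illustration, not a network with BN, and the gradient is estimated by finite differences.

```python
import math

def loss(w):
    """Toy scale-invariant loss: depends only on the direction of w."""
    norm = math.sqrt(sum(v * v for v in w))
    u = [v / norm for v in w]
    return (u[0] - 0.3) ** 2 + 2.0 * u[1] * u[2]

def grad(f, w, eps=1e-6):
    """Central finite-difference estimate of the gradient of f at w."""
    g = []
    for i in range(len(w)):
        hi = list(w)
        lo = list(w)
        hi[i] += eps
        lo[i] -= eps
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g

w = [1.0, -2.0, 0.5]
g1 = grad(loss, w)                     # gradient at w
g3 = grad(loss, [3.0 * v for v in w])  # gradient at 3 * w

# The gradient has no radial component: it is tangent to the sphere through w.
radial = sum(a * b for a, b in zip(g1, w))

# Scaling the weights by 3 shrinks the gradient by 3, so the same step size
# moves the direction of w less as the norm grows.
ratios = [a / b for a, b in zip(g1, g3)]
```

Both properties hold for any function that depends only on the direction of its argument, which is why the norm of the weights acts as a hidden factor in the step size.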

deep learning, learning rate, neural network, (18 more...)

arXiv:2006.13382

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
