AITopics | Vu, Tuan-Hung

Collaborating Authors

Vu, Tuan-Hung

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

VaViM and VaVAM: Autonomous Driving through Video Generative Modeling

Bartoccioni, Florent, Ramzi, Elias, Besnier, Victor, Venkataramanan, Shashanka, Vu, Tuan-Hung, Xu, Yihong, Chambon, Loick, Gidaris, Spyros, Odabas, Serkan, Hurych, David, Marlet, Renaud, Boulch, Alexandre, Chen, Mickael, Zablocki, Éloi, Bursuc, Andrei, Valle, Eduardo, Cord, Matthieu

arXiv.org Artificial IntelligenceFeb-21-2025

We explore the potential of large-scale generative video models for autonomous driving, introducing an open-source auto-regressive video model (VaViM) and its companion video-action model (VaVAM) to investigate how video pre-training transfers to real-world driving. VaViM is a simple auto-regressive video model that predicts frames using spatio-temporal token sequences. We show that it captures the semantics and dynamics of driving scenes. VaVAM, the video-action model, leverages the learned representations of VaViM to generate driving trajectories through imitation learning. Together, the models form a complete perception-to-action pipeline. We evaluate our models in open- and closed-loop driving scenarios, revealing that video-based pre-training holds promise for autonomous driving. Key insights include the semantic richness of the learned representations, the benefits of scaling for video synthesis, and the complex relationship between model size, data, and safety metrics in closed-loop evaluations. We release code and model weights at https://github.com/valeoai/VideoActionModel

large language model, machine learning, trajectory, (20 more...)

arXiv.org Artificial Intelligence

2502.15672

Country: Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.91)
Information Technology > Robotics & Automation (0.82)
Energy (0.69)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PPT: Pre-Training with Pseudo-Labeled Trajectories for Motion Forecasting

Xu, Yihong, Yin, Yuan, Vu, Tuan-Hung, Boulch, Alexandre, Zablocki, Éloi, Cord, Matthieu

arXiv.org Artificial IntelligenceDec-9-2024

Motion forecasting (MF) for autonomous driving aims at anticipating trajectories of surrounding agents in complex urban scenarios. In this work, we investigate a mixed strategy in MF training that first pre-train motion forecasters on pseudo-labeled data, then fine-tune them on annotated data. To obtain pseudo-labeled trajectories, we propose a simple pipeline that leverages off-the-shelf single-frame 3D object detectors and non-learning trackers. The whole pre-training strategy including pseudo-labeling is coined as PPT. Our extensive experiments demonstrate that: (1) combining PPT with supervised fine-tuning on annotated data achieves superior performance on diverse testbeds, especially under annotation-efficient regimes, (2) scaling up to multiple datasets improves the previous state-of-the-art and (3) PPT helps enhance cross-dataset generalization. Our findings showcase PPT as a promising pre-training solution for robust motion forecasting in diverse autonomous driving contexts.

artificial intelligence, machine learning, trajectory, (17 more...)

arXiv.org Artificial Intelligence

2412.06491

Country: Europe (0.46)

Genre: Research Report > New Finding (0.66)

Industry:

Automobiles & Trucks (0.55)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.69)

Add feedback

Domain Adaptation with a Single Vision-Language Embedding

Fahes, Mohammad, Vu, Tuan-Hung, Bursuc, Andrei, Pérez, Patrick, de Charette, Raoul

arXiv.org Artificial IntelligenceOct-28-2024

Domain adaptation has been extensively investigated in computer vision but still requires access to target data at the training time, which might be difficult to obtain in some uncommon conditions. In this paper, we present a new framework for domain adaptation relying on a single Vision-Language (VL) latent embedding instead of full target data. First, leveraging a contrastive language-image pre-training model (CLIP), we propose prompt/photo-driven instance normalization (PIN). PIN is a feature augmentation method that mines multiple visual styles using a single target VL latent embedding, by optimizing affine transformations of low-level source features. The VL embedding can come from a language prompt describing the target domain, a partially optimized language prompt, or a single unlabeled target image. Second, we show that these mined styles (i.e., augmentations) can be used for zero-shot (i.e., target-free) and one-shot unsupervised domain adaptation. Experiments on semantic segmentation demonstrate the effectiveness of the proposed method, which outperforms relevant baselines in the zero-shot and one-shot settings.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2410.21361

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

Add feedback

The BRAVO Semantic Segmentation Challenge Results in UNCV2024

Vu, Tuan-Hung, Valle, Eduardo, Bursuc, Andrei, Kerssies, Tommie, de Geus, Daan, Dubbelman, Gijs, Qian, Long, Zhu, Bingke, Chen, Yingying, Tang, Ming, Wang, Jinqiao, Vojíř, Tomáš, Šochman, Jan, Matas, Jiří, Smith, Michael, Ferrie, Frank, Basu, Shamik, Sakaridis, Christos, Van Gool, Luc

arXiv.org Artificial IntelligenceOct-9-2024

We propose the unified BRAVO challenge to benchmark the reliability of semantic segmentation models under realistic perturbations and unknown out-of-distribution (OOD) scenarios. We define two categories of reliability: (1) semantic reliability, which reflects the model's accuracy and calibration when exposed to various perturbations; and (2) OOD reliability, which measures the model's ability to detect object classes that are unknown during training. The challenge attracted nearly 100 submissions from international teams representing notable research institutions. The results reveal interesting insights into the importance of large-scale pre-training and minimal architectural design in developing robust and reliable semantic segmentation models.

artificial intelligence, bravo semantic segmentation challenge result, uncv2024

arXiv.org Artificial Intelligence

2409.15107

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence (0.53)

Add feedback

DenseMTL: Cross-task Attention Mechanism for Dense Multi-task Learning

Lopes, Ivan, Vu, Tuan-Hung, de Charette, Raoul

arXiv.org Artificial IntelligenceOct-8-2024

Multi-task learning has recently emerged as a promising solution for a comprehensive understanding of complex scenes. In addition to being memory-efficient, multi-task models, when appropriately designed, can facilitate the exchange of complementary signals across tasks. In this work, we jointly address 2D semantic segmentation and three geometry-related tasks: dense depth estimation, surface normal estimation, and edge estimation, demonstrating their benefits on both indoor and outdoor datasets. We propose a novel multi-task learning architecture that leverages pairwise cross-task exchange through correlation-guided attention and self-attention to enhance the overall representation learning for all tasks. We conduct extensive experiments across three multi-task setups, showing the advantages of our approach compared to competitive baselines in both synthetic and real-world benchmarks. Additionally, we extend our method to the novel multi-task unsupervised domain adaptation setting. Our code is available at https://github.com/cv-rits/DenseMTL

artificial intelligence, estimation, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2206.08927

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Valeo4Cast: A Modular Approach to End-to-End Forecasting

Xu, Yihong, Zablocki, Éloi, Boulch, Alexandre, Puy, Gilles, Chen, Mickael, Bartoccioni, Florent, Samet, Nermin, Siméoni, Oriane, Gidaris, Spyros, Vu, Tuan-Hung, Bursuc, Andrei, Valle, Eduardo, Marlet, Renaud, Cord, Matthieu

arXiv.org Artificial IntelligenceJun-12-2024

Motion forecasting is crucial in autonomous driving systems to anticipate the future trajectories of surrounding agents such as pedestrians, vehicles, and traffic signals. In end-to-end forecasting, the model must jointly detect from sensor data (cameras or LiDARs) the position and past trajectories of the different elements of the scene and predict their future location. We depart from the current trend of tackling this task via end-to-end training from perception to forecasting and we use a modular approach instead. Following a recent study, we individually build and train detection, tracking, and forecasting modules. We then only use consecutive finetuning steps to integrate the modules better and alleviate compounding errors. Our study reveals that this simple yet effective approach significantly improves performance on the end-to-end forecasting benchmark. Consequently, our solution ranks first in the Argoverse 2 end-to-end Forecasting Challenge held at CVPR 2024 Workshop on Autonomous Driving (WAD), with 63.82 mAPf. We surpass forecasting results by +17.1 points over last year's winner and by +13.3 points over this year's runner-up. This remarkable performance in forecasting can be explained by our modular paradigm, which integrates finetuning strategies and significantly outperforms the end-to-end-trained counterparts.

artificial intelligence, forecasting, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2406.08113

Country: Europe > France (0.15)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Road (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.55)

Add feedback

Reliability in Semantic Segmentation: Can We Use Synthetic Data?

Loiseau, Thibaut, Vu, Tuan-Hung, Chen, Mickael, Pérez, Patrick, Cord, Matthieu

arXiv.org Artificial IntelligenceDec-14-2023

Assessing the reliability of perception models to covariate shifts and out-of-distribution (OOD) detection is crucial for safety-critical applications such as autonomous vehicles. By nature of the task, however, the relevant data is difficult to collect and annotate. In this paper, we challenge cutting-edge generative models to automatically synthesize data for assessing reliability in semantic segmentation. By fine-tuning Stable Diffusion, we perform zero-shot generation of synthetic data in OOD domains or inpainted with OOD objects. Synthetic data is employed to provide an initial assessment of pretrained segmenters, thereby offering insights into their performance when confronted with real edge cases. Through extensive experiments, we demonstrate a high correlation between the performance on synthetic data and the performance on real OOD data, showing the validity approach. Furthermore, we illustrate how synthetic data can be utilized to enhance the calibration and OOD detection capabilities of segmenters.

artificial intelligence, machine learning, synthetic data, (17 more...)

arXiv.org Artificial Intelligence

2312.09231

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
(2 more...)

Add feedback

P{\O}DA: Prompt-driven Zero-shot Domain Adaptation

Fahes, Mohammad, Vu, Tuan-Hung, Bursuc, Andrei, Pérez, Patrick, de Charette, Raoul

arXiv.org Artificial IntelligenceAug-19-2023

Domain adaptation has been vastly investigated in computer vision but still requires access to target images at train time, which might be intractable in some uncommon conditions. In this paper, we propose the task of `Prompt-driven Zero-shot Domain Adaptation', where we adapt a model trained on a source domain using only a general description in natural language of the target domain, i.e., a prompt. First, we leverage a pretrained contrastive vision-language model (CLIP) to optimize affine transformations of source features, steering them towards the target text embedding while preserving their content and semantics. To achieve this, we propose Prompt-driven Instance Normalization (PIN). Second, we show that these prompt-driven augmentations can be used to perform zero-shot domain adaptation for semantic segmentation. Experiments demonstrate that our method significantly outperforms CLIP-based style transfer baselines on several datasets for the downstream task at hand, even surpassing one-shot unsupervised domain adaptation. A similar boost is observed on object detection and image classification. The code is available at https://github.com/astra-vision/PODA .

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2212.03241

Genre: Research Report (0.64)

Industry: Transportation (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Confidence Estimation via Auxiliary Models

Corbière, Charles, Thome, Nicolas, Saporta, Antoine, Vu, Tuan-Hung, Cord, Matthieu, Pérez, Patrick

arXiv.org Machine LearningDec-11-2020

Reliably quantifying the confidence of deep neural classifiers is a challenging yet fundamental requirement for deploying such models in safety-critical applications. In this paper, we introduce a novel target criterion for model confidence, namely the true class probability (TCP). We show that TCP offers better properties for confidence estimation than standard maximum class probability (MCP). Since the true class is by essence unknown at test time, we propose to learn TCP criterion from data with an auxiliary model, introducing a specific learning scheme adapted to this context. We evaluate our approach on the task of failure prediction and of self-training with pseudo-labels for domain adaptation, which both necessitate effective confidence estimates. Extensive experiments are conducted for validating the relevance of the proposed approach in each task. We study various network architectures and experiment with small and large datasets for image classification and semantic segmentation. In every tested benchmark, our approach outperforms strong baselines.

deep learning, neural network, prediction, (20 more...)

arXiv.org Machine Learning

2012.06508

Country:

Europe > France (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > Germany (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology (0.67)
Automobiles & Trucks (0.46)
Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

Add feedback