AITopics | Gilles, Thomas

Collaborating Authors

Gilles, Thomas

DeTra: A Unified Model for Object Detection and Trajectory Forecasting

Casas, Sergio, Agro, Ben, Mao, Jiageng, Gilles, Thomas, Cui, Alexander, Li, Thomas, Urtasun, Raquel

arXiv.org Artificial Intelligence, Jun-13-2024

The tasks of object detection and trajectory forecasting play a crucial role in understanding the scene for autonomous driving. These tasks are typically executed in a cascading manner, making them prone to compounding errors. Furthermore, there is usually a very thin interface between the two tasks, creating a lossy information bottleneck. To address these challenges, our approach formulates the union of the two tasks as a trajectory refinement problem, where the first pose is the detection (current time), and the subsequent poses are the waypoints of the multiple forecasts (future time). To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects directly from LiDAR point clouds and high-definition maps. We call this model DeTra, short for object Detection and Trajectory forecasting. In our experiments, we observe that DeTra outperforms the state-of-the-art on Argoverse 2 Sensor and Waymo Open Dataset by a large margin, across a broad range of metrics. Last but not least, we perform extensive ablation studies that show the value of refinement for this task, that every proposed component contributes positively to its performance, and that key design choices were made.
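The unified representation described in the DeTra abstract (first pose is the detection, later poses are forecast waypoints, several modes per object) can be pictured with a small data structure. This is only an illustrative sketch: the class name `ObjectHypothesis`, the `(x, y, heading)` pose layout, and the additive `refine` step are assumptions made here, not the authors' implementation, and the hand-supplied deltas stand in for whatever a learned refinement transformer would predict.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed pose layout for illustration: (x, y, heading).
Pose = Tuple[float, float, float]


@dataclass
class ObjectHypothesis:
    """One object as a set of pose sequences, one sequence per future mode."""
    presence: float          # confidence that the object exists
    modes: List[List[Pose]]  # modes[m][0] is the current pose, modes[m][1:] the future

    def detection(self) -> Pose:
        """Current-time pose, read from the first pose of mode 0."""
        return self.modes[0][0]

    def forecasts(self) -> List[List[Pose]]:
        """Future waypoints of every mode: all poses after the first."""
        return [mode[1:] for mode in self.modes]


def refine(obj: ObjectHypothesis, deltas: List[List[Pose]]) -> ObjectHypothesis:
    """Apply one round of additive pose updates to every pose of every mode.

    In a learned model the deltas would come from a network; here they are
    supplied by the caller (hypothetical stand-in).
    """
    new_modes = [
        [(x + dx, y + dy, h + dh) for (x, y, h), (dx, dy, dh) in zip(mode, mode_deltas)]
        for mode, mode_deltas in zip(obj.modes, deltas)
    ]
    return ObjectHypothesis(obj.presence, new_modes)
```

Because detection and forecast live in the same sequence, a single refinement step updates both at once, which is the point of treating the two tasks as one trajectory refinement problem.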

artificial intelligence, forecasting, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2406.04426

Country:

North America > United States (0.28)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Transportation (0.48)
Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

UnO: Unsupervised Occupancy Fields for Perception and Forecasting

Agro, Ben, Sykora, Quinlan, Casas, Sergio, Gilles, Thomas, Urtasun, Raquel

arXiv.org Artificial Intelligence, Jun-12-2024

Perceiving the world and forecasting its future state is a critical task for self-driving. Supervised approaches leverage annotated object labels to learn a model of the world -- traditionally with object detections and trajectory predictions, or temporal bird's-eye-view (BEV) occupancy fields. However, these annotations are expensive and typically limited to a set of predefined categories that do not cover everything we might encounter on the road. Instead, we learn to perceive and forecast a continuous 4D (spatio-temporal) occupancy field with self-supervision from LiDAR data. This unsupervised world model can be easily and effectively transferred to downstream tasks. We tackle point cloud forecasting by adding a lightweight learned renderer and achieve state-of-the-art performance in Argoverse 2, nuScenes, and KITTI. To further showcase its transferability, we fine-tune our model for BEV semantic occupancy forecasting and show that it outperforms the fully supervised state-of-the-art, especially when labeled data is scarce. Finally, when compared to prior state-of-the-art on spatio-temporal geometric occupancy prediction, our 4D world model achieves a much higher recall of objects from classes relevant to self-driving.

artificial intelligence, machine learning, prediction, (16 more...)

arXiv.org Artificial Intelligence

2406.08691

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

RMP: A Random Mask Pretrain Framework for Motion Prediction

Yang, Yi, Zhang, Qingwen, Gilles, Thomas, Batool, Nazre, Folkesson, John

arXiv.org Artificial Intelligence, Sep-16-2023

Although pretraining is growing in popularity, little work has been done on pretrained learning-based motion prediction methods in autonomous driving. In this paper, we propose a framework to formalize the pretraining task for trajectory prediction of traffic participants. Within our framework, inspired by the random masked model in natural language processing (NLP) and computer vision (CV), objects' positions at random timesteps are masked and then filled in by the learned neural network (NN). By changing the mask profile, our framework can easily switch among a range of motion-related tasks. By evaluating it on the Argoverse and nuScenes datasets, we show that our proposed pretraining framework is able to deal with noisy inputs and improves motion prediction accuracy and miss rate, especially for objects occluded over time.
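The masking idea in the RMP abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: the function names `make_mask` and `apply_mask`, the profile names `"random"` and `"future"`, and the `ratio` and `history` parameters are all hypothetical. The `"future"` profile assumes that hiding every timestep after the observed history corresponds to ordinary motion prediction.

```python
import random


def make_mask(num_steps, profile, ratio=0.5, history=None, rng=None):
    """Return a list of booleans; True means the timestep is hidden from the model."""
    rng = rng or random.Random()
    if profile == "random":
        # Hide a fixed fraction of timesteps chosen uniformly at random.
        count = round(ratio * num_steps)
        hidden = set(rng.sample(range(num_steps), count))
        return [t in hidden for t in range(num_steps)]
    if profile == "future":
        # Hide everything after the observed history (assumed prediction task).
        if history is None:
            raise ValueError("the 'future' profile needs a history length")
        return [t >= history for t in range(num_steps)]
    raise ValueError(f"unknown mask profile: {profile!r}")


def apply_mask(track, mask, fill=None):
    """Replace masked positions with a placeholder the network must fill in."""
    return [fill if hidden else position for position, hidden in zip(track, mask)]
```

For example, `apply_mask(track, make_mask(len(track), "future", history=4))` keeps the first four positions and hides the rest, while the `"random"` profile scatters the hidden timesteps across the whole track.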

machine learning, natural language, prediction, (18 more...)

arXiv.org Artificial Intelligence

2309.08989

Country:

Europe > Sweden (0.28)
Asia > Middle East > Israel (0.14)

Genre: Research Report > Promising Solution (0.46)

Industry: Transportation > Ground > Road (0.35)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

MBAPPE: MCTS-Built-Around Prediction for Planning Explicitly

Chekroun, Raphael, Gilles, Thomas, Toromanoff, Marin, Hornauer, Sascha, Moutarde, Fabien

arXiv.org Artificial Intelligence, Sep-15-2023

We propose a framework that combines MCTS with supervised learning, enabling the autonomous vehicle to effectively navigate through diverse scenarios. Experimental results demonstrate the effectiveness and adaptability of our approach, showcasing improved real-time decision-making and collision avoidance. This paper contributes to the field by providing a robust solution for motion planning in autonomous driving systems, enhancing their explainability and reliability.

artificial intelligence, constraint, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2309.08452

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (0.69)
Information Technology > Robotics & Automation (0.51)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

ImPosing: Implicit Pose Encoding for Efficient Visual Localization

Moreau, Arthur, Gilles, Thomas, Piasco, Nathan, Tsishkou, Dzmitry, Stanciulescu, Bogdan, de La Fortelle, Arnaud

arXiv.org Artificial Intelligence, Oct-28-2022

We propose a novel learning-based formulation for visual localization of vehicles that can operate in real-time in city-scale environments. Visual localization algorithms determine the position and orientation from which an image has been captured, using a set of geo-referenced images or a 3D scene representation. Our new localization paradigm, named Implicit Pose Encoding (ImPosing), embeds images and camera poses into a common latent representation with 2 separate neural networks, such that we can compute a similarity score for each image-pose pair. By evaluating candidates through the latent space in a hierarchical manner, the camera position and orientation are not directly regressed but incrementally refined. Very large environments force competitors to store gigabytes of map data, whereas our method is very compact independently of the reference database size. In this paper, we describe how to effectively optimize our learned modules, how to combine them to achieve real-time localization, and demonstrate results on diverse large scale scenarios that significantly outperform prior work in accuracy and computational efficiency.
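The hierarchical evaluation of pose candidates described in the ImPosing abstract can be illustrated with a toy coarse-to-fine search. Everything here is a stand-in chosen for illustration: `encode_pose` is a hand-built feature map rather than a learned network, the image code is constructed by hand so that the similarity peaks at a known 2D position, and the names `similarity`, `encode_pose`, and `refine_pose` are hypothetical.

```python
def similarity(image_code, pose_code):
    """Dot product between two latent vectors, used as the matching score."""
    return sum(a * b for a, b in zip(image_code, pose_code))


def encode_pose(pose):
    """Toy pose encoder (stand-in for a learned network) for a 2D position."""
    x, y = pose
    return [x, y, -0.5 * (x * x + y * y)]


def refine_pose(image_code, center, radius, levels=8, grid=5):
    """Score a grid of candidates around the current best, then shrink the grid."""
    best = center
    for _ in range(levels):
        # Evenly spaced offsets in [-radius, radius]; an odd grid includes 0.
        offsets = [radius * (2 * i / (grid - 1) - 1) for i in range(grid)]
        candidates = [(best[0] + dx, best[1] + dy) for dx in offsets for dy in offsets]
        best = max(candidates, key=lambda p: similarity(image_code, encode_pose(p)))
        radius /= 2  # search a finer neighbourhood at the next level
    return best
```

With an image code of `[tx, ty, 1.0]`, the score equals a constant minus half the squared distance to `(tx, ty)`, so `refine_pose([3.2, -1.7, 1.0], (0.0, 0.0), radius=8.0)` homes in on `(3.2, -1.7)`. The pose is never regressed directly; it is only ever selected from scored candidates, which mirrors the incremental refinement the abstract describes.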

artificial intelligence, machine learning, representation, (17 more...)

arXiv.org Artificial Intelligence

2205.02638

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)

Multi-Modal Simultaneous Forecasting of Vehicle Position Sequences using Social Attention

Mercat, Jean, Gilles, Thomas, Zoghby, Nicole El, Sandou, Guillaume, Beauvois, Dominique, Gil, Guillermo Pita

arXiv.org Artificial Intelligence, Oct-8-2019

Figure 1: A top-view representation of a driving scene with superposed forecast probability density functions shown in blue shades on a log scale. The forecasting model uses the past trajectories, plotted in gray, as input.

Abstract: Vehicle trajectory forecasting models use a wide variety of frameworks for interaction and multi-modality. They rely on various representations of the road scene and definitions of maneuvers. In this paper we present a simple model that simultaneously forecasts each vehicle position on a road scene as a sequence of multi-modal probability density functions. This relies solely on vehicle position tracks and does not define maneuvers. We produce an easily extendable model that combines these predictive capabilities while surpassing state-of-the-art results. Its architecture uses multi-head attention to account for complete interactions between all vehicles, and long short-term memory (LSTM) layers for encoding and forecasting.

I. INTRODUCTION

Automation of driving tasks aims for safety and comfort improvements. For that purpose, most Autonomous Driving (AD) systems rely on anticipating the movements of the traffic scene.
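The attention mechanism mentioned in the abstract lets every vehicle weigh information from every other vehicle. The sketch below shows a single head of scaled dot-product attention over per-vehicle feature vectors in plain Python. It is an illustration only: the learned query, key, and value projections are omitted, the function names are hypothetical, and a multi-head layer would run several such heads with different learned projections and concatenate their outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention.

    Each query (one per vehicle) produces a weighted average of all value
    vectors, with weights given by its similarity to every key.
    """
    dim = len(keys[0])
    outputs = []
    for query in queries:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[j] for w, value in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs
```

Calling `attend(features, features, features)` on a list of per-vehicle feature vectors gives one output vector per vehicle, each a convex combination of all vehicles' features, so the interaction is complete in the sense that no pair of vehicles is excluded.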

deep learning, neural network, vehicle, (20 more...)

arXiv.org Artificial Intelligence

1910.0365

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
