AITopics | Shotton, Jamie

Collaborating Authors

Shotton, Jamie

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving

Russell, Lloyd, Hu, Anthony, Bertoni, Lorenzo, Fedoseev, George, Shotton, Jamie, Arani, Elahe, Corrado, Gianluca

arXiv.org Artificial IntelligenceMar-26-2025

Generative models offer a scalable and flexible paradigm for simulating complex environments, yet current approaches fall short in addressing the domain-specific requirements of autonomous driving - such as multi-agent interactions, fine-grained control, and multi-camera consistency. We introduce GAIA-2, Generative AI for Autonomy, a latent diffusion world model that unifies these capabilities within a single generative framework. GAIA-2 supports controllable video generation conditioned on a rich set of structured inputs: ego-vehicle dynamics, agent configurations, environmental factors, and road semantics. It generates high-resolution, spatiotemporally consistent multi-camera videos across geographically diverse driving environments (UK, US, Germany). The model integrates both structured conditioning and external latent embeddings (e.g., from a proprietary driving model) to facilitate flexible and semantically grounded scene synthesis. Through this integration, GAIA-2 enables scalable simulation of both common and rare driving scenarios, advancing the use of generative world models as a core tool in the development of autonomous systems. Videos are available at https://wayve.ai/thinking/gaia-2.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.20523

Country: Europe > Germany (0.24)

Genre: Research Report (0.64)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback

WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving

Zürn, Jannik, Gladkov, Paul, Dudas, Sofía, Cotter, Fergal, Toteva, Sofi, Shotton, Jamie, Simaiaki, Vasiliki, Mohan, Nikhil

arXiv.org Artificial IntelligenceJul-11-2024

We present WayveScenes101, a dataset designed to help the community advance the state of the art in novel view synthesis that focuses on challenging driving scenes containing many dynamic and deformable elements with changing geometry and texture. The dataset comprises 101 driving scenes across a wide range of environmental conditions and driving scenarios. The dataset is designed for benchmarking reconstructions on in-the-wild driving scenes, with many inherent challenges for scene reconstruction methods including image glare, rapid exposure changes, and highly dynamic scenes with significant occlusion. Along with the raw images, we include COLMAP-derived camera poses in standard data formats. We propose an evaluation protocol for evaluating models on held-out camera views that are off-axis from the training views, specifically testing the generalisation capabilities of methods. Finally, we provide detailed metadata for all scenes, including weather, time of day, and traffic conditions, to allow for a detailed model performance breakdown across scene characteristics. Dataset and code are available at https://github.com/wayveai/

artificial intelligence, dataset, synthesis, (10 more...)

arXiv.org Artificial Intelligence

2407.0828

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Industry:

Transportation > Ground > Road (0.66)
Information Technology > Robotics & Automation (0.42)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.52)

Add feedback

CarLLaVA: Vision language models for camera-only closed-loop driving

Renz, Katrin, Chen, Long, Marcu, Ana-Maria, Hünermann, Jan, Hanotte, Benoit, Karnsund, Alice, Shotton, Jamie, Arani, Elahe, Sinavski, Oleg

arXiv.org Artificial IntelligenceJun-14-2024

In this technical report, we present CarLLaVA, a Vision Language Model (VLM) for autonomous driving, developed for the CARLA Autonomous Driving Challenge 2.0. CarLLaVA uses the vision encoder of the LLaVA VLM and the LLaMA architecture as backbone, achieving state-of-the-art closed-loop driving performance with only camera input and without the need for complex or expensive labels. Additionally, we show preliminary results on predicting language commentary alongside the driving output. CarLLaVA uses a semi-disentangled output representation of both path predictions and waypoints, getting the advantages of the path for better lateral control and the waypoints for better longitudinal control. We propose an efficient training recipe to train on large driving datasets without wasting compute on easy, trivial data. CarLLaVA ranks 1st place in the sensor track of the CARLA Autonomous Driving Challenge 2.0 outperforming the previous state of the art by 458% and the best concurrent submission by 32.6%.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2406.10165

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities > Geothermal System for Power Generation > Advanced Geothermal System (AGS) (0.61)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.78)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

LangProp: A code optimization framework using Language Models applied to driving

Ishida, Shu, Corrado, Gianluca, Fedoseev, George, Yeo, Hudson, Russell, Lloyd, Shotton, Jamie, Henriques, João F., Hu, Anthony

arXiv.org Artificial IntelligenceJan-18-2024

LangProp is a framework for iteratively optimizing code generated by large language models (LLMs) in a supervised/reinforcement learning setting. While LLMs can generate sensible solutions zero-shot, the solutions are often sub-optimal. Especially for code generation tasks, it is likely that the initial code will fail on certain edge cases. LangProp automatically evaluates the code performance on a dataset of input-output pairs, as well as catches any exceptions, and feeds the results back to the LLM in the training loop, so that the LLM can iteratively improve the code it generates. By adopting a metric- and data-driven training paradigm for this code optimization procedure, one could easily adapt findings from traditional machine learning techniques such as imitation learning, DAgger, and reinforcement learning. We demonstrate the first proof of concept of automated code optimization for autonomous driving in CARLA, showing that LangProp can generate interpretable and transparent driving policies that can be verified and improved in a metric- and data-driven way. Our code will be open-sourced and is available at https://github.com/shuishida/LangProp.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2401.10314

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Transportation > Ground > Road (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

LingoQA: Video Question Answering for Autonomous Driving

Marcu, Ana-Maria, Chen, Long, Hünermann, Jan, Karnsund, Alice, Hanotte, Benoit, Chidananda, Prajwal, Nair, Saurabh, Badrinarayanan, Vijay, Kendall, Alex, Shotton, Jamie, Sinavski, Oleg

arXiv.org Artificial IntelligenceDec-21-2023

Autonomous driving has long faced a challenge with public acceptance due to the lack of explainability in the decision-making process. Video question-answering (QA) in natural language provides the opportunity for bridging this gap. Nonetheless, evaluating the performance of Video QA models has proved particularly tough due to the absence of comprehensive benchmarks. To fill this gap, we introduce LingoQA, a benchmark specifically for autonomous driving Video QA. The LingoQA trainable metric demonstrates a 0.95 Spearman correlation coefficient with human evaluations. We introduce a Video QA dataset of central London consisting of 419k samples that we release with the paper. We establish a baseline vision-language model and run extensive ablation studies to understand its performance.

large language model, machine learning, question answering, (22 more...)

arXiv.org Artificial Intelligence

2312.14115

Country:

Europe (0.28)
North America > United States > Pennsylvania (0.14)

Genre: Research Report (0.82)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.98)

Add feedback

Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

Chen, Long, Sinavski, Oleg, Hünermann, Jan, Karnsund, Alice, Willmott, Andrew James, Birch, Danny, Maund, Daniel, Shotton, Jamie

arXiv.org Artificial IntelligenceOct-13-2023

Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability. We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations. We also present a new dataset of 160k QA pairs derived from 10k driving scenarios, paired with high quality control commands collected with RL agent and question answer pairs generated by teacher LLM (GPT-3.5). A distinct pretraining strategy is devised to align numeric vector modalities with static LLM representations using vector captioning language data. We also introduce an evaluation metric for Driving QA and demonstrate our LLM-driver's proficiency in interpreting driving scenarios, answering questions, and decision-making. Our findings highlight the potential of LLM-based driving action generation in comparison to traditional behavioral cloning. We make our benchmark, datasets, and model available for further exploration.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2310.01957

Country: Europe > United Kingdom > England > Greater London > London (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

GAIA-1: A Generative World Model for Autonomous Driving

Hu, Anthony, Russell, Lloyd, Yeo, Hudson, Murez, Zak, Fedoseev, George, Kendall, Alex, Shotton, Jamie, Corrado, Gianluca

arXiv.org Artificial IntelligenceSep-29-2023

Autonomous driving promises transformative improvements to transportation, but building systems capable of safely navigating the unstructured complexity of real-world scenarios remains challenging. A critical problem lies in effectively predicting the various potential outcomes that may emerge in response to the vehicle's actions as the world evolves. To address this challenge, we introduce GAIA-1 ('Generative AI for Autonomy'), a generative world model that leverages video, text, and action inputs to generate realistic driving scenarios while offering fine-grained control over ego-vehicle behavior and scene features. Our approach casts world modeling as an unsupervised sequence modeling problem by mapping the inputs to discrete tokens, and predicting the next token in the sequence. Emerging properties from our model include learning high-level structures and scene dynamics, contextual awareness, generalization, and understanding of geometry. The power of GAIA-1's learned representation that captures expectations of future events, combined with its ability to generate realistic samples, provides new possibilities for innovation in the field of autonomy, enabling enhanced and accelerated training of autonomous driving technology.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2309.1708

Country: Europe (0.14)

Genre:

Research Report (0.40)
Instructional Material (0.34)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (0.81)
Automobiles & Trucks (0.81)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback

Model-Based Imitation Learning for Urban Driving

Hu, Anthony, Corrado, Gianluca, Griffiths, Nicolas, Murez, Zak, Gurau, Corina, Yeo, Hudson, Kendall, Alex, Cipolla, Roberto, Shotton, Jamie

arXiv.org Artificial IntelligenceNov-3-2022

An accurate model of the environment and the dynamic agents acting in it offers great potential for improving motion planning. We present MILE: a Model-based Imitation LEarning approach to jointly learn a model of the world and a policy for autonomous driving. Our method leverages 3D geometry as an inductive bias and learns a highly compact latent space directly from high-resolution videos of expert demonstrations. Our model is trained on an offline corpus of urban driving data, without any online interaction with the environment. MILE improves upon prior state-of-the-art by 31% in driving score on the CARLA simulator when deployed in a completely new town and new weather conditions. Our model can predict diverse and plausible states and actions, that can be interpretably decoded to bird's-eye view semantic segmentation. Further, we demonstrate that it can execute complex driving manoeuvres from plans entirely predicted in imagination. Our approach is the first camera-only method that models static scene, dynamic scene, and ego-behaviour in an urban driving environment. The code and model weights are available at https://github.com/wayveai/mile.

artificial intelligence, machine learning, proceedings, (16 more...)

arXiv.org Artificial Intelligence

2210.07729

Genre: Research Report (0.81)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Decision Jungles: Compact and Rich Models for Classification

Shotton, Jamie, Sharp, Toby, Kohli, Pushmeet, Nowozin, Sebastian, Winn, John, Criminisi, Antonio

Neural Information Processing SystemsDec-31-2013

Randomized decision trees and forests have a rich history in machine learning and have seen considerable success in application, perhaps particularly so for computer vision. However, they face a fundamental limitation: given enough data, the number of nodes in decision trees will grow exponentially with depth. For certain applications, for example on mobile or embedded processors, memory is a limited resource, and so the exponential growth of trees limits their depth, and thus their potential accuracy. This paper proposes decision jungles, revisiting the idea of ensembles of rooted decision directed acyclic graphs (DAGs), and shows these to be compact and powerful discriminative models for classification. Unlike conventional decision trees that only allow one path to every node, a DAG in a decision jungle allows multiple paths from the root to each leaf. We present and compare two new node merging algorithms that jointly optimize both the features and the structure of the DAGs efficiently. During training, node splitting and node merging are driven by the minimization of exactly the same objective function, here the weighted sum of entropies at the leaves. Results on varied datasets show that, compared to decision forests and several other baselines, decision jungles require dramatically less memory while considerably improving generalization.

artificial intelligence, decision tree learning, node, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Texas > Travis County > Austin (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback