Top 10 Popular Datasets For Autonomous Driving Projects


For the past few years, organisations have been investing heavily in the autonomous driving space. The spending is driven by the expectation that autonomous driving will reshape the transport network for the better. According to reports, the global autonomous vehicle market is expected to grow at a CAGR of 62.86% to reach $41.24 billion by 2024. In this article, we list ten popular datasets for autonomous driving projects. The list is in alphabetical order.
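To make the growth figure concrete, here is a minimal sketch of the compound annual growth rate arithmetic. The end value and rate come from the text; the base year is not stated there, so the horizons tried below are assumptions, and the function name is hypothetical.

```python
def value_before_growth(end_value: float, cagr: float, years: int) -> float:
    """Back out the starting value implied by compounding at `cagr` for `years` years."""
    return end_value / (1.0 + cagr) ** years


END_VALUE_BN = 41.24  # projected 2024 market size in USD billions (from the text)
CAGR = 0.6286         # 62.86% per year (from the text)

# Assumption: the text does not give the base year, so several horizons are tried.
for years in (3, 5, 7):
    start = value_before_growth(END_VALUE_BN, CAGR, years)
    print(f"{years}-year horizon: implied starting market of about ${start:.2f} billion")
```

The same relation read forwards is end = start * (1 + CAGR) ** years, which is why a rate this high implies a very small starting market even over a short horizon.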

D$^2$-City: A Large-Scale Dashcam Video Dataset of Diverse Traffic Scenarios (Machine Learning)

Driving datasets accelerate the development of intelligent driving and related computer vision technologies, while substantial and detailed annotations serve as the fuel that boosts the efficacy of such datasets for improving learning-based models. We propose D$^2$-City, a large-scale comprehensive collection of dashcam videos collected by vehicles on DiDi's platform. D$^2$-City contains more than 10000 video clips which deeply reflect the diversity and complexity of real-world traffic scenarios in China. We also provide bounding boxes and tracking annotations of 12 classes of objects in all frames of 1000 videos and detection annotations on keyframes for the remainder of the videos. Compared with existing datasets, D$^2$-City features data in varying weather, road, and traffic conditions and a huge amount of elaborate detection and tracking annotations. By bringing a diverse set of challenging cases to the community, we expect the D$^2$-City dataset will advance the perception and related areas of intelligent driving.

nuScenes: A multimodal dataset for autonomous driving (Machine Learning)

Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Image-based benchmark datasets have driven the development in computer vision tasks such as object detection, tracking and segmentation of agents in the environment. Most autonomous vehicles, however, carry a combination of cameras and range sensors such as lidar and radar. As machine learning based methods for detection and tracking become more prevalent, there is a need to train and evaluate such methods on datasets containing range sensor data along with images. In this work we present nuTonomy scenes (nuScenes), the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view. nuScenes comprises 1000 scenes, each 20s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. We also define a new metric for 3D detection which consolidates the multiple aspects of the detection task: classification, localization, size, orientation, velocity and attribute estimation. We provide careful dataset analysis as well as baseline performance for lidar and image based detection methods. Data, development kit, and more information are available at
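As an illustration of what a single 3D annotation of the kind described above might carry, here is a minimal sketch built from the aspects the abstract lists (classification, localization, size, orientation, velocity, attributes). The class name, field names, units, and example values are all hypothetical; this is not the schema of the nuScenes development kit.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Box3D:
    """Hypothetical 3D box annotation; not the nuScenes devkit schema."""
    category: str                       # classification (one of the object classes)
    center: Tuple[float, float, float]  # localization: x, y, z in metres (assumed units)
    size: Tuple[float, float, float]    # width, length, height in metres (assumed units)
    yaw: float                          # orientation about the vertical axis, in radians
    velocity: Tuple[float, float]       # ground-plane velocity vx, vy in m/s
    attributes: List[str] = field(default_factory=list)  # e.g. a motion state

    def volume(self) -> float:
        """Volume of the box in cubic metres."""
        width, length, height = self.size
        return width * length * height

    def speed(self) -> float:
        """Magnitude of the ground-plane velocity in m/s."""
        return math.hypot(*self.velocity)


# Example values are made up for illustration only.
box = Box3D(
    category="car",
    center=(10.0, 2.5, 0.8),
    size=(1.9, 4.5, 1.6),
    yaw=0.1,
    velocity=(3.0, 0.0),
    attributes=["moving"],
)
print(f"{box.category}: volume {box.volume():.2f} m^3, speed {box.speed():.1f} m/s")
```

Keeping each aspect as its own field mirrors how a detection metric that scores these aspects separately would need to read them.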

Scalability in Perception for Autonomous Driving: Waymo Open Dataset (Machine Learning)

The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available based on our proposed diversity metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at
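For a rough sense of scale, the scene counts and the 20-second scene length quoted in the nuScenes and Waymo abstracts can be turned into total recording time. This is a back-of-the-envelope sketch; the helper name is hypothetical, and it ignores sensor count, frame rate, and annotation density.

```python
SCENE_SECONDS = 20  # both abstracts describe 20-second scenes

# Scene counts as quoted in the two abstracts.
SCENE_COUNTS = {
    "nuScenes": 1000,
    "Waymo Open Dataset": 1150,
}


def total_hours(num_scenes: int, seconds_per_scene: int = SCENE_SECONDS) -> float:
    """Total recording time in hours for a dataset of fixed-length scenes."""
    return num_scenes * seconds_per_scene / 3600.0


for name, scenes in SCENE_COUNTS.items():
    print(f"{name}: {scenes} scenes, about {total_hours(scenes):.2f} hours of driving")
```

By this measure the two datasets come to roughly 5.6 and 6.4 hours respectively, so the differences the abstracts emphasise lie more in sensors, annotation, and diversity than in raw duration.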

2019 In Review: 10 Open-Sourced AI Datasets


High-quality data is the fuel that keeps the AI engine running -- and the machine learning community can't get enough of it. In the conclusion to our year-end series, Synced spotlights ten datasets that were open-sourced in 2019. Waymo, the self-driving technology development company and Alphabet subsidiary, has been relatively protective of its technology and data since its establishment in 2009. This August, however, it released the Waymo Open Dataset, a high-quality multimodal sensor dataset for autonomous driving. Waymo principal scientist and head of research Drago Anguelov says the data set is "one of the largest, richest and most diverse self-driving data sets ever released for research."