Campus artificial intelligence researchers aim to improve self-driving cars


The Berkeley Artificial Intelligence Research Lab, or BAIR, released a study on BDD100K -- a driving dataset that can be used to train self-driving car algorithms -- on May 12. According to BAIR's website, BDD100K is "the largest and most diverse driving video dataset," containing 100,000 driving clips. The study concluded that the dataset can help researchers understand how different scenarios affect current self-driving car programs. The research team that created the dataset described two contributions to self-driving cars: the dataset itself and its video annotation system.

Perspective, Survey and Trends: Public Driving Datasets and Toolsets for Autonomous Driving Virtual Test Artificial Intelligence

Owing to its merits of early safety and reliability guarantees, autonomous driving virtual testing has recently gained increasing attention compared with closed-loop testing in real scenarios. Although the availability and quality of autonomous driving datasets and toolsets are prerequisites for diagnosing autonomous driving system bottlenecks and improving system performance, the diversity and privacy of these datasets and toolsets make collecting and characterizing them not only time-consuming but also increasingly challenging. This paper first proposes a Systematic Literature review approach for Autonomous driving tests (SLA), then presents an overview of existing publicly available datasets and toolsets from 2000 to 2020. Quantitative findings on the scenarios covered, along with perspectives, trend inferences, and suggestions drawn from 35 autonomous driving test toolsets and 70 test datasets, are also presented. To the best of our knowledge, we are the first to perform such a recent empirical survey of both datasets and toolsets using an SLA-based survey approach. Our multifaceted analyses and new findings not only reveal insights that we believe are useful for system designers, practitioners, and users, but can also promote more systematic survey analyses of datasets and toolsets in autonomous driving research.

The ApolloScape Open Dataset for Autonomous Driving and its Application


Autonomous driving has attracted tremendous attention, especially in the past few years. The key techniques for a self-driving car include solving tasks such as 3D map construction, self-localization, parsing the driving road, and understanding objects, which enable vehicles to reason and act. However, large-scale datasets for training and system evaluation remain a bottleneck for developing robust perception models. In this paper, we present the ApolloScape dataset [1] and its applications for autonomous driving. Compared with existing public datasets from real scenes, e.g. KITTI [2] or Cityscapes [3], ApolloScape contains much larger and richer labeling, including holistic semantic dense point clouds for each site, stereo imagery, per-pixel semantic labeling, lane-mark labeling, instance segmentation, 3D car instances, and highly accurate locations for every frame in various driving videos from multiple sites, cities, and times of day. For each task, it contains at least 15x more images than state-of-the-art datasets. To label such a complete dataset, we developed various tools and algorithms specialized for each task to accelerate the labeling process, such as 3D-2D segment labeling tools and active labeling in videos. Based on ApolloScape, we are able to develop algorithms that jointly consider the learning and inference of multiple tasks. In this paper, we provide a sensor fusion scheme integrating camera videos, consumer-grade motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robust self-localization and semantic segmentation for autonomous driving. We show that, in practice, sensor fusion and joint learning of multiple tasks are beneficial to achieving a more robust and accurate system. We expect our dataset and the proposed algorithms to support and motivate researchers toward further development of multi-sensor fusion and multi-task learning in the field of computer vision.
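To give a rough sense of the kind of GPS/IMU fusion the abstract describes, a complementary filter can blend dead-reckoned IMU motion with absolute but noisy GPS fixes. The following is a minimal one-dimensional sketch; the function name, parameters, and filter choice are illustrative assumptions, not ApolloScape's actual pipeline:

```python
def fuse_gps_imu(gps_pos, imu_accel, dt, alpha=0.98):
    """Estimate 1-D position by blending dead-reckoned IMU motion
    with absolute (but noisy) GPS fixes via a complementary filter.

    gps_pos   -- list of GPS position fixes, one per time step
    imu_accel -- list of IMU acceleration readings, one per time step
    dt        -- time step in seconds
    alpha     -- weight on the IMU prediction (0..1); higher trusts
                 the smooth IMU more, lower trusts the GPS fix more
    """
    pos, vel = gps_pos[0], 0.0
    estimates = []
    for z, a in zip(gps_pos, imu_accel):
        vel += a * dt                                # integrate acceleration
        predicted = pos + vel * dt                   # dead-reckoned prediction
        pos = alpha * predicted + (1 - alpha) * z    # correct with the GPS fix
        estimates.append(pos)
    return estimates
```

With a high `alpha`, the estimate stays smooth between GPS fixes but is slowly pulled back toward the absolute position, suppressing both IMU drift and GPS jitter.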

Roboticists go off road to compile data that could train self-driving ATVs: TartanDrive dataset likely largest for off-road environments


They drove the heavily instrumented ATV aggressively at speeds of up to 30 miles per hour. They slid through turns, took it up and down hills, and even got it stuck in the mud -- all while gathering data such as video, the speed of each wheel, and the amount of suspension shock travel from seven types of sensors. The resulting dataset, called TartanDrive, includes about 200,000 of these real-world interactions. The researchers believe it is the largest real-world, multimodal, off-road driving dataset, both in terms of the number of interactions and the variety of sensors. The five hours of data could be useful for training a self-driving vehicle to navigate off road.
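Working with a multimodal dataset like this typically means aligning samples from sensors that run at different rates (e.g. matching wheel-speed readings to camera frames). One common approach is nearest-timestamp matching; the sketch below is a generic illustration under that assumption, not TartanDrive's actual tooling:

```python
import bisect


def align_streams(base_times, other_times, other_values):
    """For each timestamp in a base stream (e.g. camera frames), pick the
    temporally nearest sample from another sensor stream.

    base_times   -- sorted timestamps of the reference stream
    other_times  -- sorted timestamps of the stream to align
    other_values -- samples corresponding to other_times
    """
    aligned = []
    for t in base_times:
        i = bisect.bisect_left(other_times, t)
        # consider the neighbours on either side of the insertion point
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_times)]
        best = min(candidates, key=lambda j: abs(other_times[j] - t))
        aligned.append(other_values[best])
    return aligned
```

For example, aligning a 10 Hz wheel-speed stream to camera timestamps yields one wheel-speed sample per frame, which is the usual first step before training a model on synchronized multimodal inputs.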