The rapid growth of data size and accessibility in recent years has instigated a shift of philosophy in algorithm design for artificial intelligence. Rather than relying on algorithms engineered by hand, the field has turned to learning composable systems automatically from massive amounts of data, which has led to ground-breaking performance in important domains such as computer vision, speech recognition, and natural language processing. The most popular class of techniques used in these domains is called deep learning, and it is seeing significant attention from industry. However, these models require very large amounts of data and compute power to train, and are limited by the need for better hardware acceleration to accommodate scaling beyond current data and model sizes. While the current solution has been to use clusters of graphics processing units (GPUs) as general-purpose processors (GPGPU), field programmable gate arrays (FPGAs) provide an interesting alternative. Current trends in design tools for FPGAs have made them more compatible with the high-level software practices typically used in the deep learning community, making FPGAs more accessible to those who build and deploy models. Since FPGA architectures are flexible, this could also allow researchers to explore model-level optimizations beyond what is possible on fixed architectures such as GPUs. In addition, FPGAs tend to provide high performance per watt of power consumption, which is of particular importance for application scientists interested in large-scale server-based deployment or resource-limited embedded applications. This review looks at deep learning and FPGAs from a hardware acceleration perspective, identifying trends and innovations that make these technologies a natural fit, and motivates a discussion on how FPGAs may best serve the needs of the deep learning community moving forward.
Event cameras are bio-inspired sensors that work radically differently from traditional cameras. Instead of capturing images at a fixed rate, they measure per-pixel brightness changes asynchronously. This results in a stream of events, which encode the time, location and sign of the brightness changes. Event cameras possess outstanding properties compared to traditional cameras: very high dynamic range (140 dB vs. 60 dB), high temporal resolution (on the order of microseconds), low power consumption, and no motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as high speed and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
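The event stream described above can be illustrated with a minimal sketch. It is not the sensor model of any particular camera: it assumes that a pixel fires when its log intensity has changed by a fixed contrast threshold since its last event, and it approximates the asynchronous behavior by comparing consecutive frames. The names `Event`, `events_from_frames`, `threshold`, and `eps` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, NamedTuple, Sequence


class Event(NamedTuple):
    t: float       # timestamp of the frame in which the change was observed
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # +1 for a brightness increase, -1 for a decrease


def events_from_frames(
    frames: Sequence[Sequence[Sequence[float]]],
    timestamps: Sequence[float],
    threshold: float = 0.2,  # assumed contrast threshold in log-intensity units
    eps: float = 1e-6,       # avoids log(0) for fully dark pixels
) -> List[Event]:
    """Approximate an event stream from a sequence of intensity frames.

    Each frame is a 2D grid (rows of pixel intensities). A real event camera
    is asynchronous; sampling at frame times is a simplification.
    """
    height, width = len(frames[0]), len(frames[0][0])
    # Per-pixel reference: log intensity at the time of the pixel's last event.
    ref = [[math.log(frames[0][y][x] + eps) for x in range(width)]
           for y in range(height)]
    events: List[Event] = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        for y in range(height):
            for x in range(width):
                log_i = math.log(frame[y][x] + eps)
                diff = log_i - ref[y][x]
                # Emit one event per threshold crossing, then move the reference.
                while abs(diff) >= threshold:
                    polarity = 1 if diff > 0 else -1
                    events.append(Event(t, x, y, polarity))
                    ref[y][x] += polarity * threshold
                    diff = log_i - ref[y][x]
    return events
```

Calling `events_from_frames` on a short sequence in which one pixel brightens and another darkens yields one positive and one negative event, each carrying the time, location, and sign of the change; pixels whose brightness does not change produce no output.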
The objective of the first CARLA autonomous driving challenge was to deploy autonomous driving systems to deal with complex traffic scenarios in which all participants faced the same challenging traffic situations. According to the organizers, the competition is a way to democratize and accelerate the research and development of autonomous vehicles around the world using the CARLA simulator. This paper presents the architecture design for the navigation of an autonomous vehicle in a simulated urban environment that attempts to commit the fewest traffic infractions, using as its baseline the original architecture of the CaRINA 2 autonomous navigation platform. Our agent traveled in simulated scenarios for several hours, demonstrating its capabilities, winning three out of the four tracks of the challenge, and ranking second in the remaining track. Our architecture was designed to meet the requirements of the CARLA Autonomous Driving Challenge and has components for obstacle detection using 3D point clouds, traffic sign detection and classification using Convolutional Neural Networks (CNN) and depth information, risk assessment with collision detection using short-term motion prediction, decision-making with a Markov Decision Process (MDP), and control using Model Predictive Control (MPC).
The rapidly growing demand for powerful AI algorithms in many application domains has motivated massive investment in both high-quality deep neural network (DNN) models and high-efficiency implementations. In this position paper, we argue that a simultaneous DNN/implementation co-design methodology, named Neural Architecture and Implementation Search (NAIS), deserves more research attention to boost the development productivity and efficiency of both DNN models and implementation optimization. We propose a stylized design methodology that can drastically cut down the search cost while preserving the quality of the end solution. As an illustration, we discuss this DNN/implementation methodology in the context of both FPGAs and GPUs. We take autonomous driving as a key use case, as it is one of the most demanding areas for high-quality AI algorithms and accelerators. We discuss how such a co-design methodology can significantly impact the autonomous driving industry. We identify several research opportunities in this exciting domain.
As the foundation of driverless vehicles and intelligent robots, Simultaneous Localization and Mapping (SLAM) has attracted much attention in recent years. However, the non-geometric modules of traditional SLAM algorithms are limited by data association tasks and have become a bottleneck in the development of SLAM. To deal with such problems, many researchers have turned to deep learning for help. But most of these studies are limited to virtual datasets or specific environments, and some even sacrifice efficiency for accuracy. Thus, they are not practical enough. We propose DF-SLAM, a system that uses deep local feature descriptors obtained from a neural network as a substitute for traditional hand-crafted features. Experimental results demonstrate its improvements in efficiency and stability. DF-SLAM outperforms popular traditional SLAM systems in various scenes, including challenging scenes with intense illumination changes. Its versatility and mobility fit well with the need to explore new environments. Since we adopt a shallow network to extract local descriptors and keep the remaining components the same as in the original SLAM systems, DF-SLAM can still run in real time on a GPU.