With the rise of autonomous vehicles, smart video surveillance, facial detection and various people-counting applications, demand for fast and accurate object detection systems is growing. These systems involve not only recognizing and classifying every object in an image, but also localizing each one by drawing the appropriate bounding box around it. This makes object detection a significantly harder task than its traditional computer vision predecessor, image classification.
The Internet of Things (IoT) has sparked the proliferation of connected devices. These devices, which house sensors to collect data on day-to-day activities or for monitoring purposes, are embedded with microcontroller and microprocessor chips. The chips are chosen according to the sensor data needed to complete an assigned task, so there is no one-processor-fits-all architecture. For example, some devices perform a limited amount of processing on data such as temperature, humidity, pressure, or gravity; more complicated systems, however, need to handle one or more high-resolution sound or video streams.
If I asked you to name the objects in the picture below, you would probably come up with a list of words such as "tablecloth, basket, grass, boy, girl, man, woman, orange juice bottle, tomatoes, lettuce, disposable plates…" without thinking twice. Now, if I asked you to describe the picture below, you would probably say, "It's a picture of a family picnic," again without giving it a second thought. Those are two very easy tasks that anyone above the age of six or seven, even of below-average intelligence, could accomplish. In the background, however, a very complicated process takes place. Human vision is a very intricate piece of organic technology that involves our eyes and visual cortex, but also takes into account our mental models of objects, our abstract understanding of concepts and our personal experiences from the billions of interactions we have had with the world over our lives.
Computer vision, also referred to simply as vision, is a cutting-edge field within computer science that deals with enabling computers, devices, or machines in general to see, understand, interpret, or manipulate what is being seen. Computer vision technology uses deep learning techniques and, in a few cases, also employs natural language processing techniques as a follow-on step to analyze text extracted from images. With the advances in deep learning, building functions like image classification, object detection, tracking, and image manipulation has become simpler and more accurate, paving the way for more complex autonomous applications like self-driving cars, humanoids, or drones. With deep learning, we can now manipulate images, for example by superimposing Tom Cruise's features onto another face or by converting a picture into a sketch or a watercolor painting.
In this post, we are going to look at the basics of object detection in computer vision, the basics of the well-known object detection system YOLO (You Only Look Once), and the installation procedure for the latest YOLO v4 on Ubuntu. This post is intended for absolute beginners who don't have much experience implementing an object detection system on Ubuntu. Before discussing object detection, it is good to start with the following concepts in computer vision.
Object classification: This technique predicts the probability of different object categories (car, dog, cat, etc.) in an image. It essentially answers the question "What is in the picture?". It can only predict one category per image.
Object localization: This method predicts the probability of an object in the image along with its location in the image. It basically answers the question "What is in the picture and where is it?".
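To make the difference between the two tasks concrete, here is a minimal Python sketch of what each one returns. The names `classify`, `localize`, and `Localization`, the example probabilities, and the (x_min, y_min, x_max, y_max) pixel convention for the box are assumptions made for illustration; they are not part of YOLO or of any specific library.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Localization:
    label: str    # predicted category
    score: float  # probability of that category
    box: Box      # where the object is in the image


def classify(class_probs: Dict[str, float]) -> Tuple[str, float]:
    """Object classification: return the single most probable category."""
    label = max(class_probs, key=class_probs.get)
    return label, class_probs[label]


def localize(class_probs: Dict[str, float], box: Box) -> Localization:
    """Object localization: the same category decision plus a box."""
    label, score = classify(class_probs)
    return Localization(label=label, score=score, box=box)


# Made-up model outputs for one image.
probs = {"car": 0.7, "dog": 0.2, "cat": 0.1}
print(classify(probs))
print(localize(probs, (34.0, 50.0, 210.0, 180.0)))
```

Classification answers only "what", so its result is a label and a score. Localization answers "what and where", so the same label and score come with a box attached.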
Autonomous vehicles need to have a semantic understanding of the three-dimensional world around them in order to reason about their environment. State-of-the-art methods use deep neural networks to predict semantic classes for each point in a LiDAR scan. A powerful and efficient way to process LiDAR measurements is to use two-dimensional, image-like projections. In this work, we perform a comprehensive experimental study of image-based semantic segmentation architectures for LiDAR point clouds. We demonstrate various techniques to boost performance and to reduce runtime and memory requirements. First, we examine the effect of network size and suggest that much faster inference times can be achieved at a very low cost to accuracy. Next, we introduce an improved point cloud projection technique that does not suffer from systematic occlusions. We use a cyclic padding mechanism that provides context at the horizontal field-of-view boundaries. Third, we perform experiments with a soft Dice loss function that directly optimizes for the intersection-over-union metric. Finally, we propose a new kind of convolution layer with a reduced amount of weight-sharing along one of the two spatial dimensions, addressing the large difference in appearance along the vertical axis of a LiDAR scan. We propose a final set of the above methods with which the model achieves an increase of 3.2% in mIoU segmentation performance over the baseline while requiring only 42% of the original inference time.
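Two of the ideas above can be illustrated in isolation. The sketch below shows cyclic padding of a single row of an image-like projection, assuming the scan covers a full 360° so that the left and right borders are neighbours, and one common formulation of the soft Dice loss for a single class. The function names, the pad width, and the smoothing constant `eps` are assumptions; the abstract does not give the exact formulation used in the paper.

```python
from typing import List, Sequence


def cyclic_pad_row(row: Sequence[float], pad: int) -> List[float]:
    """Pad one image row by wrapping around: the left border is filled with
    the last `pad` values and the right border with the first `pad` values."""
    if not 0 < pad <= len(row):
        raise ValueError("pad must be between 1 and len(row)")
    values = list(row)
    return values[-pad:] + values + values[:pad]


def soft_dice_loss(probs: Sequence[float], targets: Sequence[int],
                   eps: float = 1e-6) -> float:
    """Return 1 - soft Dice for one class.

    probs:   predicted probabilities in [0, 1], one per pixel
    targets: ground-truth labels in {0, 1}, one per pixel
    eps:     assumed smoothing term that avoids division by zero
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    intersection = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


print(cyclic_pad_row([1.0, 2.0, 3.0, 4.0, 5.0], pad=2))
print(soft_dice_loss([1.0, 0.0, 1.0], [1, 0, 1]))  # perfect prediction
print(soft_dice_loss([0.0, 1.0, 0.0], [1, 0, 1]))  # completely wrong prediction
```

For hard (0 or 1) predictions, the Dice score and IoU are monotonically related by Dice = 2·IoU / (1 + IoU), which is why a Dice-style loss can serve as a differentiable stand-in for IoU.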
Applying machine learning techniques to security solutions is one of the current AI trends. This article covers an approach to developing OCR-based software using deep learning algorithms. This software can be used to analyze and process identification documents, such as a US driver's license, as part of a security system for verifying identity. OCR (Optical Character Recognition) technology is already used by machine learning companies for business process automation and optimization, with use cases ranging from Dropbox parsing pictures and Google Street View identifying street signs to searching through text messages and translating text in real time. In this particular case, OCR can be used as part of an automated biometric verification system.
The rapid growth of real-time, large-scale data capture has pushed deep learning and data analytics computing to edge systems. Real-time object recognition on the edge is one of the representative deep neural network (DNN) powered edge systems for real-world mission-critical applications, such as autonomous driving and augmented reality. While DNN-powered object detection edge systems offer many life-enriching opportunities, they also open doors for misuse and abuse. This paper presents three Targeted adversarial Objectness Gradient attacks, coined TOG, which can cause state-of-the-art deep object detection networks to suffer from object-vanishing, object-fabrication, and object-mislabeling attacks. We also present a universal objectness gradient attack that uses adversarial transferability for black-box attacks; it is effective on any input, with negligible attack time cost and low human perceptibility, and is particularly detrimental to object detection edge systems. We report our experimental measurements using two benchmark datasets (PASCAL VOC and MS COCO) on two state-of-the-art detection algorithms (YOLO and SSD). The results demonstrate serious adversarial vulnerabilities and the compelling need for developing robust object detection systems.
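The abstract does not spell out the TOG update rule, so the following is not the authors' method. It is a generic signed-gradient perturbation applied to a toy, hypothetical "objectness" score (a sigmoid of a linear model), meant only to illustrate how a small, bounded change to the input can push a detector's confidence down, as in an object-vanishing attack. All names, weights, inputs, and the step size `epsilon` are made up for the example.

```python
import math
from typing import List, Sequence


def objectness(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Toy detector confidence: sigmoid of a linear score."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> float:
    """Return -1.0, 0.0, or 1.0 depending on the sign of v."""
    return float((v > 0) - (v < 0))


def vanishing_step(x: Sequence[float], w: Sequence[float], b: float,
                   epsilon: float) -> List[float]:
    """One signed-gradient step that lowers the toy objectness score.

    For this toy model d(score)/dx_i = s * (1 - s) * w_i, so moving each
    input against the sign of its gradient decreases the score. Inputs are
    clipped to [0, 1], the assumed valid pixel range.
    """
    s = objectness(x, w, b)
    grad = [s * (1.0 - s) * wi for wi in w]
    return [min(1.0, max(0.0, xi - epsilon * sign(g)))
            for xi, g in zip(x, grad)]


# Made-up "pixels" and model parameters.
x = [0.8, 0.2, 0.6]
w = [2.0, -1.5, 1.0]
b = -0.5

x_adv = vanishing_step(x, w, b, epsilon=0.1)
print("score before:", objectness(x, w, b))
print("score after: ", objectness(x_adv, w, b))
```

Each input changes by at most `epsilon`, which is the sense in which such perturbations can stay hard for a human to perceive while still shifting the model's output.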
Predicting the future is a crucial first step to effective control, since systems that can predict the future can select plans that lead to desired outcomes. In this work, we study the problem of future prediction at the level of 3D scenes, represented by point clouds captured by a LiDAR sensor, i.e., directly learning to forecast the evolution of >100,000 points that comprise a complete scene. We term this Scene Point Cloud Sequence Forecasting (SPCSF). By directly predicting the densest-possible 3D representation of the future, the output contains richer information than other representations such as future object trajectories. We design a method, SPCSFNet, evaluate it on the KITTI and nuScenes datasets, and find that it demonstrates excellent performance on the SPCSF task. To show that SPCSF can benefit downstream tasks such as object trajectory forecasting, we present a new object trajectory forecasting pipeline leveraging SPCSFNet. Specifically, instead of forecasting at the object level as in conventional trajectory forecasting, we propose to forecast at the sensor level and then apply detection and tracking on the predicted sensor data. As a result, our new pipeline removes the need for object trajectory labels and enables large-scale training with unlabeled sensor data. Surprisingly, we found that our new pipeline based on SPCSFNet outperformed the conventional pipeline using state-of-the-art trajectory forecasting methods, all of which require future object trajectory labels. Finally, we propose a new evaluation procedure and two new metrics to measure the end-to-end performance of the trajectory forecasting pipeline. Our code will be made publicly available at https://github.com/xinshuoweng/SPCSF
Edge intelligence refers to a set of connected systems and devices that use artificial intelligence for data collection, caching, processing, and analysis in locations close to where the data is captured. The aim of edge intelligence is to enhance the quality and speed of data processing and protect the privacy and security of the data. Although it emerged only recently, spanning the period from 2011 to now, this field of research has shown explosive growth over the past five years. In this paper, we present a thorough and comprehensive survey of the literature surrounding edge intelligence. We first identify four fundamental components of edge intelligence, namely edge caching, edge training, edge inference, and edge offloading, based on theoretical and practical results pertaining to proposed and deployed systems. We then aim for a systematic classification of the state of the solutions by examining research results and observations for each of the four components and present a taxonomy that includes practical problems, adopted techniques, and application goals. For each category, we elaborate, compare and analyse the literature from the perspectives of adopted techniques, objectives, performance, advantages and drawbacks, etc. This survey article provides a comprehensive introduction to edge intelligence and its application areas. In addition, we summarise the development of the emerging research field and the current state-of-the-art and discuss the important open issues and possible theoretical and technical solutions.