Object Detection on GPUs in 10 Minutes | NVIDIA Developer Blog


Object detection remains a primary driver for applications such as autonomous driving and intelligent video analytics. Object detection applications require substantial training on vast datasets to achieve high accuracy. NVIDIA GPUs excel at the parallel compute performance required to train large networks and produce trained models for object detection inference. This post covers what you need to get up to speed using NVIDIA GPUs to run high-performance object detection pipelines quickly and efficiently. Our Python application takes frames from a live video stream and performs object detection on GPUs. We use a pre-trained Single Shot Detector (SSD) model with Inception V2, apply TensorRT's optimizations, generate a runtime for our GPU, and then perform inference on the video feed to get labels and bounding boxes.
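The detector returns per-frame candidate detections, which are typically filtered by a confidence threshold before drawing boxes. Below is a minimal sketch of that thresholding step, not the blog's actual code; the function name `filter_detections`, the tuple layout `(class_id, score, box)`, and the 0.5 cutoff are illustrative assumptions.

```python
# Illustrative post-processing of detector output: keep only detections
# whose confidence score passes a threshold.

CONF_THRESHOLD = 0.5  # assumed cutoff; tune per application

def filter_detections(detections, threshold=CONF_THRESHOLD):
    """Return the (class_id, score, box) tuples whose score passes the threshold."""
    return [d for d in detections if d[1] >= threshold]

# Example output for one frame: two confident detections, one likely false positive.
raw = [
    (1, 0.92, (34, 50, 120, 200)),   # e.g. "person"
    (3, 0.81, (200, 40, 310, 180)),  # e.g. "car"
    (1, 0.12, (5, 5, 20, 20)),       # low confidence, discarded
]
kept = filter_detections(raw)
print(len(kept))  # 2
```

In a real pipeline this step usually runs together with non-maximum suppression, but the thresholding logic itself is this simple.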

Visualizing convolutional neural networks


We can then set up our code for evaluation and training. We don't want to write loss and accuracy summaries at every time step, as this would greatly slow down the classifier, so instead we log every five steps. While the model is training, we can check the TensorBoard results by launching TensorBoard from the terminal and pointing our web browser at the default TensorBoard address (http://localhost:6006).
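The throttled-logging pattern described above can be sketched as follows. This is a minimal stand-in, not the tutorial's code: `SummaryStub` merely mimics a real summary writer (such as TensorFlow's `tf.summary`), which is deliberately not used here so the sketch stays self-contained.

```python
# Sketch of logging summaries every five steps instead of every step.

class SummaryStub:
    """Stand-in for a TensorBoard summary writer; just records what it is given."""
    def __init__(self):
        self.records = []
    def scalar(self, tag, value, step):
        self.records.append((tag, value, step))

LOG_EVERY = 5  # logging interval from the text

writer = SummaryStub()
for step in range(20):
    loss = 1.0 / (step + 1)       # placeholder loss curve
    if step % LOG_EVERY == 0:     # throttle: skip most steps
        writer.scalar("loss", loss, step)

print([r[2] for r in writer.records])  # [0, 5, 10, 15]
```

With a real writer, the `if step % LOG_EVERY == 0` guard is the only change needed to the per-step training loop.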

Image Filters in Python


I am currently working on a computer vision project, and I wanted to look into image pre-processing to help improve the machine learning models I am planning to build. Image pre-processing involves applying image filters to an image. This article compares a number of the most well-known image filters. Image filters can be used to reduce the amount of noise in an image and to enhance its edges. Two common types of noise that can be present in an image are speckle noise and salt-and-pepper noise.
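A classic filter for salt-and-pepper noise is the median filter: each pixel is replaced by the median of its neighborhood, which discards isolated outliers. Below is a minimal pure-Python 3x3 version for illustration; real projects would typically use an optimized implementation such as OpenCV's `cv2.medianBlur` or SciPy's `median_filter`.

```python
from statistics import median

def median_filter3(img):
    """Apply a 3x3 median filter to a 2D list of pixel values.
    Border pixels are left unchanged for simplicity."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out

# A flat gray patch with one "salt" pixel; the median removes the outlier.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
print(median_filter3(noisy)[1][1])  # 10
```

The outlier value 255 never influences the result, because the median of the 3x3 window is taken from the eight surrounding 10s.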

Neural implants and the race to merge the human brain with Artificial Intelligence


There is a new race in Silicon Valley involving Artificial Intelligence, and no, it's not HealthTech, FinTech, or Voice Commerce, nor does it involve Google, Facebook, or Microsoft... this race involves the brain, and more specifically brain-computer interfaces. It also involves technology royalty, the US government, billion-dollar defence companies, a big connection to PayPal, and years of medical research to better understand the human brain and implant devices that could make a consumer brain-computer interface a reality. The race is called "Neural implants: merging the human brain with AI." So what exactly are neural implants? Brain implants, often referred to as neural implants, are technological devices that connect directly to a biological subject's brain – usually placed on the surface of the brain or attached to the brain's cortex. A common purpose of modern brain implants, and the focus of much current research, is establishing a biomedical prosthesis circumventing areas of the brain that have become dysfunctional after a stroke or other head injuries.[1]

Gentle Dive into Math Behind Convolutional Neural Networks


Autonomous driving, healthcare, and retail are just some of the areas where Computer Vision has allowed us to achieve things that, until recently, were considered impossible. Today the dream of a self-driving car or an automated grocery store no longer sounds so futuristic. In fact, we use Computer Vision every day -- when we unlock our phones with our faces or automatically retouch photos before posting them on social media. Convolutional Neural Networks are possibly the most crucial building blocks behind these huge successes. This time we are going to broaden our understanding of how neural networks work with ideas specific to CNNs.
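The core operation behind a CNN layer can be written in a few lines. The sketch below implements "valid" 2D cross-correlation, which is what deep-learning frameworks actually compute under the name convolution; it is an illustration of the math, not code from the article.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over the image and
    take the elementwise product-sum at each position."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            row.append(sum(image[y + i][x + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# A 2x2 kernel of ones over a 3x3 image of ones: every window sums to 4.
print(conv2d([[1, 1, 1]] * 3, [[1, 1], [1, 1]]))  # [[4, 4], [4, 4]]
```

Note that the output shrinks from 3x3 to 2x2, the usual (input - kernel + 1) size of a valid convolution without padding.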

Expected path length on random manifolds Machine Learning

Manifold learning is one of the cornerstones of unsupervised learning. Classical methods such as Isomap [31], Locally Linear Embeddings [29], Laplacian Eigenmaps [4], and others [30, 11] all seek a low-dimensional embedding of high-dimensional data that preserves prespecified aspects of the data. Probabilistic methods often view the data manifold as governed by a latent variable, along with a generative model that describes how the latent manifold is embedded in the data space. The common theme is the quest for a low-dimensional representation that faithfully captures the data. Ideally, we want an operational representation, that is, one with which we can make mathematically meaningful calculations. It has been argued [17] that a good representation should at least support the following:
- Interpolation: given two points, a natural unique interpolating curve that follows the manifold should exist.
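The notion of a curve "following the manifold" can be made concrete by mapping a latent path through the generative model and measuring its length in data space. The sketch below is an illustrative toy, not the paper's method: `decoder` is a hand-picked stand-in for a learned generative map, and the length is the standard chord-sum approximation of arc length.

```python
import math

def decoder(z):
    """Toy generative map g: 1-D latent -> 2-D data (the unit circle).
    Stand-in for a learned decoder."""
    return (math.cos(z), math.sin(z))

def curve_length(z_points):
    """Approximate the length of t -> g(z(t)) by summing chord lengths
    between consecutive decoded points."""
    pts = [decoder(z) for z in z_points]
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

# A straight segment in latent space maps to an arc on the circle:
n = 1000
zs = [math.pi * i / n for i in range(n + 1)]
print(round(curve_length(zs), 3))  # ~3.142, half the circumference
```

A straight line in latent space can thus have a very different length once pushed through the generator, which is exactly why distances should be measured on the learned manifold rather than in the latent coordinates.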

Deep Weisfeiler-Lehman Assignment Kernels via Multiple Kernel Learning Machine Learning

Kernels for structured data are commonly obtained by decomposing objects into their parts and summing the similarities between all pairs of parts, measured by a base kernel. Assignment kernels, by contrast, are based on an optimal bijection between the parts and have proven to be an effective alternative to the established convolution kernels. We explore how the base kernel can be learned as part of the classification problem. We build on the theory of valid assignment kernels derived from hierarchies defined on the parts, and show that the weights of such a hierarchy can be optimized via multiple kernel learning. We apply this result to learn vertex similarities for the Weisfeiler-Lehman optimal assignment kernel for graph classification, and present initial experimental results demonstrating the feasibility and effectiveness of the approach.
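To see what an optimal assignment looks like in the simplest case, consider graphs represented only by multisets of discrete vertex labels with a 0/1 base kernel: the optimal bijection matches equal labels wherever possible, so the kernel value reduces to a histogram intersection of the label counts. The sketch below illustrates that special case only; it is not the paper's learned-hierarchy construction.

```python
from collections import Counter

def histogram_intersection(counts_a, counts_b):
    """Optimal-assignment similarity for discrete labels under a 0/1 base
    kernel: each label in one multiset is matched to an equal label in the
    other when possible, giving the intersection of the label histograms."""
    return sum(min(count, counts_b.get(label, 0))
               for label, count in counts_a.items())

# Vertex-label multisets of two toy graphs (e.g. after one WL refinement step):
g1 = Counter(["a", "a", "b", "c"])
g2 = Counter(["a", "b", "b", "d"])
print(histogram_intersection(g1, g2))  # 2: one "a" and one "b" are matched
```

Learning the base kernel, as the abstract proposes, amounts to replacing this hard 0/1 label match with weighted similarities taken from an optimized hierarchy over the labels.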

Across-Stack Profiling and Characterization of Machine Learning Models on GPUs Machine Learning

Recent years have seen a proliferation of machine learning/deep learning (ML) models and their wide adoption across application domains. This has made the profiling and characterization of ML models an increasingly pressing task for both hardware designers and system providers, who want to offer the best possible computing system to serve ML models with the desired latency, throughput, and energy requirements while maximizing resource utilization. Such an endeavor is challenging because the characteristics of an ML model depend on the interplay between the model, framework, system libraries, and hardware (the HW/SW stack). A thorough characterization requires understanding the behavior of model execution across the levels of the HW/SW stack. Existing profiling tools, however, are disjoint and focus only on profiling within a particular level of the stack. This paper proposes a leveled profiling design that leverages existing profiling tools to perform across-stack profiling, despite the overheads incurred by the individual profiling providers. We couple this profiling capability with an automatic analysis pipeline to systematically characterize 65 state-of-the-art ML models. Through this characterization, we show that our across-stack profiling solution provides insights (which are difficult to discern otherwise) into the characteristics of ML models, ML frameworks, and GPU hardware.

Investigating Convolutional Neural Networks using Spatial Orderness Machine Learning

Convolutional Neural Networks (CNNs) have been pivotal to the success of many state-of-the-art classification problems in a wide variety of domains (e.g., vision, speech, graphs, and medical imaging). A commonality within those domains is the presence of hierarchical, spatially agglomerative local-to-global interactions within the data. For two-dimensional images, such interactions may induce an a priori relationship between the pixel data and the underlying spatial ordering of the pixels. For instance, in natural images, neighboring pixels are more likely to contain similar values than pixels that are farther apart. To that end, we propose a statistical metric called spatial orderness, which quantifies the extent to which the input data (2D) obeys the underlying spatial ordering at various scales. In our experiments, we find that adding convolutional layers to a CNN can be counterproductive for data bereft of spatial order at higher scales. We also observe, quite counter-intuitively, that the spatial orderness of CNN feature maps shows a synchronized increase during the initial stages of training, and that validation performance improves only after the spatial orderness of the feature maps starts decreasing. Lastly, we present a theoretical analysis (and empirical validation) of the spatial orderness of network weights, where we find that smaller kernel sizes lead to kernels of greater spatial orderness, and vice versa.
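The intuition that "neighboring pixels tend to agree" can be quantified in many ways. The sketch below is only an illustrative proxy, not the paper's spatial orderness metric: it measures the mean absolute difference between horizontally adjacent pixels, which is low for spatially ordered data and high when the same values are scattered without order.

```python
def neighbor_similarity(img):
    """Mean absolute difference between horizontally adjacent pixels in a
    2D list. Lower values mean more spatial order (neighbors agree more).
    An illustrative proxy only, not the metric defined in the paper."""
    diffs = [abs(row[x + 1] - row[x])
             for row in img for x in range(len(row) - 1)]
    return sum(diffs) / len(diffs)

ordered  = [[0, 1, 2, 3], [0, 1, 2, 3]]   # smooth gradient rows
shuffled = [[3, 0, 2, 1], [1, 3, 0, 2]]   # same values, no spatial order
print(neighbor_similarity(ordered) < neighbor_similarity(shuffled))  # True
```

Both images contain the same pixel values, so the difference in score comes purely from their spatial arrangement, which is the property the abstract's metric is designed to capture at multiple scales.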

Bayesian Generative Models for Knowledge Transfer in MRI Semantic Segmentation Problems Machine Learning

Automatic segmentation methods based on deep learning have recently demonstrated state-of-the-art performance, outperforming conventional methods. Nevertheless, these methods are inapplicable to small datasets, which are very common in medical problems. To this end, we propose a method for knowledge transfer between diseases via a Generative Bayesian Prior network. Our approach is compared to a pre-training approach and to random initialization, and obtains the best results in terms of the Dice Similarity Coefficient metric on small subsets of the Brain Tumor Segmentation 2018 database (BRATS2018).
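The Dice Similarity Coefficient used for evaluation above has a simple closed form: twice the overlap between the predicted and reference masks, divided by their total size. A minimal sketch for binary masks, for illustration only:

```python
def dice(pred, target):
    """Dice Similarity Coefficient for flattened binary masks:
    2 * |A intersect B| / (|A| + |B|). Returns 1.0 when both masks are empty."""
    inter = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2.0 * inter / total if total else 1.0

pred   = [1, 1, 0, 1, 0]
target = [1, 0, 0, 1, 1]
print(round(dice(pred, target), 3))  # 0.667: overlap 2, sizes 3 and 3
```

The metric ranges from 0 (no overlap) to 1 (identical masks), which is why it is the standard headline number for segmentation benchmarks such as BRATS.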