Goto

Collaborating Authors

 Africa


Intel-powered camera uses AI to protect endangered African wildlife

Engadget

Technology is already in use to help stop poachers. However, it's frequently limited to monitoring poachers when they're already in shooting range, or after the fact. The non-profit group Resolve vows to do better -- it recently developed a newer version of its TrailGuard camera that uses AI to spot poachers in Africa before they can threaten an endangered species. It uses an Intel-made computer vision processor (the Movidius Myriad 2) that can detect animals, humans and vehicles in real-time, giving park rangers a chance to intercept poachers before it's too late. The technology promises to not only be more effective than previous cameras, but more efficient.


Mapping Informal Settlements in Developing Countries using Machine Learning and Low Resolution Multi-spectral Data

arXiv.org Machine Learning

Informal settlements are home to the most socially and economically vulnerable people on the planet. In order to deliver effective economic and social aid, non-government organizations (NGOs), such as the United Nations Children's Fund (UNICEF), require detailed maps of the locations of informal settlements. However, data regarding informal and formal settlements is primarily unavailable and if available is often incomplete. This is due, in part, to the cost and complexity of gathering data on a large scale. An additional complication is that the definition of an informal settlement is also very broad, which makes it a non-trivial task to collect data. This also makes it challenging to teach a machine what to look for. Due to these challenges we provide three contributions in this work. 1) A brand new machine learning data-set, purposely developed for informal settlement detection that contains a series of low and very-high resolution imagery, with accompanying ground truth annotations marking the locations of known informal settlements. 2) We demonstrate that it is possible to detect informal settlements using freely available low-resolution (LR) data, in contrast to previous studies that use very-high resolution (VHR) satellite and aerial imagery, which is typically cost-prohibitive for NGOs. 3) We demonstrate two effective classification schemes on our curated data set, one that is cost-efficient for NGOs and another that is cost-prohibitive for NGOs, but has additional utility. We integrate these schemes into a semi-automated pipeline that converts either a LR or VHR satellite image into a binary map that encodes the locations of informal settlements. We evaluate and compare our methods.


Low-Cost Device Prototype for Automatic Medical Diagnosis Using Deep Learning Methods

arXiv.org Machine Learning

This paper introduces a novel low-cost device prototype for the automatic diagnosis of diseases, utilizing inputted symptoms and personal background. The engineering goal is to solve the problem of limited healthcare access with a single device. Diagnosing diseases automatically is an immense challenge, owing to their variable properties and symptoms. On the other hand, Neural Networks have developed into a powerful tool in the field of machine learning, one that is showing to be extremely promising at computing diagnosis even with inconsistent variables. In this research, a cheap device was created to allow for straightforward diagnosis and treatment of human diseases. By utilizing Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs), outfitted on a Raspberry Pi Zero processor ($5), the device is able to detect up to 1537 different diseases and conditions and utilize a CNN for on-device visual diagnostics. The user can input the symptoms using the buttons on the device and can take pictures using the same mechanism. The algorithm processes inputted symptoms, providing diagnosis and possible treatment options for common conditions. The purpose of this work was to be able to diagnose diseases through an affordable processor with high accuracy, as it is currently achieving an accuracy of 90% for Top-5 symptom-based diagnoses, and 91% for visual skin diseases. The NNs achieve performance far above any other tested system, and its efficiency and ease of use will prove it to be a helpful tool for people around the world. This device could potentially provide low-cost universal access to vital diagnostics and treatment options.


Are we done with object recognition? The iCub robot's perspective

arXiv.org Artificial Intelligence

We report on an extensive study of the benefits and limitations of current deep learning approaches to object recognition in robot vision scenarios, introducing a novel dataset used for our investigation. To avoid the biases in currently available datasets, we consider a natural human-robot interaction setting to design a data-acquisition protocol for visual object recognition on the iCub humanoid robot. Analyzing the performance of off-the-shelf models trained off-line on large-scale image retrieval datasets, we show the necessity for knowledge transfer. We evaluate different ways in which this last step can be done, and identify the major bottlenecks affecting robotic scenarios. By studying both object categorization and identification problems, we highlight key differences between object recognition in robotics applications and in image retrieval tasks, for which the considered deep learning approaches have been originally designed. In a nutshell, our results confirm the remarkable improvements yield by deep learning in this setting, while pointing to specific open challenges that need be addressed for seamless deployment in robotics.


Prediction of multi-dimensional spatial variation data via Bayesian tensor completion

arXiv.org Machine Learning

This paper presents a multi-dimensional computational method to predict the spatial variation data inside and across multiple dies of a wafer. This technique is based on tensor computation. A tensor is a high-dimensional generalization of a matrix or a vector. By exploiting the hidden low-rank property of a high-dimensional data array, the large amount of unknown variation testing data may be predicted from a few random measurement samples. The tensor rank, which decides the complexity of a tensor representation, is decided by an available variational Bayesian approach. Our approach is validated by a practical chip testing data set, and it can be easily generalized to characterize the process variations of multiple wafers. Our approach is more efficient than the previous virtual probe techniques in terms of memory and computational cost when handling high-dimensional chip testing data.


Adversarial Robustness May Be at Odds With Simplicity

arXiv.org Machine Learning

Current techniques in machine learning are so far are unable to learn classifiers that are robust to adversarial perturbations. However, they are able to learn non-robust classifiers with very high accuracy, even in the presence of random perturbations. Towards explaining this gap, we highlight the hypothesis that $\textit{robust classification may require more complex classifiers (i.e. more capacity) than standard classification.}$ In this note, we show that this hypothesis is indeed possible, by giving several theoretical examples of classification tasks and sets of "simple" classifiers for which: (1) There exists a simple classifier with high standard accuracy, and also high accuracy under random $\ell_\infty$ noise. (2) Any simple classifier is not robust: it must have high adversarial loss with $\ell_\infty$ perturbations. (3) Robust classification is possible, but only with more complex classifiers (exponentially more complex, in some examples). Moreover, $\textit{there is a quantitative trade-off between robustness and standard accuracy among simple classifiers.}$ This suggests an alternate explanation of this phenomenon, which appears in practice: the tradeoff may occur not because the classification task inherently requires such a tradeoff (as in [Tsipras-Santurkar-Engstrom-Turner-Madry `18]), but because the structure of our current classifiers imposes such a tradeoff.


Elimination of All Bad Local Minima in Deep Learning

arXiv.org Machine Learning

In this paper, we theoretically prove that we can eliminate all suboptimal local minima by adding one neuron per output unit to any deep neural network, for multi-class classification, binary classification, and regression with an arbitrary loss function. At every local minimum of any deep neural network with added neurons, the set of parameters of the original neural network (without added neurons) is guaranteed to be a global minimum of the original neural network. The effects of the added neurons are proven to automatically vanish at every local minimum. Unlike many related results in the literature, our theoretical results are directly applicable to common deep learning tasks because the results only rely on the assumptions that automatically hold in the common tasks. Moreover, we discuss several limitations in eliminating the suboptimal local minima in this manner by providing additional theoretical results and several examples.


2018 in Review

#artificialintelligence

This post reviews my experiences in 2018. I welcomed the year in the gorgeous beaches of Goa and am now ending it in the wilderness of South Africa. Joining NVIDIA: I joined NVIDIA in September and started a new research group on core AI/ML. I am hiring at full pace and have started many new projects. Honor of being the youngest named chair professor at Caltech: I was one of the six faculty members that Caltech recognized during the 2017-18 academic year.


Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced

Neural Information Processing Systems

We study the implicit regularization imposed by gradient descent for learning multi-layer homogeneous functions including feed-forward fully connected and convolutional deep neural networks with linear, ReLU or Leaky ReLU activation. We rigorously prove that gradient flow (i.e. gradient descent with infinitesimal step size) effectively enforces the differences between squared norms across different layers to remain invariant without any explicit regularization. This result implies that if the weights are initially small, gradient flow automatically balances the magnitudes of all layers. Using a discretization argument, we analyze gradient descent with positive step size for the non-convex low-rank asymmetric matrix factorization problem without any regularization. Inspired by our findings for gradient flow, we prove that gradient descent with step sizes $\eta_t=O(t^{−(1/2+\delta)}) (0<\delta\le1/2)$ automatically balances two low-rank factors and converges to a bounded global optimum. Furthermore, for rank-1 asymmetric matrix factorization we give a finer analysis showing gradient descent with constant step size converges to the global minimum at a globally linear rate. We believe that the idea of examining the invariance imposed by first order algorithms in learning homogeneous models could serve as a fundamental building block for studying optimization for learning deep models.


Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Neural Information Processing Systems

The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. We show that in the limit that the number of parameters $n$ is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales as $O(n^{-1})$. In this limit, we prove a dynamical variant of the universal approximation theorem showing that the optimal representation can be attained by stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning. In the asymptotic regime, we study the fluctuations around the optimal representation and show that they arise at a scale $O(n^{-1})$. These fluctuations in the landscape identify the natural scale for the noise in stochastic gradient descent. Our results apply to both single and multi-layer neural networks, as well as standard kernel methods like radial basis functions.