Enabling Deep Learning on IoT Devices

IEEE Computer

Deep learning can enable Internet of Things (IoT) devices to interpret unstructured multimedia data and intelligently react to both user and environmental events but has demanding performance and power requirements. The authors explore two ways to successfully integrate deep learning with low-power IoT products.

The Perils of Letting Machines into the Hive Mind - Issue 52: The Hive


In the preface to Saint Joan, his play about Joan of Arc, the teenager whose visions of saints and archangels stirred soldiers into battle early in the 15th century, George Bernard Shaw makes a surprisingly compelling argument that following Joan of Arc's mystical visions was at least as rational as following a modern-day general onto today's battlefield, full of highly technological and incomprehensible weapons of war. If we don't know calculus, we can't appreciate the beauty of imagining time shrinking into a moment and how that relates to the tangent of a curve. Similarly, if you have recently used the Internet to work on a task, you would find it hard to assess your individual ability to perform that task, because your effort is so intertwined with the Internet's contribution. In a related study, we asked people to search the Internet for the answers to simple questions about finance, like "What is a stock share?"

Moving Beyond the Turing Test with the Allen AI Science Challenge

Communications of the ACM

The competition aimed to assess the state of the art in AI systems that use natural language understanding and knowledge-based reasoning; how accurately the participants' models could answer the exam questions would serve as an indicator of how far the field has come in these areas. A week before the end of the competition, we provided participants with the final test set of 21,298 questions (including the validation set), of which 2,583 were legitimate, to produce a final score for their models. AI2 also generated a baseline score using a Lucene search over the Wikipedia corpus, producing scores of 40.2% on the training set and 40.7% on the final test set. The winning model achieved a final score of 59.31% correct on the test set of 2,583 legitimate questions using a combination of 15 gradient-boosting models, each with a different subset of features.
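The winning entry's combination step can be sketched as score averaging across models for each multiple-choice option. The three per-model score tables below are invented for illustration (the actual entry used 15 gradient-boosting models over different feature subsets):

```python
def combine(model_scores):
    # Average each answer option's score across all models,
    # then pick the option with the highest mean score.
    options = model_scores[0].keys()
    avg = {opt: sum(m[opt] for m in model_scores) / len(model_scores)
           for opt in options}
    return max(avg, key=avg.get)

# Hypothetical scores from three models for one 4-way exam question.
scores = [
    {"A": 0.10, "B": 0.60, "C": 0.20, "D": 0.10},
    {"A": 0.30, "B": 0.40, "C": 0.20, "D": 0.10},
    {"A": 0.50, "B": 0.10, "C": 0.30, "D": 0.10},
]
answer = combine(scores)   # "B" has the highest average score
```

Averaging lets models trained on different feature subsets compensate for one another's blind spots, which is the usual motivation for this kind of ensemble.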

ImageNet Classification with Deep Convolutional Neural Networks

Communications of the ACM

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1,000 different classes. Four years ago, while we were at the University of Toronto, our deep neural network, called SuperVision, almost halved the error rate for recognizing objects in natural images and triggered an overdue paradigm shift in computer vision. To improve the performance of such networks, we can collect larger datasets, learn more powerful models, and use better techniques for preventing overfitting. Starting in 2010, an annual competition called the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) has been held as part of the Pascal Visual Object Challenge.

Making Chips Smarter

Communications of the ACM

In recent years, graphics processing units (GPUs) have become the technology of choice for running the neural networks that underpin AI, deep learning, and machine learning. Recent initiatives include improvements in GPUs as well as work on other technologies such as field-programmable gate arrays (FPGAs), Tensor Processing Units (TPUs), and other chip systems and architectures that match specific AI and machine learning requirements. These initiatives, says Bryan Catanzaro, vice president of Applied Deep Learning Research at Nvidia, point in the same general direction: "The objective is to build computation platforms that deliver the performance and energy efficiency needed to build AI with a level of accuracy that isn't possible today." The Nvidia Tesla P100, which packs 15 billion transistors onto a single chip, delivers extremely high throughput on AI workloads associated with deep learning.

Attack of the Killer Microseconds

Communications of the ACM

The computer systems we use today make it easy for programmers to mitigate event latencies at the nanosecond and millisecond time scales (such as DRAM accesses at tens or hundreds of nanoseconds and disk I/Os at a few milliseconds) but offer little support for microsecond (μs)-scale events. For instance, when a read() system call to a disk is made, the operating system kicks off the low-level I/O operation but also performs a software context switch to a different thread so the processor stays busy during the disk operation. Likewise, various mechanisms (such as interprocessor interrupts, data copies, context switches, and core hops) all add overheads, again in the microsecond range. Finally, queueing overheads--in the host, application, and network fabric--can all incur additional latencies, often on the order of tens to hundreds of microseconds.
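The millisecond-scale latency hiding described above can be sketched with threads: while one thread waits on (simulated) disk I/O, the scheduler runs useful work on another. The sleep duration and workload below are invented stand-ins; the point is that this trick pays off only when the wait dwarfs the switch overhead, which is why μs-scale events are hard to hide:

```python
import threading
import time

def disk_read():
    # Stand-in for a blocking disk read of a few milliseconds.
    # time.sleep releases the interpreter lock, so other threads run.
    time.sleep(0.005)

def compute():
    # Useful CPU work performed while the "I/O" is in flight.
    total = 0
    for i in range(100_000):
        total += i
    return total

start = time.perf_counter()
io_thread = threading.Thread(target=disk_read)
io_thread.start()            # kick off the I/O, then keep computing
result = compute()
io_thread.join()
elapsed = time.perf_counter() - start
# elapsed is roughly max(I/O, compute), not their sum
```

For a microsecond-scale event, the context switch itself (also microseconds) would cost as much as the wait it is trying to hide, so the overlap buys nothing.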


Communications of the ACM

The Hardware/Hybrid Accelerated Cosmology Code (HACC) framework exploits the diverse landscape of current supercomputing architectures at the largest scales of problem size, obtaining high scalability and sustained performance. We demonstrate strong and weak scaling on Titan, obtaining up to 99.2% parallel efficiency while evolving 1.1 trillion particles. The rich structure of the current Universe--planets, stars, solar systems, galaxies, and yet larger collections of galaxies (clusters and filaments)--all resulted from the growth of very small primordial fluctuations. Time-stepping criteria follow from a joint consideration of the force and mass resolution [20]. Finally, stringent requirements on accuracy are imposed by the very small statistical errors in the observations--some observables must be computed to accuracies of a fraction of a percent.
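Parallel-efficiency figures such as the 99.2% quoted above are conventionally computed as speedup divided by the number of processing elements. The timings below are hypothetical, purely to show the arithmetic:

```python
def parallel_efficiency(t_ref, t_parallel, n):
    # Efficiency = speedup / processor count = t_ref / (n * t_parallel).
    # 1.0 means perfect scaling; real codes fall below it as n grows.
    return t_ref / (n * t_parallel)

# Hypothetical strong-scaling run: 800 s on 1 node, 101 s on 8 nodes.
eff = parallel_efficiency(800.0, 101.0, 8)   # about 0.99, i.e. ~99%
```

For weak scaling, where the problem size grows with the node count, the same formula is applied with t_ref measured on the smallest configuration at proportional problem size.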

Learning Securely

Communications of the ACM

A paper posted online in 2013 launched the modern wave of adversarial machine learning research by showing, for three different image-processing neural networks, how to create "adversarial examples"--images that, after tiny modifications to some of the pixels, fool the neural network into classifying them differently from the way humans see them. Last year, for instance, three researchers at the University of California, Berkeley--Alex Kantchelian, Doug Tygar, and Anthony Joseph--showed that a highly nonlinear machine learning model called "boosted trees" is also highly susceptible to adversarial examples. Yet even with those first examples, researchers noticed something strange: examples designed to fool one machine learning algorithm often fooled other machine learning algorithms, too. Some researchers are making machine learning algorithms more robust by essentially "vaccinating" them: adding adversarial examples, correctly labeled, into the training data.
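The attacks described above targeted deep networks, but the core gradient trick can be sketched on a toy model. Below is a minimal fast-gradient-sign-style perturbation against a logistic-regression "image" classifier; all weights and inputs are random illustrative data, not from any of the cited papers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    # Fast gradient sign method: nudge every input feature by eps in
    # the direction that increases the model's loss on the true label y.
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w            # gradient of cross-entropy w.r.t. x
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=16)             # toy 16-"pixel" linear classifier
b = 0.0
x = rng.normal(size=16)             # a clean input
y = 1.0 if sigmoid(w @ x + b) > 0.5 else 0.0   # model's own label for x
x_adv = fgsm(x, y, w, b, eps=0.25)  # small per-pixel perturbation
# x_adv looks almost identical to x, yet the model is now less
# confident in (or flipped away from) its original prediction
```

The "vaccination" defense then amounts to appending such (x_adv, y) pairs, correctly labeled, to the training set before refitting the model.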