Deep Learning
Google announces a powerful new AI chip and supercomputer
If artificial intelligence is rapidly eating software, then Google may have the biggest appetite around. At the company's annual developer conference today, CEO Sundar Pichai announced a new computer processor designed to perform the kind of machine learning that has taken the industry by storm in recent years (see "10 Breakthrough Technologies: Deep Learning"). The announcement reflects how rapidly artificial intelligence is transforming Google itself, and it is the surest sign yet that the company plans to lead the development of every relevant aspect of software and hardware. Perhaps most importantly, at least for those working in machine learning, the new processor not only runs models at blistering speed but can also be used to train them efficiently. Called the Cloud Tensor Processing Unit, the chip is named after Google's open-source TensorFlow machine-learning framework.
How Monsanto protects crops with artificial intelligence
Monsanto, the leading producer of genetically modified (GM) crops, has announced a partnership with Atomwise. The controversial corporation will use Atomwise's artificial intelligence expertise to discover crop-protecting molecules more quickly. Atomwise uses deep learning algorithms to identify molecules that might have the desired effect, rather than testing every individual molecule. The program has not been active long but already has partners at Stanford University and UC San Diego. According to Monsanto, it takes 11 years and $250 million for a typical crop-protection product to come to market.
Google's AI-Building AI Is a Step Toward Self-Improving AI
Reaching the technological singularity is almost certainly going to involve AI that is able to improve itself. Google may have now taken a small step along this path by creating AI that can build AI. Speaking at the company's annual I/O developer conference, CEO Sundar Pichai announced a project called AutoML that can automate one of the hardest parts of designing deep learning software: choosing the right architecture for a neural network. The Google researchers created a machine learning system that used reinforcement learning (the trial-and-error approach at the heart of many of Google's most notable AI exploits) to figure out the best architectures for language and image recognition tasks. Not only did the results rival or beat the performance of the best human-designed architectures, but the system also made some unconventional choices that researchers had previously considered inappropriate for those kinds of tasks.
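The article does not describe how Google's controller works, so the following is only a toy illustration of the trial-and-error idea: a policy samples an architecture, receives a reward, and a REINFORCE-style policy-gradient update makes rewarded choices more likely. The search space (`DEPTHS`, `KERNELS`) and the reward table are hypothetical stand-ins; a real system would train and evaluate each candidate network instead of looking up a score.

```python
import math
import random

# Hypothetical search space: two independent architecture decisions.
DEPTHS = [2, 4, 8]
KERNELS = [3, 5, 7]
# Hypothetical reward table standing in for the validation accuracy of a trained child network.
DEPTH_SCORE = {2: 0.0, 4: 1.0, 8: 0.5}
KERNEL_SCORE = {3: 0.5, 5: 0.0, 7: 1.0}


def toy_reward(depth, kernel):
    return DEPTH_SCORE[depth] + KERNEL_SCORE[kernel]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def search(steps=4000, lr=0.05, seed=0):
    rng = random.Random(seed)
    options = {"depth": DEPTHS, "kernel": KERNELS}
    logits = {"depth": [0.0] * len(DEPTHS), "kernel": [0.0] * len(KERNELS)}
    baseline = 0.0
    for _ in range(steps):
        # Trial: sample one architecture from the current policy.
        choice, probs = {}, {}
        for name, lg in logits.items():
            probs[name] = softmax(lg)
            choice[name] = rng.choices(range(len(lg)), weights=probs[name])[0]
        reward = toy_reward(DEPTHS[choice["depth"]], KERNELS[choice["kernel"]])
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline reduces variance
        # REINFORCE update: d log softmax / d logits = one_hot(choice) - probs.
        for name, lg in logits.items():
            for i in range(len(lg)):
                grad = (1.0 if i == choice[name] else 0.0) - probs[name][i]
                lg[i] += lr * advantage * grad
    best = {name: options[name][max(range(len(lg)), key=lg.__getitem__)]
            for name, lg in logits.items()}
    return best, {name: softmax(lg) for name, lg in logits.items()}


best, final_probs = search()
print(best)
```

With the toy rewards above, the policy should concentrate on the highest-scoring depth and kernel size. The moving-average baseline is a common variance-reduction choice; without it, every sampled option would be reinforced whenever rewards are positive.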
New NVIDIA Pascal GPUs Accelerate Deep Learning Inference
BEIJING, CHINA--(Marketwired - Sep 12, 2016) - GPU Technology Conference China - NVIDIA (NASDAQ: NVDA) today unveiled the latest additions to its Pascal architecture-based deep learning platform, with new NVIDIA Tesla P4 and P40 GPU accelerators and new software that deliver massive leaps in efficiency and speed to accelerate production inference workloads for artificial intelligence services. Modern AI services such as voice-activated assistance, email spam filters, and movie and product recommendation engines are rapidly growing in complexity, requiring up to 10x more compute than the neural networks of a year ago. Current CPU-based technology cannot deliver the real-time responsiveness that modern AI services require, leading to a poor user experience. The Tesla P4 and P40 are specifically designed for inferencing, which uses trained deep neural networks to recognize speech, images or text in response to queries from users and devices. Based on the Pascal architecture, these GPUs feature specialized inference instructions based on 8-bit (INT8) operations, delivering 45x faster response than CPUs [1] and a 4x improvement over GPU solutions launched less than a year ago [2].
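To make "8-bit (INT8) operations" concrete, here is a minimal sketch of one generic scheme: symmetric, per-tensor linear quantization, where floats are mapped to integers in [-127, 127] with a shared scale, multiplied and accumulated as integers, and rescaled once at the end. This is an assumption chosen for illustration; the press release does not say which quantization or calibration method NVIDIA's software uses.

```python
def quantize_symmetric_int8(values):
    """Map floats to int8 codes in [-127, 127] using one shared scale (symmetric, per-tensor)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(a, b):
    """Dot product on quantized inputs: integer multiply-accumulate, then a single float rescale."""
    qa, sa = quantize_symmetric_int8(a)
    qb, sb = quantize_symmetric_int8(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # hardware would typically accumulate in int32
    return acc * sa * sb


a = [0.5, -1.2, 0.3, 0.9]
b = [1.0, 0.25, -0.7, 0.4]
exact = sum(x * y for x, y in zip(a, b))
print(exact, int8_dot(a, b))  # the INT8 result approximates the float result
```

The expensive inner loop touches only small integers, which is what makes dedicated INT8 instructions attractive for inference; the price is a small, bounded rounding error per element.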
Sequence Modeling via Segmentations
Wang, Chong; Wang, Yining; Huang, Po-Sen; Mohamed, Abdelrahman; Zhou, Dengyong; Deng, Li
Segmental structure is a common pattern in many types of sequences such as phrases in human languages. In this paper, we present a probabilistic model for sequences via their segmentations. The probability of a segmented sequence is calculated as the product of the probabilities of all its segments, where each segment is modeled using existing tools such as recurrent neural networks. Since the segmentation of a sequence is usually unknown in advance, we sum over all valid segmentations to obtain the final probability for the sequence. An efficient dynamic programming algorithm is developed for forward and backward computations without resorting to any approximation. We demonstrate our approach on text segmentation and speech recognition tasks. In addition to quantitative results, we also show that our approach can discover meaningful segments in their respective application contexts.
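The sum over all segmentations can be computed with a forward recursion: if alpha[j] is the total probability of all segmentations of the first j elements, then alpha[j] is the sum over i < j of alpha[i] times the probability of the segment seq[i:j]. The sketch below shows only that forward pass. The paper models each segment with a recurrent neural network; here a hypothetical lookup table (`TOY_SEGMENT_PROBS`) stands in, and a brute-force enumeration is included to check the recursion.

```python
# Hypothetical segment scorer; the paper uses an RNN for this role.
TOY_SEGMENT_PROBS = {"a": 0.2, "b": 0.2, "c": 0.3, "ab": 0.4, "abc": 0.1}


def toy_seg_prob(segment):
    return TOY_SEGMENT_PROBS.get(segment, 0.0)


def sequence_probability(seq, seg_prob, max_seg_len=None):
    """Sum over all segmentations of the product of segment probabilities, in O(n * max_seg_len)."""
    n = len(seq)
    limit = n if max_seg_len is None else max_seg_len
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for j in range(1, n + 1):
        for i in range(max(0, j - limit), j):
            alpha[j] += alpha[i] * seg_prob(seq[i:j])
    return alpha[n]


def brute_force_probability(seq, seg_prob):
    """Enumerate all 2**(n-1) segmentations explicitly (exponential; for checking only)."""
    n = len(seq)
    total = 0.0
    for mask in range(2 ** (n - 1)):
        # Bit i set means "cut after element i".
        cuts = [0] + [i + 1 for i in range(n - 1) if (mask >> i) & 1] + [n]
        prod = 1.0
        for start, end in zip(cuts, cuts[1:]):
            prod *= seg_prob(seq[start:end])
        total += prod
    return total


print(sequence_probability("abc", toy_seg_prob))
print(brute_force_probability("abc", toy_seg_prob))
```

For "abc", the nonzero segmentations are [a, b, c], [ab, c] and [abc], so both functions should agree on 0.012 + 0.12 + 0.1 = 0.232. The backward pass mentioned in the abstract would run the same recursion from the end of the sequence.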
The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
Zhang, Hantian; Li, Jerry; Kara, Kaan; Alistarh, Dan; Liu, Ji; Zhang, Ce
Recently there has been significant interest in training machine-learning models at low precision: by reducing precision, one can reduce computation and communication by an order of magnitude. We examine training at reduced precision, both from a theoretical and practical perspective, and ask: is it possible to train models at end-to-end low precision with provable guarantees? Can this lead to consistent order-of-magnitude speedups? We present a framework called ZipML to answer these questions. For linear models, the answer is yes. The framework is based on one simple but novel strategy called double sampling. Our framework is able to execute training at low precision with no bias, guaranteeing convergence, whereas naive quantization would introduce significant bias. We validate our framework across a range of applications, and show that it enables an FPGA prototype that is up to 6.5x faster than an implementation using full 32-bit precision. We further develop a variance-optimal stochastic quantization strategy and show that it can make a significant difference in a variety of settings. When applied to linear models together with double sampling, we save up to another 1.7x in data movement compared with uniform quantization. When training deep networks with quantized models, we achieve higher accuracy than the state-of-the-art XNOR-Net. Finally, we extend our framework through approximation to non-linear models, such as SVM. We show that, although using low-precision data induces bias, we can appropriately bound and control the bias. We find that in practice 8-bit precision is often sufficient to converge to the correct solution. Interestingly, however, in practice we notice that our framework does not always outperform the naive rounding approach. We discuss this negative result in detail.
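Why does naive quantization bias the gradient? For least squares the gradient contains the data point twice, a(a·w - b). Quantizing a once with an unbiased stochastic rounding Q gives E[Q(a)] = a but E[Q(a)^2] = a^2 + Var(Q(a)), so the squared term is biased. Using two independent quantized samples, Q1(a)(Q2(a)·w - b), removes the bias because E[Q1 Q2] = E[Q1]E[Q2] = a^2. The scalar sketch below illustrates that reasoning by computing exact expectations; it is a minimal illustration under those assumptions, not the ZipML implementation, which works on vectors and includes further machinery.

```python
import math
import random


def stochastic_round(x, step, rng):
    """Unbiased stochastic rounding of x onto the grid {k * step}."""
    lo = math.floor(x / step) * step
    p_up = (x - lo) / step
    return lo + step if rng.random() < p_up else lo


def rounding_outcomes(x, step):
    """Exact distribution of stochastic_round(x, step) as (value, probability) pairs."""
    lo = math.floor(x / step) * step
    p_up = (x - lo) / step
    return [(lo, 1.0 - p_up), (lo + step, p_up)]


def true_gradient(a, b, w):
    """Gradient of 0.5 * (a * w - b) ** 2 with respect to w (scalar least squares)."""
    return a * (a * w - b)


def expected_naive_gradient(a, b, w, step):
    """E[Q(a) * (Q(a) * w - b)]: the same quantized sample is used twice."""
    return sum(p * q * (q * w - b) for q, p in rounding_outcomes(a, step))


def expected_double_sampled_gradient(a, b, w, step):
    """E[Q1(a) * (Q2(a) * w - b)] with two independent quantized samples."""
    outs = rounding_outcomes(a, step)
    return sum(p1 * p2 * q1 * (q2 * w - b) for q1, p1 in outs for q2, p2 in outs)


def double_sampled_gradient(a, b, w, step, rng):
    """One stochastic double-sampled gradient estimate, as used inside an SGD loop."""
    q1 = stochastic_round(a, step, rng)
    q2 = stochastic_round(a, step, rng)
    return q1 * (q2 * w - b)


print(true_gradient(0.3, 1.0, 2.0))                          # about -0.12
print(expected_naive_gradient(0.3, 1.0, 2.0, 0.5))           # biased
print(expected_double_sampled_gradient(0.3, 1.0, 2.0, 0.5))  # matches the true gradient
```

The coarse grid (step 0.5) exaggerates the effect so the bias is easy to see; the cost of double sampling is storing or transmitting a second quantized copy of each sample.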
Prototypical Networks for Few-shot Learning
Snell, Jake; Swersky, Kevin; Zemel, Richard S.
We propose prototypical networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class. Prototypical networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and achieve excellent results. We provide an analysis showing that some simple design decisions can yield substantial improvements over recent approaches involving complicated architectural choices and meta-learning. We further extend prototypical networks to zero-shot learning and achieve state-of-the-art results on the CU-Birds dataset.
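The classification rule described above can be sketched in a few lines: each class prototype is the mean of its embedded support examples, and a query is assigned via a softmax over negative distances to the prototypes. In this sketch the embedding function `embed` is a hypothetical identity stand-in for the learned network, and squared Euclidean distance is assumed as the metric.

```python
import math


def embed(x):
    # Hypothetical stand-in for the learned embedding network; identity here.
    return list(x)


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def compute_prototypes(support):
    """support: dict mapping class label -> list of examples (the few labeled shots)."""
    return {label: mean_vector([embed(x) for x in xs]) for label, xs in support.items()}


def class_probabilities(query, prototypes):
    """Softmax over negative squared distances from the embedded query to each prototype."""
    z = embed(query)
    labels = list(prototypes)
    logits = [-squared_distance(z, prototypes[c]) for c in labels]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(labels, exps)}


def classify(query, prototypes):
    probs = class_probabilities(query, prototypes)
    return max(probs, key=probs.get)


support = {"cat": [[0.0, 1.0], [0.2, 0.8]], "dog": [[1.0, 0.0], [0.8, 0.2]]}
prototypes = compute_prototypes(support)
print(prototypes)
print(classify([0.15, 0.95], prototypes))
```

New classes need no retraining in this scheme: adding a class only means averaging the embeddings of its few examples into a new prototype.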
How Machine Learning Can Improve Healthcare, Medicine And Human Well-Being
"Investors hope for billion-dollar health-tech 'unicorns'. Amid such talk it is worth remembering that the biggest winners from digital health care will be the patients who receive better treatment, and those who avoid becoming patients at all." (The Economist) Machine learning and artificial intelligence (AI) continue to transform many aspects of our lives. The potential gains in healthcare are enormous. Although investment in digital healthcare start-ups has doubled since 2013, progress is slow, in part because of regulatory and cost hurdles. Applying machine learning in healthcare lets organisations benefit from these evolving technological capabilities.
Marimba-playing robot can compose its own music
Having four arms would be an advantage for any musician, but they are just one of the many unique features of Shimon, the marimba-playing robot. The machine has used its artificial intelligence and deep learning algorithms to analyse more than two million motifs, riffs and licks of music to create its own masterpiece. Aside from the first four bars, which it is given as a starting point, no humans are involved in either the composition or the performance of the music. Shimon, which uses eight sticks to play the wooden percussion instrument, is the creation of Mason Bretan, a PhD student at Georgia Tech. He has worked with Shimon for seven years, enabling it to 'listen' to music played by humans and improvise over composed chord progressions.
Lecture 17: Issues in NLP and Possible Architectures for NLP
Lecture 17 looks at solving language, the efficient tree-recursive model SPINN and the SNLI dataset, and the research highlight "Learning to compose for QA." Also covered are an interlude on pointer/copying models, and sub-word and character-based models. This lecture series provides a thorough introduction to the cutting-edge research in deep learning applied to NLP, an approach that has recently obtained very high performance across many different NLP tasks including question answering and machine translation. It emphasizes how to implement, train, debug, visualize, and design neural network models, covering the main technologies of word vectors, feed-forward models, recurrent neural networks, recursive neural networks, convolutional neural networks, and recent models involving a memory component. For additional learning opportunities please visit: http://stanfordonline.stanford.edu/
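As a rough illustration of what a pointer/copying model adds, the sketch below mixes a decoder's vocabulary distribution with a copy distribution built from attention weights over source positions, gated by a generation probability `p_gen`. This follows one common pointer-generator-style formulation and is an assumption for illustration; the lecture may present different variants. All inputs are hypothetical hand-picked numbers rather than outputs of a trained model.

```python
def copy_mixture(p_vocab, attention, source_tokens, p_gen):
    """Final distribution = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source copies of w).

    p_vocab: dict word -> probability from the decoder's softmax over a fixed vocabulary
    attention: weights over source positions (assumed to sum to 1)
    source_tokens: source words aligned with the attention weights
    p_gen: probability of generating from the vocabulary rather than copying
    """
    final = {w: p_gen * p for w, p in p_vocab.items()}
    for w, a in zip(source_tokens, attention):
        # Attention mass on a source position is added to that position's word,
        # even if the word is outside the fixed vocabulary.
        final[w] = final.get(w, 0.0) + (1.0 - p_gen) * a
    return final


dist = copy_mixture(
    p_vocab={"the": 0.5, "cat": 0.3, "sat": 0.2},
    attention=[0.7, 0.3],
    source_tokens=["Zorblax", "sat"],
    p_gen=0.6,
)
print(dist)
```

The out-of-vocabulary source word "Zorblax" receives probability mass only through the copy path, which is the motivation for copying mechanisms; sub-word and character-based models attack the same rare-word problem from the input side instead.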