Deep Learning
PyTorch -- Dynamic Batching – Illia Polosukhin – Medium
If you have been reading my blog, you may have seen that I was a TensorFlow contributor and built a lot of high-level APIs there. In Feb 2017, though, I left Google and co-founded my own company, NEAR.ai, where we are teaching machines to write code from natural language. As part of this work, we are building Deep Learning models that read or write code in a tree format. After trying to manage this complexity in TensorFlow, I decided to give PyTorch a try.
How to build a Recurrent Neural Network in TensorFlow (1/7)
In this tutorial I'll explain how to build a simple working Recurrent Neural Network in TensorFlow. This is the first in a series of seven parts covering various aspects and techniques of building Recurrent Neural Networks in TensorFlow. A short introduction to TensorFlow is available here. For now, let's get started with the RNN! RNN is short for "Recurrent Neural Network", and it is basically a neural network that can be used when your data is treated as a sequence, where the particular order of the data points matters.
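To illustrate why the order of the data points matters, here is a minimal sketch of a vanilla RNN forward pass in plain Python. It is not the tutorial's TensorFlow code: the scalar weights `w_xh`, `w_hh`, `b_h` and the helper name `rnn_forward` are assumptions chosen for illustration only.

```python
import math

def rnn_forward(inputs, w_xh, w_hh, b_h, h0=0.0):
    """Run a scalar vanilla RNN over a sequence and return every hidden state."""
    h = h0
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state,
        # so earlier inputs influence every later state.
        h = math.tanh(w_xh * x + w_hh * h + b_h)
        states.append(h)
    return states

# Hypothetical toy sequence and weights (not taken from the tutorial).
sequence = [1.0, 0.0, -1.0]
forward = rnn_forward(sequence, w_xh=0.5, w_hh=0.8, b_h=0.0)
backward = rnn_forward(list(reversed(sequence)), w_xh=0.5, w_hh=0.8, b_h=0.0)

# Same values, different order: the final hidden states differ.
print(forward[-1], backward[-1])
```

Feeding the same values in reverse order produces a different final state, which is the property that makes the network suitable for sequential data.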
Wal-Mart Will Use NVIDIA and AI to Dump AWS
There was a time, not too long ago, when Wal-Mart (NYSE:WMT) was the undisputed king of retail. However, e-commerce has changed the landscape, and over the last few years, Amazon.com, Inc. (NASDAQ:AMZN) has been taking an increasingly large slice of the retail pie. Wal-Mart has been playing catch-up in online sales, but now it seems the retail giant is ready to take the fight to Amazon, with a little help from NVIDIA Corporation (NASDAQ:NVDA) and artificial intelligence (AI). In a note to clients, Global Equities Research analyst Trip Chowdhry revealed that Wal-Mart would be building huge data centers to house its cloud computing and make a sizable push into deep learning, a segment of artificial intelligence.
Generative Adversarial Networks (GANs) in 50 lines of code (PyTorch)
In 2014, Ian Goodfellow and his colleagues at the University of Montreal published a stunning paper introducing the world to GANs, or generative adversarial networks. Through an innovative combination of computational graphs and game theory they showed that, given enough modeling power, two models fighting against each other would be able to co-train through plain old backpropagation. The models play two distinct (literally, adversarial) roles. Given some real data set R, G is the generator, trying to create fake data that looks just like the genuine data, while D is the discriminator, getting data from either the real set or G and labeling the difference. Goodfellow's metaphor (and a fine one it is) was that G was like a team of forgers trying to match real paintings with their output, while D was the team of detectives trying to tell the difference.
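The two adversarial roles can be made concrete with the losses each model minimizes. The sketch below is not the article's 50-line PyTorch code; it assumes the standard binary cross-entropy formulation with the commonly used non-saturating generator loss, and the function names and sample probabilities are hypothetical.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Cross-entropy for D: real samples are labelled 1, generated samples 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating loss for G: it wants D to output 1 on generated samples."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs, each a probability in (0, 1).
d_on_real = [0.9, 0.8, 0.95]   # D(x) for samples from the real set R
d_on_fake = [0.1, 0.3, 0.2]    # D(G(z)) for samples produced by G

print(discriminator_loss(d_on_real, d_on_fake))
print(generator_loss(d_on_fake))
```

When D separates real from generated data well, its own loss is small and G's loss is large; training alternates gradient steps on the two losses so that each model improves against the other.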
Deep Learning with TensorFlow: Giancarlo Zaccone, Md. Rezaul Karim, Ahmed Menshawy: 9781786469786: Amazon.com: Books
Giancarlo Zaccone has more than ten years of experience in managing research projects in both scientific and industrial areas. He worked as a researcher at the C.N.R., the National Research Council, where he was involved in projects relating to parallel computing and scientific visualization. Currently, he is a system and software engineer at a consulting company, developing and maintaining software systems for space and defense applications. He is the author of the following Packt volumes: Python Parallel Programming Cookbook and Getting Started with TensorFlow. Md. Rezaul Karim has more than 8 years of experience in research and development, with a solid knowledge of algorithms and data structures, focusing on C/C++, Java, Scala, R, and Python and big data technologies such as Spark, Kafka, DC/OS, Docker, Mesos, Hadoop, and MapReduce.
Inferring Generative Model Structure with Static Analysis
Varma, Paroma, He, Bryan, Bajaj, Payal, Banerjee, Imon, Khandwala, Nishith, Rubin, Daniel L., Ré, Christopher
Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The structure of these models affects training label quality, but is difficult to learn without any ground truth labels. We instead rely on these weak supervision sources having some structure by virtue of being encoded programmatically. We present Coral, a paradigm that infers generative model structure by statically analyzing the code for these heuristics, thus reducing the data required to learn structure significantly. We prove that Coral's sample complexity scales quasilinearly with the number of heuristics and number of relations found, improving over the standard sample complexity, which is exponential in $n$ for identifying $n^{\textrm{th}}$ degree relations. Experimentally, Coral matches or outperforms traditional structure learning approaches by up to 3.81 F1 points. Using Coral to model dependencies instead of assuming independence results in better performance than a fully supervised model by 3.07 accuracy points when heuristics are used to label radiology data without ground truth labels.
Approximating meta-heuristics with homotopic recurrent neural networks
Bay, Alessandro, Sengupta, Biswa
Many combinatorial optimisation problems are NP-hard, i.e., they are not known to be solvable in polynomial time. One such problem is finding the shortest route between two nodes on a graph. Meta-heuristic algorithms such as $A^{*}$, along with mixed-integer programming (MIP) methods, are often employed for these problems. Our work demonstrates that it is possible to approximate solutions generated by a meta-heuristic algorithm using a deep recurrent neural network. We compare different methodologies based on reinforcement learning (RL) and recurrent neural networks (RNN) to gauge their respective quality of approximation. We show the viability of recurrent neural network solutions on a graph that has over 300 nodes and argue that a sequence-to-sequence network, rather than other recurrent networks, has improved approximation quality. Additionally, we argue that homotopy continuation, which increases the chances of hitting an extremum, further improves the estimate generated by a vanilla RNN.
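For readers unfamiliar with $A^{*}$, the search whose solutions the paper approximates can be sketched as follows. This is a generic illustration, not the authors' setup: the 4-connected grid, unit edge costs, Manhattan-distance heuristic, and the function name `a_star` are all assumptions.

```python
import heapq

def a_star(grid, start, goal):
    """Return the shortest 4-connected path length on a 0/1 grid, or None.

    Cells with value 0 are free and cells with value 1 are blocked.
    """
    def heuristic(cell):
        # Manhattan distance never overestimates the cost on a unit-cost grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    best_cost = {start: 0}
    frontier = [(heuristic(start), 0, start)]  # (cost + heuristic, cost, cell)
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best_cost[cell]:
            continue  # Stale queue entry; a cheaper route was already found.
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None

# Hypothetical map: the wall in the middle row forces a detour.
grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
print(a_star(grid, (0, 0), (2, 0)))
```

A learned model such as the recurrent networks discussed in the abstract would be trained on routes like the ones this search returns.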
Using Posters to Recommend Anime and Mangas in a Cold-Start Scenario
Vie, Jill-Jênn, Yger, Florian, Lahfa, Ryan, Clement, Basile, Cocchi, Kévin, Chalumeau, Thomas, Kashima, Hisashi
Item cold-start is a classical issue in recommender systems that affects anime and manga recommendations as well. This problem can be framed as follows: how can we predict whether a user will like a manga that has received few ratings from the community? Content-based techniques can alleviate this issue but require extra information, which is usually expensive to gather. In this paper, we use a deep learning technique, Illustration2Vec, to easily extract tag information from manga and anime posters (e.g., sword, or ponytail). We propose BALSE (Blended Alternate Least Squares with Explanation), a new model for collaborative filtering that benefits from this extra information to recommend mangas. We show, using real data from an online manga recommender system called Mangaki, that our model substantially improves the quality of recommendations, especially for less-known manga, and is able to provide an interpretation of the taste of the users.
Learned Optimizers that Scale and Generalize
Wichrowska, Olga, Maheswaranathan, Niru, Hoffman, Matthew W., Colmenarejo, Sergio Gomez, Denil, Misha, de Freitas, Nando, Sohl-Dickstein, Jascha
Learning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We introduce a learned gradient descent optimizer that generalizes well to new tasks, and which has significantly reduced memory and computation overhead. We achieve this by introducing a novel hierarchical RNN architecture, with minimal per-parameter overhead, augmented with additional architectural features that mirror the known structure of optimization tasks. We also develop a meta-training ensemble of small, diverse optimization tasks capturing common properties of loss landscapes. The optimizer learns to outperform RMSProp/ADAM on problems in this corpus. More importantly, it performs comparably or better when applied to small convolutional neural networks, despite seeing no neural networks in its meta-training set. Finally, it generalizes to train Inception V3 and ResNet V2 architectures on the ImageNet dataset for thousands of steps, optimization problems that are of a vastly different scale than those it was trained on. We release an open source implementation of the meta-training algorithm.
Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification
Michelsanti, Daniel, Tan, Zheng-Hua
Improving speech system performance in noisy environments remains a challenging task, and speech enhancement (SE) is one of the effective techniques to solve the problem. Motivated by the promising results of generative adversarial networks (GANs) in a variety of image processing tasks, we explore the potential of conditional GANs (cGANs) for SE, and in particular, we make use of the image processing framework proposed by Isola et al. [1] to learn a mapping from the spectrogram of noisy speech to an enhanced counterpart. The SE cGAN consists of two networks, trained in an adversarial manner: a generator that tries to enhance the input noisy spectrogram, and a discriminator that tries to distinguish between enhanced spectrograms provided by the generator and clean ones from the database using the noisy spectrogram as a condition. We evaluate the performance of the cGAN method in terms of perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and equal error rate (EER) of speaker verification (an example application). Experimental results show that the cGAN method overall outperforms the classical short-time spectral amplitude minimum mean square error (STSA-MMSE) SE algorithm, and is comparable to a deep neural network-based SE approach (DNN-SE).