Collaborating Authors

Deep Learning


CartPole with a Deep Q-Network

#artificialintelligence

In my last post I developed a solution to OpenAI Gym's CartPole environment, based on a classical Q-Learning algorithm. The best score I achieved with it was 120, although the score I uploaded to the leaderboard was 188. While this is certainly not a bad result, I wondered if I could do better using more advanced techniques. Besides that, I also wanted to practice the concepts I had recently learned in the Machine Learning 2 course at university. By the way, to all the students among you: I found that one of the best ways to learn about new algorithms is to actually try to implement them in code!


3 VMware products that use AI to power their feature set

#artificialintelligence

Artificial intelligence (AI) is paid a lot of lip service in the enterprise, with IT leaders arguing that every aspect of their stack will be revolutionized by AI. However, the real meaning of AI, and its true impact, can be difficult to measure. At a breakout session during the 2017 VMworld conference, VMware's Joel Leichnetz and Michael Gandy spoke on what AI is and how it is being used today. The pair started by defining AI, which they said is: "The theory and development of computer systems able to perform tasks that normally require human intelligence..." As noted, AI is generally used to mimic specific human behaviors. Machine learning uses algorithms to improve the ability of software to learn from experience that is provided by the operator, while deep learning is a subset of machine learning that uses algorithms allowing the software to train itself.


NVIDIA morphs from graphics and gaming to AI and deep learning (7wData)

@machinelearnbot

Maybe you've heard of the x86 central processing unit (CPU) architecture that powers most PCs and servers today. But once upon a time in PC land, Intel made a bundle of cash selling x87 math co-processor chips to accompany the x86 products. These chips excelled at, and accelerated, floating point math operations and helped make PCs much faster at performing certain tasks that were hot and relevant back then, like recalculating spreadsheets. But Artificial Intelligence (AI) has, in a way, brought math co-processors back in vogue, by utilizing graphics processing units (GPUs) in a similar supporting role. As it turns out, the kind of mathematical capabilities required to render high-resolution, high frame-rate graphics are also directly applicable to AI.


Spatio-Temporal Backpropagation for Training High-performance Spiking Neural Networks

arXiv.org Machine Learning

Compared with artificial neural networks (ANNs), spiking neural networks (SNNs) are promising for exploring brain-like behaviors, since spikes can encode more spatio-temporal information. Although pre-training from an ANN or direct training based on backpropagation (BP) makes supervised training of SNNs possible, these methods exploit only the networks' spatial-domain information, which leads to a performance bottleneck and requires many complicated training skills. Another fundamental issue is that spike activity is naturally non-differentiable, which causes great difficulties in training SNNs. To this end, we build an iterative LIF model that is friendlier to gradient descent training. By simultaneously considering the layer-by-layer spatial domain (SD) and the timing-dependent temporal domain (TD) in the training phase, as well as an approximated derivative for the spike activity, we propose a spatio-temporal backpropagation (STBP) training framework that does not rely on any complicated techniques. We achieve the best performance of a multi-layered perceptron (MLP) compared with existing state-of-the-art algorithms on the static MNIST and the dynamic N-MNIST datasets, as well as on a custom object detection dataset. This work provides a new perspective for exploring high-performance SNNs for future brain-like computing paradigms with rich spatio-temporal dynamics.
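To make the two ingredients named in this abstract more concrete, here is a minimal sketch of a discrete-time leaky integrate-and-fire (LIF) update together with a rectangular stand-in for the spike derivative. The abstract does not give the paper's equations, so the update rule, the reset-to-zero behavior, and the parameters `tau`, `v_th`, and `width` are all assumptions chosen for illustration, not the authors' formulation.

```python
import numpy as np

def lif_step(u_prev, o_prev, x, tau=0.5, v_th=1.0):
    """One discrete-time step of a layer of LIF neurons (illustrative form)."""
    # Leak the previous potential; neurons that spiked last step restart from 0.
    u = tau * u_prev * (1.0 - o_prev) + x
    # Emit a spike (1.0) wherever the potential reaches the threshold.
    o = (u >= v_th).astype(float)
    return u, o

def surrogate_grad(u, v_th=1.0, width=1.0):
    """Rectangular approximation of d(spike)/d(u): nonzero only near threshold."""
    return (np.abs(u - v_th) < width / 2).astype(float) / width

# Drive three neurons with a constant input for a few time steps.
inputs = np.array([0.6, 0.2, 1.2])
u = np.zeros(3)
o = np.zeros(3)
for t in range(3):
    u, o = lif_step(u, o, inputs)
    print(t, u, o, surrogate_grad(u))
```

Because the state at each step depends on both the layer input and the previous step's state, gradients can flow along both the layer axis and the time axis, which is the intuition behind combining the spatial and temporal domains during training.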


Event Representations for Automated Story Generation with Deep Neural Nets

arXiv.org Artificial Intelligence

Automated story generation is the problem of automatically selecting a sequence of events, actions, or words that can be told as a story. We seek to develop a system that can generate stories by learning everything it needs to know from textual story corpora. To date, recurrent neural networks that learn language models at character, word, or sentence levels have had little success generating coherent stories. We explore the question of event representations that provide a mid-level of abstraction between words and sentences in order to retain the semantic information of the original data while minimizing event sparsity. We present a technique for preprocessing textual story data into event sequences. We then present a technique for automated story generation whereby we decompose the problem into the generation of successive events (event2event) and the generation of natural language sentences from events (event2sentence). We give empirical results comparing different event representations and their effects on event successor generation and the translation of events to natural language.


Dual Discriminator Generative Adversarial Nets

arXiv.org Machine Learning

We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial networks (GANs). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GANs. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus exploiting the complementary statistical properties of these divergences to effectively diversify the estimated density in capturing multiple modes. We term our method dual discriminator generative adversarial nets (D2GAN), which, unlike a GAN, has two discriminators. Together with a generator, it is also analogous to a minimax game, wherein one discriminator rewards high scores for samples from the data distribution whilst the other, conversely, favors data from the generator, and the generator produces data to fool both discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both the KL and reverse KL divergences between the data distribution and the distribution induced by the data the generator produces, hence effectively avoiding the mode collapse problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good-quality and diverse samples over baselines, and the capability of our method to scale up to the ImageNet database.
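As a small illustration of why the two divergences mentioned in this abstract are complementary, the sketch below computes KL and reverse KL for a pair of discrete distributions. This is not the D2GAN objective, which the abstract does not spell out; the helper `kl`, the smoothing constant `eps`, and the toy distributions are assumptions made only for this example.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # eps avoids log(0); it is a numerical convenience, not part of the definition.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Toy setup: the data has two equally likely modes,
# while the model puts almost all of its mass on the first one.
p_data = [0.5, 0.5]
p_model = [0.99, 0.01]

forward = kl(p_data, p_model)   # KL(data || model)
reverse = kl(p_model, p_data)   # KL(model || data), the "reverse" KL

print(f"KL(data || model) = {forward:.3f}")
print(f"KL(model || data) = {reverse:.3f}")
```

In this toy case the forward KL is the larger of the two, because it heavily penalizes a model that assigns little mass to a mode present in the data, whereas the reverse KL is more tolerant of a missed mode. Combining both in one objective is the idea the abstract describes.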


Co-training for Demographic Classification Using Deep Learning from Label Proportions

arXiv.org Machine Learning

Deep learning algorithms have recently produced state-of-the-art accuracy in many classification tasks, but this success is typically dependent on access to many annotated training examples. For domains without such data, an attractive alternative is to train models with light, or distant, supervision. In this paper, we introduce a deep neural network for the Learning from Label Proportions (LLP) setting, in which the training data consist of bags of unlabeled instances with associated label distributions for each bag. We introduce a new regularization layer, Batch Averager, that can be appended to the last layer of any deep neural network to convert it from supervised learning to LLP. This layer can be implemented readily with existing deep learning packages. To further support domains in which the data consist of two conditionally independent feature views (e.g. image and text), we propose a co-training algorithm that iteratively generates pseudo bags and refits the deep LLP model to improve classification accuracy. We demonstrate our models on demographic attribute classification (gender and race/ethnicity), which has many applications in social media analysis, public health, and marketing. We conduct experiments to predict demographics of Twitter users based on their tweets and profile image, without requiring any user-level annotations for training. We find that the deep LLP approach outperforms baselines for both text and image features separately. Additionally, we find that the co-training algorithm improves image and text classification by 4% and 8% absolute F1, respectively. Finally, an ensemble of text and image classifiers further improves the absolute F1 measure by 4% on average.
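The idea of averaging per-instance predictions over a bag and comparing the result with the bag's known label proportions can be sketched in a few lines of NumPy. The abstract does not say which loss the Batch Averager layer is paired with, so the KL-based `bag_loss` below, along with the helper names and the toy logits, is an assumption for illustration rather than the authors' implementation.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def bag_loss(logits, bag_proportions, eps=1e-12):
    """KL(target proportions || mean predicted distribution) for one bag."""
    probs = softmax(logits)       # shape: (n_instances, n_classes)
    avg = probs.mean(axis=0)      # the "batch averaging" step over the bag
    target = np.asarray(bag_proportions, dtype=float)
    return float(np.sum(target * (np.log(target + eps) - np.log(avg + eps))))

# A bag of four unlabeled instances, two classes, known to be 75% class 0.
logits = np.array([[2.0, 0.0],
                   [1.5, 0.0],
                   [1.0, 0.0],
                   [0.0, 2.0]])
print(bag_loss(logits, [0.75, 0.25]))
```

No instance in the bag carries its own label; only the averaged prediction is supervised, which is what lets a standard classifier be trained from label proportions alone.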


Applying Deep Learning to natural language processing

@machinelearnbot

Language is the medium that humans use for conversing. Giving machines the ability to learn human language with natural language processing has given rise to several new products and possibilities that were not previously imaginable. Natural language processing (NLP) is one of the most important technologies present in the information age. Understanding complex language utterances is a crucial part of artificial intelligence. Applications of NLP can be found across several industry domains such as web search, advertisement, emails, customer service, language translation, radiology reports, etc. Natural language processing techniques are designed in the same manner that the human brain learns language processing.



Neural Language Modeling From Scratch (Part 1)

#artificialintelligence

The decoder is a simple function that takes a representation of the input word and returns a distribution which represents the model's predictions for the next word: the model assigns to each word the probability that it will be the next word in the sequence. This model is similar to the simple one, except that after encoding the current input word we feed the resulting representation (of size 200) into a two-layer LSTM, which then outputs a vector also of size 200 (at every time step the LSTM also receives a vector representing its previous state; this is not shown in the diagram). In the input embedding, words that have similar meanings are represented by similar vectors (similar in terms of cosine similarity). Because the model would like to, given the RNN output, assign similar probability values to similar words, similar words are represented by similar vectors.
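The encoder and decoder described here can be sketched as an embedding lookup followed by a linear map and a softmax. The excerpt only says the decoder is "a simple function", so the linear-plus-softmax form is an assumption, as are the tiny `vocab_size`, the random weights, and the names `E`, `W`, `encode`, and `decode`. The representation size of 200 is taken from the text, and the LSTM is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 10   # hypothetical toy vocabulary
hidden = 200      # representation size mentioned in the text

# Untrained weights: input embedding (encoder) and output projection (decoder).
E = rng.normal(scale=0.1, size=(vocab_size, hidden))
W = rng.normal(scale=0.1, size=(hidden, vocab_size))

def encode(word_id):
    """Look up the vector representing the current input word."""
    return E[word_id]

def decode(h):
    """Map a representation to a probability distribution over next words."""
    logits = h @ W
    logits = logits - logits.max()   # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def cosine(a, b):
    """Cosine similarity, the notion of 'similar vectors' used in the text."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

probs = decode(encode(3))
print(probs.shape, probs.sum())
print(cosine(E[3], E[4]))
```

In the model from the post, the vector passed to `decode` would be the LSTM output rather than the raw embedding; the decoder's job of turning a size-200 vector into next-word probabilities stays the same.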