Deep Learning


Neural Language Modeling From Scratch (Part 1)

#artificialintelligence

The decoder is a simple function that takes a representation of the input word and returns a distribution representing the model's predictions for the next word: the model assigns to each word the probability that it will be the next word in the sequence. This model is similar to the simple one, except that after encoding the current input word we feed the resulting representation (of size 200) into a two-layer LSTM, which then outputs a vector, also of size 200 (at every time step the LSTM also receives a vector representing its previous state; this is not shown in the diagram). In the input embedding, words that have similar meanings are represented by similar vectors (similar in terms of cosine similarity). Because the model would like, given the RNN output, to assign similar probability values to similar words, similar words are represented by similar vectors in the output embedding as well.
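As a rough sketch of that architecture (a hypothetical PyTorch rendering, not the post's actual code; the embedding and LSTM sizes of 200 come from the description above):

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """Embed the current word, run a two-layer LSTM, decode to a
    distribution over the vocabulary for the next word."""
    def __init__(self, vocab_size, hidden_size=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)  # input embedding (size 200)
        self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers=2, batch_first=True)
        self.decoder = nn.Linear(hidden_size, vocab_size)   # RNN output -> vocabulary logits

    def forward(self, tokens, state=None):
        x = self.embed(tokens)            # (batch, seq_len, 200)
        out, state = self.lstm(x, state)  # `state` carries the LSTM's previous state
        return self.decoder(out), state   # softmax over the logits gives the next-word distribution
```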


How AI Protects PayPal's Payments and Performance | The Official NVIDIA Blog

#artificialintelligence

With advances in machine learning and the deployment of neural networks, logistic regression-powered models are expanding their uses throughout PayPal. PayPal's deep learning system is able to filter out deceptive merchants and crack down on sales of illegal products. Kutsyy explained that the machines can identify "why transactions fail, monitoring businesses more efficiently," avoiding the need to buy more hardware for problem solving. The AI Podcast is available through iTunes, DoggCatcher, Google Play Music, Overcast, PlayerFM, Podbay, Pocket Casts, PodCruncher, PodKicker, Stitcher and SoundCloud.



You can probably use deep learning even if your data isn't that big

@machinelearnbot

Deep learning models are complex and tricky to train, and I had a hunch that lack of model convergence and difficulties in training, rather than overfitting, probably explained the poor performance. We recreated Python versions of the Leekasso and the MLP used in the original post to the best of our ability, and the code is available here. The MLP used in the original analysis still looks pretty bad for small sample sizes, but our neural nets achieve essentially perfect accuracy at all sample sizes. Many of the parameters are problem-specific (especially those related to SGD), and poor choices will result in misleadingly bad performance.
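To make the point about hyperparameters concrete, here is a minimal sketch of training a small MLP with explicitly chosen SGD settings (hypothetical PyTorch code; the layer sizes, learning rate, and momentum below are illustrative placeholders, not the values used in the post):

```python
import torch
import torch.nn as nn

# Hypothetical small MLP for 28x28 images; sizes are placeholders.
model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
# Badly chosen lr/momentum here can make a sound model look terrible.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))  # dummy batch
for _ in range(100):            # a few gradient steps on the dummy batch
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```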


Moore's Law may be out of steam, but the power of artificial intelligence is accelerating

#artificialintelligence

A paper from Google's researchers says they simultaneously used as many as 800 of the powerful and expensive graphics processors that have been crucial to the recent uptick in the power of machine learning (see "10 Breakthrough Technologies 2013: Deep Learning"). Feeding data into deep learning software to train it for a particular task is much more resource-intensive than running the system afterwards, but even that still takes significant oomph. Intel has slowed the pace at which it introduces generations of new chips with smaller, denser transistors (see "Moore's Law Is Dead. Now What?"). This demand for computing power also motivates the startups, and giants such as Google, creating new chips customized to power machine learning (see "Google Reveals a Powerful New AI Chip and Supercomputer").



Deep Learning for Chatbots, Part 2 – Implementing a Retrieval-Based Model in Tensorflow

#artificialintelligence

A positive label means that an utterance was an actual response to a context, and a negative label means that it wasn't: the utterance was picked randomly from somewhere in the corpus. Each record in the test/validation set consists of a context, a ground-truth utterance (the real response) and 9 incorrect utterances called distractors. Before starting with fancy neural network models, let's build some simple baseline models to help us understand what kind of performance we can expect. The deep learning model we will build in this post is called a Dual Encoder LSTM network, sketched below.
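As a rough sketch of the idea (hypothetical PyTorch code; the post itself implements the model in TensorFlow): a shared LSTM encodes both the context and the candidate utterance, and a learned matrix M scores the pair, with sigmoid(cᵀ M r) interpreted as the probability of a positive label.

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    """Sketch of a Dual Encoder LSTM: score = sigmoid(c^T M r), where c and r
    are the final LSTM states of the context and the candidate utterance."""
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)   # shared encoder
        self.M = nn.Parameter(torch.randn(hidden, hidden) * 0.01)

    def encode(self, tokens):
        _, (h, _) = self.lstm(self.embed(tokens))
        return h[-1]                          # final hidden state, (batch, hidden)

    def forward(self, context, response):
        c = self.encode(context)              # context vector
        r = self.encode(response)             # utterance vector
        scores = ((c @ self.M) * r).sum(-1)   # c^T M r for each pair in the batch
        return torch.sigmoid(scores)          # probability of a positive label
```

At evaluation time, each context's ground-truth utterance and its 9 distractors would be scored with this function and ranked, e.g. for a recall@k metric.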