Deep Learning


Neural Network Architectures

#artificialintelligence

Deep neural networks and Deep Learning are powerful and popular algorithms. And a lot of their success lies in the careful design of the neural network architecture. I wanted to revisit the history of neural network design in the last few years and in the context of Deep Learning. It is the year 1994, and this is one of the very first convolutional neural networks, and what propelled the field of Deep Learning. This pioneering work by Yann LeCun was named LeNet5 after many previous successful iterations since the year 1988! The LeNet5 architecture was fundamental, in particular the insight that image features are distributed across the entire image, and that convolutions with learnable parameters are an effective way to extract similar features at multiple locations with few parameters. At the time there was no GPU to help training, and even CPUs were slow.
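
To make the "few parameters" point concrete, here is a rough sketch that counts the trainable parameters of one convolutional layer against a fully connected layer producing an output of the same size. The dimensions (a 32x32 single-channel input, 5x5 kernels, 6 feature maps) are assumed for illustration and are not stated in the excerpt; the helper names are hypothetical.

```python
def conv_params(kernel, in_channels, out_channels):
    """Shared weights: one kernel (plus one bias) per output feature map."""
    return out_channels * (kernel * kernel * in_channels + 1)


def dense_params(n_inputs, n_outputs):
    """Unshared weights: every output unit connects to every input, plus a bias."""
    return n_inputs * n_outputs + n_outputs


# Assumed, LeNet-style dimensions (illustrative only).
side, kernel, in_ch, out_ch = 32, 5, 1, 6
out_side = side - kernel + 1  # "valid" convolution shrinks each side by kernel - 1

shared = conv_params(kernel, in_ch, out_ch)
unshared = dense_params(side * side * in_ch, out_side * out_side * out_ch)

print("convolutional layer parameters:", shared)
print("fully connected equivalent:   ", unshared)
```

Because the same small kernel is reused at every image location, the convolutional layer needs only a few hundred parameters where the dense equivalent needs millions.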


How to Check-Point Deep Learning Models in Keras - Machine Learning Mastery

#artificialintelligence

Deep learning models can take hours, days or even weeks to train. If the run is stopped unexpectedly, you can lose a lot of work. In this post you will discover how you can check-point your deep learning models during training in Python using the Keras library. Application checkpointing is a fault tolerance technique for long-running processes.
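
The general idea of checkpointing can be sketched without any deep learning library. The example below is not the Keras API: it uses only the Python standard library, and the file name, the `train` loop, and the `total` value standing in for model weights are all hypothetical. It saves state after every epoch and resumes from the last saved epoch after a simulated interruption.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash mid-write leaves the old file intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"epoch": 0, "total": 0}
    with open(path) as handle:
        return json.load(handle)


def train(path, n_epochs, stop_after=None):
    """Stand-in training loop; 'total' plays the role of the model weights."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], n_epochs):
        if stop_after is not None and epoch >= stop_after:
            return state  # simulate the run being stopped unexpectedly
        state["total"] += epoch       # pretend this is one epoch of learning
        state["epoch"] = epoch + 1    # next epoch to run on resume
        save_checkpoint(path, state)
    return state


workdir = tempfile.mkdtemp()
ckpt = os.path.join(workdir, "checkpoint.json")

interrupted = train(ckpt, n_epochs=5, stop_after=3)  # stops before epoch 3
resumed = train(ckpt, n_epochs=5)                    # picks up at epoch 3
print(interrupted, resumed)
```

The second call does not repeat the first three epochs; it reads the checkpoint and continues from where the interrupted run left off.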


Modern Deep Learning through Bayesian Eyes

#artificialintelligence

Bayesian models are rooted in Bayesian statistics, and easily benefit from the vast literature in the field. In contrast, deep learning lacks a solid mathematical grounding. Instead, empirical developments in deep learning are often justified by metaphors, evading the unexplained principles at play. These two fields are perceived as fairly antipodal to each other in their respective communities. It is perhaps astonishing then that most modern deep learning models can be cast as performing approximate inference in a Bayesian setting. The implications of this statement are profound: we can use the rich Bayesian statistics literature with deep learning models, explain away many of the curiosities with these, combine results from deep learning into Bayesian modelling, and much more.


Is there anyone on Twitter who reliably tweets out useful information on Machine Learning? • /r/MachineLearning

@machinelearnbot

They don't tweet regularly but rest assured when they do, it's pretty much state-of-the-art stuff. You can also follow Deep Learning Hub (@DeepLearningHub) but they haven't tweeted in about a month.


Introducing DeepText: Facebook's Text Understanding Engine

#artificialintelligence

The team at Facebook discusses DeepText, their engine that analyses posts and comments on Facebook, to make a better product. Just imagine what the equivalent at Google does with all your phone conversations. "Text is a prevalent form of communication on Facebook. Understanding the various ways text is used on Facebook can help us improve people's experiences with our products, whether we're surfacing more of the content that people want to see or filtering out undesirable content like spam. With this goal in mind, we built DeepText, a deep learning-based text understanding engine that can understand with near-human accuracy the textual content of several thousand posts per second, spanning more than 20 languages. DeepText leverages several deep neural network architectures, including convolutional and recurrent neural nets, and can perform word-level and character-level based learning. We use FBLearner Flow and Torch for model training. Trained models are served with a click of a button through the FBLearner Predictor platform, which provides a scalable and reliable model distribution infrastructure. Facebook engineers can easily build new DeepText models through the self-serve architecture that DeepText provides."


Deep Structured Energy Based Models for Anomaly Detection

arXiv.org Machine Learning

In this paper, we attack the anomaly detection problem by directly modeling the data distribution with deep architectures. We propose deep structured energy based models (DSEBMs), where the energy function is the output of a deterministic deep neural network with structure. We develop novel model architectures to integrate EBMs with different types of data such as static data, sequential data, and spatial data, and apply appropriate model architectures to adapt to the data structure. Our training algorithm is built upon the recent development of score matching (Hyvärinen, 2005), which connects an EBM with a regularized autoencoder, eliminating the need for complicated sampling methods. Statistically sound decision criteria can be derived for anomaly detection purposes from the perspective of the energy landscape of the data distribution. We investigate two decision criteria for performing anomaly detection: the energy score and the reconstruction error. Extensive empirical studies on benchmark tasks demonstrate that our proposed model consistently matches or outperforms all the competing methods.
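
As a toy illustration of the two decision criteria named in the abstract, the sketch below replaces the deep network with a hand-written quadratic energy. Everything here is an assumption made for illustration: the energy function, the `center` and `threshold` values, the sample points, and the choice to treat the reconstruction as one gradient step on the energy. It is not the paper's model or training procedure.

```python
def energy(x, center):
    """Toy energy: low near 'center', high far from it (stand-in for a deep network)."""
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, center))


def energy_gradient(x, center):
    """Gradient of the toy energy with respect to x."""
    return [xi - ci for xi, ci in zip(x, center)]


def reconstruction_error(x, center):
    """Assumed form: reconstruct x by one gradient step on the energy,
    then measure the squared distance between x and its reconstruction."""
    grad = energy_gradient(x, center)
    reconstruction = [xi - gi for xi, gi in zip(x, grad)]
    return sum((xi - ri) ** 2 for xi, ri in zip(x, reconstruction))


center = [0.0, 0.0]   # hypothetical mode of the "normal" data
threshold = 2.0       # hypothetical decision threshold on the energy score
points = {"typical": [0.5, -0.5], "outlier": [3.0, 4.0]}

# Criterion 1: flag a point as anomalous when its energy exceeds the threshold.
flags = {name: energy(p, center) > threshold for name, p in points.items()}

for name, p in points.items():
    print(name, energy(p, center), reconstruction_error(p, center), flags[name])
```

Both scores grow as a point moves away from the low-energy region, so either can be thresholded to separate the outlier from the typical point in this toy setting.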


Machine Learning is Fun! Part 3: Deep Learning and Convolutional Neural Networks

#artificialintelligence

You might have seen this famous xkcd comic before. The goof is based on the idea that any 3-year-old child can recognize a photo of a bird, but figuring out how to make a computer recognize objects has puzzled the very best computer scientists for over 50 years. In the last few years, we've finally found a good approach to object recognition using deep convolutional neural networks. That sounds like a bunch of made up words from a William Gibson Sci-Fi novel, but the ideas are totally understandable if you break them down one by one. So let's do it -- let's write a program that can recognize birds!


With QuickType, Apple wants to do more than guess your next text. It wants to give you an AI.

#artificialintelligence

Your next iPhone will be even better at guessing what you want to type before you type it. Or so say the technologists at Apple. Let's say you use the word "play" in a text message. In the latest version of the iOS mobile operating system, "we can tell the difference between the Orioles who are playing in the playoffs and the children who are playing in the park, automatically," Apple senior vice president Craig Federighi said Monday morning during his keynote at the company's annual Worldwide Developer Conference. Like a lot of big tech companies, Apple is deploying deep neural networks, networks of hardware and software that can learn by analyzing vast amounts of data.


Deep Learning for Public Safety – H2O blog

#artificialintelligence

We've seen some incredible applications of Deep Learning with respect to image recognition and machine translation but this particular use case has to do with public safety; in particular, how Deep Learning can be used to fight crime in the forward-thinking cities of San Francisco and Chicago. The cool thing about these two cities (and many others!) is that they are both open data cities, which means anybody can access city data ranging from transportation information to building maintenance records. So, if you are a data scientist or thinking about becoming a data scientist, there are publicly available city-specific datasets you can play with. For this example, we looked at the historical crime data from both Chicago and San Francisco and joined this data with other external data, such as weather and socioeconomic factors, using Spark's SQL context. We do the data import, ad-hoc data munging (parsing the date column, for example), and joining of tables by leveraging the power of Spark and then publish the Spark RDD as an H2O Frame (Figure 1).
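
The date parsing and table join described above can be sketched on a small scale. This example does not use Spark or H2O: Python's built-in `sqlite3` module stands in for Spark's SQL context, and the table names, column names, date format, and rows are invented purely for illustration.

```python
import sqlite3
from datetime import datetime

# Hypothetical toy rows; real city datasets have many more columns.
crimes = [
    ("01/15/2015 23:30:00", "THEFT"),
    ("01/16/2015 08:05:00", "BATTERY"),
    ("01/16/2015 21:45:00", "THEFT"),
]
weather = [("2015-01-15", -3.0), ("2015-01-16", 1.5)]


def parse_date(raw):
    """Munging step: turn a raw timestamp string into an ISO date usable as a join key."""
    return datetime.strptime(raw, "%m/%d/%Y %H:%M:%S").date().isoformat()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crimes (day TEXT, crime_type TEXT)")
conn.execute("CREATE TABLE weather (day TEXT, mean_temp_c REAL)")
conn.executemany(
    "INSERT INTO crimes VALUES (?, ?)",
    [(parse_date(raw), kind) for raw, kind in crimes],
)
conn.executemany("INSERT INTO weather VALUES (?, ?)", weather)

# Join each crime record with the weather observed on the same day.
rows = conn.execute(
    "SELECT c.day, c.crime_type, w.mean_temp_c "
    "FROM crimes AS c JOIN weather AS w ON c.day = w.day "
    "ORDER BY c.day, c.crime_type"
).fetchall()

for row in rows:
    print(row)
```

The joined rows carry both the crime attributes and the external weather feature, which is the shape of table a model would then be trained on.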


In deep learning, architecture engineering is the new feature engineering

#artificialintelligence

Two of the most important aspects of machine learning models are feature extraction and feature engineering. Those features are what supply relevant information to the machine learning models. If the features are few or irrelevant, your model may have a hard time making any useful predictions. If there are too many features, your model will be slow and likely overfit. Humans don't necessarily know which feature representations are best for a given task.