Contextual Adaptation -- Where systems construct contextual explanatory models for classes of real-world phenomena. I wrote about these two in previous articles (see: "The Only Way to Make Deep Learning Interpretable is to have it Explain Itself" and "The Meta Model and Meta Meta Model of Deep Learning"). DARPA's presentation nails it by highlighting what's going on in current state-of-the-art research. Deep Learning systems have flaws analogous to the flaws in our own intuitions. Just to recap, here's the roadmap that I have (explained here): It's a Deep Learning roadmap and does not cover developments in other AI fields.

Today the company employs a team from a diverse range of scientific backgrounds and uses a combination of data science and machine learning techniques to manage significant amounts of client money. Anthony Ledford, Man AHL's chief scientist, emphasises the importance of diversity in all things and knows never to have too much faith in any one prediction model. "If you ask me how much faith do I have in any particular model or being able to predict an individual price of a financial instrument, well I have very little faith in it. The way you can turn that into something that makes sense from an investment point of view is to distil those tiny statistical edges down into something that, at the portfolio level, makes sense as an investment product," he said.

Batch training: How to train a model using only minibatches of data at a time. Linear mixed effects models: Linear modeling of fixed and random effects. Inference networks: How to amortize computation for training and testing models. If you're interested in contributing a tutorial, check out the contributing page.

Andrew Gelman: Bayesian statistics uses the mathematical rules of probability to combine data with "prior information" to give inferences which (if the model being used is correct) are more precise than would be obtained by either source of information alone. You can reproduce the classical methods using Bayesian inference: In a regression prediction context, setting the prior of a coefficient to uniform or "noninformative" is mathematically equivalent to including the corresponding predictor in a least squares or maximum likelihood estimate; setting the prior to a spike at zero is the same as excluding the predictor, and you can reproduce a pooling of predictors through a joint deterministic prior on their coefficients. When Bayesian methods work best, it's by providing a clear set of paths connecting data, mathematical/statistical models, and the substantive theory of the variation and comparison of interest. Bayesian methods offer a clarity that comes from the explicit specification of a so-called "generative model": a probability model of the data-collection process and a probability model of the underlying parameters.
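The correspondence between priors and classical regression can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses independent zero-mean Gaussian priors with a known noise variance, a very wide Gaussian standing in for the "noninformative" prior and a very narrow one standing in for the spike at zero. The function name `posterior_mean` and the simulated data are hypothetical and do not come from the original text.

```python
import numpy as np

def posterior_mean(X, y, prior_sd, noise_var=1.0):
    """Posterior mean of the coefficients under independent N(0, prior_sd[j]**2)
    priors and Gaussian noise with known variance (assumed setup)."""
    prior_precision = np.diag(1.0 / np.asarray(prior_sd, dtype=float) ** 2)
    A = X.T @ X / noise_var + prior_precision
    b = X.T @ y / noise_var
    return np.linalg.solve(A, b)

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=1.0, size=200)

# Classical least squares, with and without the third predictor.
ols_full = np.linalg.lstsq(X, y, rcond=None)[0]
ols_without_last = np.linalg.lstsq(X[:, :2], y, rcond=None)[0]

# Very wide priors on every coefficient: approximates the flat prior.
flat = posterior_mean(X, y, prior_sd=[1e6, 1e6, 1e6])

# Very narrow prior on the third coefficient: approximates a spike at zero.
spike = posterior_mean(X, y, prior_sd=[1e6, 1e6, 1e-4])

print("flat prior vs. OLS:      ", flat, ols_full)
print("spike prior vs. reduced: ", spike[:2], ols_without_last)
```

With the wide priors the posterior mean should agree with the full least squares fit, and with the narrow prior the third coefficient should be pinned near zero while the other two agree with the fit that omits that predictor.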

In an earlier blog, "Need for DYNAMICAL Machine Learning: Bayesian exact recursive estimation", I introduced the need for Dynamical ML as we now enter the "Walk" stage of the "Crawl-Walk-Run" evolution of machine learning. How DYNAMICAL Machine Learning is practiced is discussed in Generalized Dynamical Machine Learning and associated articles describing the theory, algorithms, examples and MATLAB code (in the Systems Analytics: Adaptive Machine Learning workbook). In Machine Learning, (1) a Data Model is chosen; (2) a Learning Method is selected to obtain model parameters; and (3) data are processed in a "batch" or "in-stream" (or sequential) mode. For a complete discussion of Kalman Filter use for Dynamical machine learning, see the SYSTEMS Analytics book.
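To make the "batch" versus "in-stream" distinction concrete, here is a minimal sketch that is not taken from the author's MATLAB code. It assumes the simplest possible case: a linear data model with constant coefficients, Gaussian noise of known variance and a Gaussian prior, so that the sequential update is a Kalman filter measurement update with a static state (recursive least squares). The function names and the simulated data are hypothetical.

```python
import numpy as np

def batch_posterior(X, y, m0, P0, noise_var):
    """Posterior mean and covariance computed from all data at once."""
    P0_inv = np.linalg.inv(P0)
    P = np.linalg.inv(P0_inv + X.T @ X / noise_var)
    m = P @ (P0_inv @ m0 + X.T @ y / noise_var)
    return m, P

def recursive_update(m, P, x, y_t, noise_var):
    """Exact Bayesian update for a single observation (x, y_t)."""
    S = x @ P @ x + noise_var          # predictive variance of y_t (scalar)
    K = P @ x / S                      # gain vector
    m_new = m + K * (y_t - x @ m)      # correct the mean with the prediction error
    P_new = P - np.outer(K, x @ P)     # shrink the covariance
    return m_new, P_new

# Simulated stream, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X @ np.array([0.7, -1.2]) + rng.normal(scale=0.5, size=50)

m0, P0, noise_var = np.zeros(2), 10.0 * np.eye(2), 0.25

# In-stream mode: one observation at a time, no past data stored.
m_seq, P_seq = m0, P0
for x_t, y_t in zip(X, y):
    m_seq, P_seq = recursive_update(m_seq, P_seq, x_t, y_t, noise_var)

# Batch mode: the whole data set in a single computation.
m_batch, P_batch = batch_posterior(X, y, m0, P0, noise_var)

print(m_seq, m_batch)
```

Under these assumptions the two modes should give the same posterior up to floating-point error, which is the sense in which the recursive estimate is exact. A full Kalman filter would add a state-transition step between observations so that the coefficients can drift over time.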

Chapter 1: Introduction to Bayesian Methods Introduction to the philosophy and practice of Bayesian methods and answering the question, "What is probabilistic programming?" Chapter 2: A little more on PyMC We explore modeling Bayesian problems using Python's PyMC library through examples.

So far we could not claim that deep networks trained with stochastic gradient descent are Bayesian. And it may be because SGD biases learning towards flat minima, rather than sharp minima. It turns out that Hochreiter and Schmidhuber (1997) motivated their work on seeking flat minima from a Bayesian, minimum description length perspective. Seeking flat minima makes sense from that perspective: weights in a flat minimum can be stored at low precision without much increase in the loss, so they take fewer bits to describe.
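The description length argument can be illustrated with a toy calculation. This is only a sketch under strong assumptions, not Hochreiter and Schmidhuber's algorithm: the loss around the minimum is a one-dimensional quadratic, the weight is stored on a uniform grid covering a fixed range, and the function `bits_needed` with its parameters is hypothetical.

```python
import math

def bits_needed(curvature, tolerance, weight_range=2.0):
    """Bits to store one weight on a uniform grid so that rounding raises the
    quadratic loss 0.5 * curvature * (w - w_opt)**2 by at most `tolerance`."""
    max_error = math.sqrt(2.0 * tolerance / curvature)  # largest allowed |w - w_opt|
    step = 2.0 * max_error                               # rounding error is at most step / 2
    levels = max(1, math.ceil(weight_range / step))      # grid points needed to cover the range
    return math.ceil(math.log2(levels)) if levels > 1 else 0

# Same loss tolerance, different curvature at the minimum.
flat_bits = bits_needed(curvature=0.1, tolerance=1e-3)
sharp_bits = bits_needed(curvature=100.0, tolerance=1e-3)

print("flat minimum: ", flat_bits, "bits")
print("sharp minimum:", sharp_bits, "bits")
```

For the same tolerated increase in loss, the low-curvature (flat) minimum allows a coarser grid and therefore a shorter code for the weight than the high-curvature (sharp) one.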

In this paper, we propose Ensemble Bayesian Optimization (EBO) to overcome this problem. Unlike conventional BO methods that operate on a single posterior GP model, EBO works with an ensemble of posterior GP models. Our approach generates speedups by parallelizing the time-consuming hyper-parameter posterior inference and functional evaluations on hundreds of cores and aggregating the models in every iteration of BO. We demonstrate the ability of EBO to handle sample-intensive hard optimization problems by applying it to a rover navigation problem with tens of thousands of observations.

With the primary goal of building intelligent systems that automatically improve from experiences, machine learning (ML) is becoming an increasingly important field to tackle big data challenges, with an emerging field of "Big Learning," which covers theories, algorithms and systems on addressing big data problems. Co-authors Jun Zhu, Jianfei Chen, Wenbo Hu, and Bo Zhang cover the basic concepts of Bayesian methods, and review the latest progress on flexible Bayesian methods, efficient and scalable algorithms, and distributed system implementations. "Bayesian methods are becoming increasingly relevant in the Big Data era to protect high capacity models against overfitting, and to allow models adaptively updating their capacity." The scientists also discuss the connection with deep learning: "A natural and important question that remains under addressed is how to conjoin the flexibility of deep learning and the learning efficiency of Bayesian methods for robust learning," they write.

Following their previous Insights post on ICML 2016, Two Sigma researchers Vinod Valsalam and Firdaus Janoos discuss below the notable advances in deep learning, optimization algorithms, Bayesian techniques, and time-series analysis presented at NIPS 2016. This tutorial by David Blei (Columbia), Shakir Mohamed (DeepMind), and Rajesh Ranganath (Princeton) covered variational inference (VI) methods for approximating probability distributions through optimization. Towards the end of this tutorial, they described some of the newer advances in VI such as Monte Carlo gradient estimation, black box variational inference, stochastic approximation, and variational auto-encoders. The following are a few selected papers on deep learning, covering topics in reinforcement learning, training techniques, generative modeling, and recurrent networks.