
Exact natural gradient in deep linear networks and its application to the nonlinear case

Neural Information Processing Systems

Stochastic gradient descent (SGD) remains the method of choice for deep learning, despite the limitations arising for ill-behaved objective functions. In cases where it could be estimated, the natural gradient has proven very effective at mitigating the catastrophic effects of pathological curvature in the objective function, but little is known theoretically about its convergence properties, and it has yet to find a practical implementation that would scale to very deep and large networks. Here, we derive an exact expression for the natural gradient in deep linear networks, which exhibit pathological curvature similar to the nonlinear case. We provide for the first time an analytical solution for its convergence rate, showing that the loss decreases exponentially to the global minimum in parameter space. Our expression for the natural gradient is surprisingly simple, computationally tractable, and explains why some approximations proposed previously work well in practice.
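The paper's exact expression is for deep linear networks; as a minimal illustration of the underlying idea only (plain NumPy, not the paper's algorithm), the sketch below preconditions the gradient of a single-layer linear-Gaussian model by the inverse Fisher matrix, which in that model is just the input covariance. On this quadratic loss, one full natural-gradient step reaches the minimum exactly, and smaller steps shrink the loss exponentially, which is the kind of convergence behaviour the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ill-conditioned toy regression: y = X w_true (noiseless), so the loss is
# a quadratic with pathological curvature along the badly scaled directions.
n, d = 500, 5
X = rng.normal(size=(n, d)) * np.array([10.0, 5.0, 1.0, 0.5, 0.1])
w_true = rng.normal(size=d)
y = X @ w_true

def grad(w):
    # gradient of the mean squared error 0.5 * mean((X w - y)^2)
    return X.T @ (X @ w - y) / n

F = X.T @ X / n  # Fisher matrix of the linear-Gaussian model: input covariance

w = np.zeros(d)
for _ in range(5):
    w -= np.linalg.solve(F, grad(w))  # natural-gradient step, step size 1

print(np.allclose(w, w_true, atol=1e-6))  # prints True
```

With a step size below 1 the distance to the minimum decays geometrically instead of vanishing in one step; plain gradient descent on the same problem is crippled by the 1e4 condition number of `F`.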

In a Boltzmann machine, why isn't there a simple expression for the optimal edge weights in terms of correlations between variables?


If we learn the edge weights by gradient ascent on the log-likelihood, each step requires an expensive expectation estimated with MCMC (or some cheaper approximation). Conceptually, the edge weights represent the "interaction strength" between variables, i.e. $w_{ij}$ represents how much $x_i$ and $x_j$ "want" to be equal. It would make sense that variables that are highly positively correlated have large positive edge weights, and variables that are negatively correlated have negative edge weights. But this would imply that learning the edge weights is easy: we could just calculate the correlations, apply some mapping, and read off the edge weights. Obviously that is not true, or we wouldn't need the expensive algorithm.
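The gradient that this ascent follows makes the asymmetry explicit: $\partial \log L / \partial w_{ij} = \langle x_i x_j \rangle_{\text{data}} - \langle x_i x_j \rangle_{\text{model}}$. The data term is exactly the cheap correlation the question alludes to; the model term depends on the current weights and must be re-estimated by sampling at every step. A minimal sketch for a fully-visible Boltzmann machine with $\{0,1\}$ units (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def model_correlations(W, b, n_steps=2000):
    """Estimate <x_i x_j> under the current model by single-site Gibbs
    sampling of a fully-visible Boltzmann machine (0/1 units, W_ii = 0)."""
    n = len(b)
    x = rng.integers(0, 2, size=n).astype(float)
    corr = np.zeros((n, n))
    for _ in range(n_steps):
        i = rng.integers(n)
        p = 1.0 / (1.0 + np.exp(-(W[i] @ x + b[i])))  # p(x_i = 1 | rest)
        x[i] = float(rng.random() < p)
        corr += np.outer(x, x)
    return corr / n_steps

def ascent_step(W, b, data, lr=0.05):
    """One gradient-ascent step on the log-likelihood:
    dL/dw_ij = <x_i x_j>_data - <x_i x_j>_model."""
    data_corr = data.T @ data / len(data)   # cheap: empirical correlations
    model_corr = model_correlations(W, b)   # expensive: MCMC at every step
    W_new = W + lr * (data_corr - model_corr)
    np.fill_diagonal(W_new, 0.0)
    return W_new
```

The model term is what forces MCMC: it is an expectation over all $2^n$ joint states, and it changes every time $W$ changes, so there is no fixed correlation-to-weight mapping that could be applied once.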

New Ways for Optimizing Gradient Descent


The new era of machine learning and artificial intelligence is the deep learning era. Deep learning not only delivers remarkable accuracy but also has a huge hunger for data. Using neural networks, functions of far greater complexity can be fitted to a given set of data points. But there are a few specific techniques that make the experience of working with neural networks much smoother. Let us assume that we have trained a huge neural network.

Model-Based Stabilisation of Deep Reinforcement Learning

Though successful in high-dimensional domains, deep reinforcement learning exhibits high sample complexity and suffers from stability issues, as reported by researchers and practitioners in the field. These problems hinder the application of such algorithms in real-world and safety-critical scenarios. In this paper, we take steps towards stable and efficient reinforcement learning by following a model-based approach that is known to reduce agent-environment interactions. Namely, our method augments deep Q-networks (DQNs) with model predictions for transitions, rewards, and termination flags. Having the model at hand, we then conduct a rigorous theoretical study of our algorithm and show, for the first time, convergence to a stationary point. En route, we provide a counter-example showing that 'vanilla' DQNs can diverge, confirming practitioners' and researchers' experiences. Our proof is novel in its own right and can be extended to other forms of deep reinforcement learning. In particular, we believe exploiting the relation between reinforcement learning (with deep function approximators) and online learning can serve as a recipe for future proofs in the domain. Finally, we validate our theoretical results in 20 games from the Atari benchmark. Our results show that following the proposed model-based learning approach not only ensures convergence but also leads to a reduction in sample complexity and superior performance.
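The abstract describes feeding model predictions for rewards and termination flags into the DQN targets; the paper's exact architecture is not given here, but how such predictions enter a standard one-step bootstrap target can be sketched as follows (function and variable names are hypothetical, plain NumPy):

```python
import numpy as np

gamma = 0.99  # discount factor

def bootstrap_targets(q_next, r_hat, done_hat):
    """One-step Q-learning targets built from model predictions:
    predicted reward r_hat, predicted termination probability done_hat,
    and Q-values q_next evaluated at the model's predicted next states.
    (Illustrative sketch, not the paper's exact algorithm.)"""
    return r_hat + gamma * (1.0 - done_hat) * q_next.max(axis=1)

# toy batch: 3 transitions, 4 actions
q_next = np.array([[0.1, 0.5, 0.2, 0.0],
                   [1.0, 0.0, 0.0, 0.0],
                   [0.2, 0.2, 0.3, 0.1]])
r_hat = np.array([1.0, 0.0, -1.0])
done_hat = np.array([0.0, 1.0, 0.0])  # second transition predicted terminal

targets = bootstrap_targets(q_next, r_hat, done_hat)  # [1.495, 0.0, -0.703]
```

A predicted termination flag zeroes out the bootstrap term, which is one way model outputs can stabilise the targets that a 'vanilla' DQN would otherwise regress against.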

WhizzML: Level Up


Sure, you can use WhizzML to fill in missing values or to do some basic data cleaning, but what if you want to go crazy? WhizzML is a fully-fledged programming language, after all. We can go as far down the rabbit hole as we want. As we've mentioned before, one of the great things about writing programs in WhizzML is access to highly scalable, library-free machine learning. To put it another way, cloud-based machine learning operations (learn an ensemble, create a dataset, etc.) are primitives built into the language.