Meta-Learning Millions of Hyper-parameters using the Implicit Function Theorem

#artificialintelligence

Last night on the train I read this nice paper by David Duvenaud and colleagues, so I thought it's time for a David Duvenaud birthday special (don't get too excited David, I won't make it an annual tradition...). I recently covered iMAML: the meta-learning algorithm that uses implicit gradients to sidestep backpropagating through the inner-loop optimization in meta-learning/hyperparameter tuning. The method presented in (Lorraine et al., 2019) uses the same high-level idea, but introduces a different - on the surface less fiddly - approximation to the crucial inverse Hessian. I won't spend a lot of time introducing the whole meta-learning setup from scratch; you can use the previous post as a starting point. Many - though not all - meta-learning or hyperparameter optimization problems can be stated as nested optimization problems.
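To make the inverse-Hessian approximation concrete, here is a minimal JAX sketch of the implicit hypergradient with a truncated Neumann series standing in for the inverse Hessian, in the spirit of Lorraine et al. The function names, the toy regularized least-squares losses, and the step count and step size are illustrative assumptions, not the paper's code, and `w` is assumed to (approximately) minimize the training loss so the implicit function theorem applies.

```python
import jax
import jax.numpy as jnp

def train_loss(w, lam, batch):
    """Inner (training) loss; lam are hyperparameters, here per-weight L2 strengths."""
    x, y = batch
    return jnp.mean((x @ w - y) ** 2) + jnp.sum(jnp.exp(lam) * w ** 2)

def val_loss(w, lam, batch):
    """Outer (validation) loss; it happens not to depend on lam directly."""
    x, y = batch
    return jnp.mean((x @ w - y) ** 2)

def neumann_hypergradient(w, lam, train_batch, val_batch, lr=0.01, steps=20):
    """dL_val/dlam = direct term - (dL_val/dw) H^{-1} (d^2 L_train / dw dlam),
    with H^{-1} replaced by the truncated Neumann series lr * sum_j (I - lr * H)^j."""
    v = jax.grad(val_loss, argnums=0)(w, lam, val_batch)          # dL_val/dw
    grad_w = lambda w_: jax.grad(train_loss, argnums=0)(w_, lam, train_batch)

    # Accumulate v_inv_hess ~= v @ H^{-1} using only Hessian-vector products.
    p, cur = v, v
    for _ in range(steps):
        hvp = jax.jvp(grad_w, (w,), (cur,))[1]   # H @ cur, without forming H
        cur = cur - lr * hvp                     # cur <- cur (I - lr * H)
        p = p + cur
    v_inv_hess = lr * p

    # Mixed second derivative contracted with v_inv_hess: (d^2 L_train / dw dlam)^T v_inv_hess
    mixed = jax.grad(lambda lam_: jnp.vdot(
        jax.grad(train_loss, argnums=0)(w, lam_, train_batch), v_inv_hess))(lam)
    direct = jax.grad(val_loss, argnums=1)(w, lam, val_batch)     # zero for this val_loss
    return direct - mixed
```

The appeal of the scheme is visible in the loop: the only contact with the Hessian is through Hessian-vector products, so the number of hyperparameters in `lam` can be as large as the number of weights without ever materializing a second-order matrix.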



CS 229 - Supervised Learning Cheatsheet

#artificialintelligence

Given a set of data points $\{x^{(1)}, ..., x^{(m)}\}$ associated with a set of outcomes $\{y^{(1)}, ..., y^{(m)}\}$, we want to build a classifier that learns how to predict $y$ from $x$. Hypothesis ― The hypothesis is denoted $h_\theta$ and is the model that we choose. Loss function ― A loss function is a function $L:(z,y)\in\mathbb{R}\times Y\longmapsto L(z,y)\in\mathbb{R}$ that takes as inputs the predicted value $z$ corresponding to the real data value $y$ and outputs how different they are. Remark: stochastic gradient descent (SGD) updates the parameters after each training example, whereas batch gradient descent performs one update based on a whole batch of training examples. Likelihood ― The likelihood $L(\theta)$ of a model given parameters $\theta$ is used to find the optimal parameters, by maximizing the likelihood.
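As a concrete illustration of the remark above, here is a small JAX sketch (not part of the cheatsheet) contrasting the two update rules for the squared loss $L(z,y)=(z-y)^2$ with hypothesis $h_\theta(x)=\theta^\top x$; the function names and the learning rate `alpha` are placeholders.

```python
import jax
import jax.numpy as jnp

def loss(theta, x, y):
    """Per-example loss L(z, y) = (z - y)^2 with prediction z = h_theta(x) = theta . x."""
    return (jnp.dot(theta, x) - y) ** 2

def sgd_epoch(theta, X, Y, alpha=0.01):
    """Stochastic gradient descent: one parameter update per training example."""
    for x, y in zip(X, Y):
        theta = theta - alpha * jax.grad(loss)(theta, x, y)
    return theta

def batch_step(theta, X, Y, alpha=0.01):
    """Batch gradient descent: a single update from the loss averaged over all m examples."""
    batch_loss = lambda t: jnp.mean(jax.vmap(loss, in_axes=(None, 0, 0))(t, X, Y))
    return theta - alpha * jax.grad(batch_loss)(theta)
```

With `X` of shape `(m, d)` and `Y` of shape `(m,)`, `sgd_epoch` takes $m$ noisy steps per pass over the data while `batch_step` takes one exact step per pass.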


AI Notes: Parameter optimization in neural networks - deeplearning.ai

#artificialintelligence

In machine learning, you start by defining a task and a model. The model consists of an architecture and parameters. For a given architecture, the values of the parameters determine how accurately the model performs the task. But how do you find good values? By defining a loss function that evaluates how well the model performs.
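To make that loop concrete, here is a hedged JAX sketch; the two-layer network, mean-squared-error loss, and learning rate are illustrative choices, not part of the AI Notes. The architecture stays fixed while gradient descent on the loss adjusts the parameter values.

```python
import jax
import jax.numpy as jnp

def init_params(key, n_in=4, n_hidden=8, n_out=1):
    """Random initial parameter values for a fixed two-layer architecture."""
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (n_in, n_hidden)) * 0.1, "b1": jnp.zeros(n_hidden),
        "W2": jax.random.normal(k2, (n_hidden, n_out)) * 0.1, "b2": jnp.zeros(n_out),
    }

def model(params, x):
    """The architecture: a two-layer MLP; only the parameter values change during training."""
    h = jnp.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def loss_fn(params, x, y):
    """The loss function scores how well the current parameters perform the task."""
    return jnp.mean((model(params, x) - y) ** 2)

@jax.jit
def update(params, x, y, lr=0.1):
    """One gradient-descent step: move every parameter downhill on the loss."""
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```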


Regret Circuits: Composability of Regret Minimizers

#artificialintelligence

Automated decision-making is one of the core objectives of artificial intelligence. Not surprisingly, over the past few years, entire new research fields have emerged to tackle that task. This blog post is concerned with regret minimization, one of the central tools in online learning. Regret minimization models the problem of repeated online decision making: an agent is called upon to make a sequence of decisions under unknown (and potentially adversarial) loss functions. Regret minimization is a versatile mathematical abstraction that has found a plethora of practical applications: portfolio optimization, computation of Nash equilibria, markets and auctions, submodular function optimization, and more.
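As a concrete example of the kind of building block such circuits compose, here is a minimal sketch of regret matching, a classic regret minimizer over the probability simplex with linear losses; it is not the regret-circuit construction from the post, and the class and method names are placeholders.

```python
import jax.numpy as jnp

class RegretMatching:
    """Repeatedly outputs a mixed strategy, observes a loss vector, and updates regrets."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.cum_regret = jnp.zeros(n_actions)

    def next_strategy(self):
        """Play proportionally to the positive part of the cumulative regret."""
        positive = jnp.maximum(self.cum_regret, 0.0)
        total = positive.sum()
        if total <= 0.0:
            return jnp.full(self.n_actions, 1.0 / self.n_actions)
        return positive / total

    def observe_loss(self, loss):
        """After the (possibly adversarial) loss vector is revealed, record how much
        better each pure action would have done than the strategy actually played."""
        strategy = self.next_strategy()
        self.cum_regret = self.cum_regret + (strategy @ loss - loss)
```

Running this loop for $T$ rounds keeps the cumulative regret against the best fixed action sublinear in $T$, which is exactly the guarantee that composable regret minimizers are built from.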