Optimization
A Spectral Regularization Framework for Multi-Task Structure Learning
Argyriou, Andreas, Pontil, Massimiliano, Ying, Yiming, Micchelli, Charles A.
Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, which is based on regularization with spectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer.
Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs
Tewari, Ambuj, Bartlett, Peter L.
We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). OLP uses its experience so far to estimate the MDP. It chooses actions by optimistically maximizing estimated future rewards over a set of next-state transition probabilities that are close to the estimates: a computation that corresponds to solving linear programs. We show that the total expected reward obtained by OLP up to time $T$ is within $C(P)\log T$ of the reward obtained by the optimal policy, where $C(P)$ is an explicit, MDP-dependent constant. OLP is closely related to an algorithm proposed by Burnetas and Katehakis with four key differences: OLP is simpler, it does not require knowledge of the supports of transition probabilities and the proof of the regret bound is simpler, but our regret bound is a constant factor larger than the regret of their algorithm. OLP is also similar in flavor to an algorithm recently proposed by Auer and Ortner. But OLP is simpler and its regret bound has a better dependence on the size of the MDP.
Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations
Globerson, Amir, Jaakkola, Tommi S.
We present a novel message passing algorithm for approximating the MAP problem in graphical models. The algorithm is similar in structure to max-product but unlike max-product it always converges, and can be proven to find the exact MAP solution in various settings. The algorithm is derived via block coordinate descent in a dual of the LP relaxation of MAP, but does not require any tunable parameters such as step size or tree weights. We also describe a generalization of the method to cluster based potentials. The new method is tested on synthetic and real-world problems, and compares favorably with previous approaches.
The Price of Bandit Information for Online Optimization
Dani, Varsha, Kakade, Sham M., Hayes, Thomas P.
We present sharp rates of convergence (with respect to additive regret) for both the full information setting (where the cost function is revealed at the end of each round) and the bandit setting (where only the scalar cost incurred is revealed). In particular, this paper is concerned with the price of bandit information, by which we mean the ratio of the best achievable regret in the bandit setting to that in the full-information setting.
Active Preference Learning with Discrete Choice Data
Eric, Brochu, Freitas, Nando D., Ghosh, Abhijeet
We propose an active learning algorithm that learns a continuous valuation model from discrete preferences. The algorithm automatically decides what items are best presented to an individual in order to find the item that they value highly in as few trials as possible, and exploits quirks of human psychology to minimize time and cognitive burden. To do this, our algorithm maximizes the expected improvement at each query without accurately modelling the entire valuation surface, which would be needlessly expensive. The problem is particularly difficult because the space of choices is infinite. We demonstrate the effectiveness of the new algorithm compared to related active learning methods. We also embed the algorithm within a decision making tool for assisting digital artists in rendering materials. The tool finds the best parameters while minimizing the number of queries.
Boosting Algorithms for Maximizing the Soft Margin
Rรคtsch, Gunnar, Warmuth, Manfred K., Glocer, Karen A.
Gunnar Rรคtsch Friedrich Miescher Laboratory Max Planck Society Tรผbingen, Germany We present a novel boosting algorithm, called SoftBoost, designed for sets of binary labeledexamples that are not necessarily separable by convex combinations of base hypotheses. Our algorithm achieves robustness by capping the distributions onthe examples. Our update of the distribution is motivated by minimizing a relative entropy subject to the capping constraints and constraints on the edges of the obtained base hypotheses. The capping constraints imply a soft margin in the dual optimization problem. Our algorithm produces a convex combination of hypotheses whose soft margin is within ฮด of its maximum.
Stable Dual Dynamic Programming
Wang, Tao, Bowling, Michael, Schuurmans, Dale, Lizotte, Daniel J.
Recently, we have introduced a novel approach to dynamic programming and reinforcement learningthat is based on maintaining explicit representations of stationary distributions instead of value functions. In this paper, we investigate the convergence properties of these dual algorithms both theoretically and empirically, and show how they can be scaled up by incorporating function approximation.
Convex Learning with Invariances
Teo, Choon H., Globerson, Amir, Roweis, Sam T., Smola, Alex J.
Incorporating invariances into a learning algorithm is a common problem in machine learning.We provide a convex formulation which can deal with arbitrary loss functions and arbitrary losses. In addition, it is a drop-in replacement for most optimization algorithms for kernels, including solvers of the SVMStruct family. The advantage of our setting is that it relies on column generation instead of modifying theunderlying optimization problem directly.
Receding Horizon Differential Dynamic Programming
Tassa, Yuval, Erez, Tom, Smart, William D.
The control of high-dimensional, continuous, non-linear systems is a key problem in reinforcement learning and control. Local, trajectory-based methods, using techniques such as Differential Dynamic Programming (DDP) are not directly subject to the curse of dimensionality, but generate only local controllers. In this paper, we introduce Receding Horizon DDP (RH-DDP), an extension to the classic DDP algorithm, which allows us to construct stable and robust controllers based on a library of local-control trajectories. We demonstrate the effectiveness of our approach on a series of high-dimensional control problems using a simulated multi-link swimming robot. These experiments show that our approach effectively circumvents dimensionality issues, and is capable of dealing effectively with problems with (at least) 34 state and 14 action dimensions.