AITopics

Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, which is based on regularization with spectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer.

algorithm, matrix, spectral function, (13 more...)

Country:

North America > United States > New York > Albany County > Albany (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.40)

Tewari, Ambuj, Bartlett, Peter L.

Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs

We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). OLP uses its experience so far to estimate the MDP. It chooses actions by optimistically maximizing estimated future rewards over a set of next-state transition probabilities that are close to the estimates: a computation that corresponds to solving linear programs. We show that the total expected reward obtained by OLP up to time $T$ is within $C(P)\log T$ of the reward obtained by the optimal policy, where $C(P)$ is an explicit, MDP-dependent constant. OLP is closely related to an algorithm proposed by Burnetas and Katehakis with four key differences: OLP is simpler, it does not require knowledge of the supports of transition probabilities and the proof of the regret bound is simpler, but our regret bound is a constant factor larger than the regret of their algorithm. OLP is also similar in flavor to an algorithm recently proposed by Auer and Ortner. But OLP is simpler and its regret bound has a better dependence on the size of the MDP.

algorithm, denote, mdp, (16 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.85)

Globerson, Amir, Jaakkola, Tommi S.

Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations

We present a novel message passing algorithm for approximating the MAP problem in graphical models. The algorithm is similar in structure to max-product but, unlike max-product, it always converges and can be proven to find the exact MAP solution in various settings. The algorithm is derived via block coordinate descent in a dual of the LP relaxation of MAP, but does not require any tunable parameters such as step size or tree weights. We also describe a generalization of the method to cluster-based potentials. The new method is tested on synthetic and real-world problems, and compares favorably with previous approaches.

algorithm, constraint, lp relaxation, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Oceania > Fiji (0.05)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning (0.93)

Dani, Varsha, Kakade, Sham M., Hayes, Thomas P.

The Price of Bandit Information for Online Optimization

We present sharp rates of convergence (with respect to additive regret) for both the full information setting (where the cost function is revealed at the end of each round) and the bandit setting (where only the scalar cost incurred is revealed). In particular, this paper is concerned with the price of bandit information, by which we mean the ratio of the best achievable regret in the bandit setting to that in the full-information setting.

algorithm, full information case, information case, (11 more...)

Country: North America > United States > Illinois > Cook County > Chicago (0.05)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Brochu, Eric, Freitas, Nando D., Ghosh, Abhijeet

Active Preference Learning with Discrete Choice Data

We propose an active learning algorithm that learns a continuous valuation model from discrete preferences. The algorithm automatically decides what items are best presented to an individual in order to find the item that they value highly in as few trials as possible, and exploits quirks of human psychology to minimize time and cognitive burden. To do this, our algorithm maximizes the expected improvement at each query without accurately modelling the entire valuation surface, which would be needlessly expensive. The problem is particularly difficult because the space of choices is infinite. We demonstrate the effectiveness of the new algorithm compared to related active learning methods. We also embed the algorithm within a decision making tool for assisting digital artists in rendering materials. The tool finds the best parameters while minimizing the number of queries.
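The abstract does not spell out how expected improvement is computed. As a rough illustration only, the sketch below uses the standard closed form for a Gaussian predictive distribution under maximization. The function name `expected_improvement` and the parameters `mu`, `sigma`, and `best` are assumptions made for this example and do not come from the paper.

```python
import math

def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for a Gaussian prediction N(mu, sigma^2).

    Assumption: we are maximizing, and the model's belief about a candidate
    item's value is Gaussian. All names here are hypothetical.
    """
    if sigma <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best) * cdf + sigma * pdf

# Two candidates with the same predicted mean: the more uncertain one
# has the larger expected improvement, so it would be queried first.
uncertain = expected_improvement(mu=0.0, sigma=1.0, best=0.0)
confident = expected_improvement(mu=0.0, sigma=0.5, best=0.0)
```

Choosing the next query by this score needs only the predictive mean and standard deviation at each candidate, which fits the abstract's point that the whole valuation surface need not be modelled accurately.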

artificial intelligence, machine learning, valuation, (16 more...)

Country:

North America > United States (0.28)
North America > Canada (0.28)

Industry: Leisure & Entertainment (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Rätsch, Gunnar, Warmuth, Manfred K., Glocer, Karen A.

Boosting Algorithms for Maximizing the Soft Margin

We present a novel boosting algorithm, called SoftBoost, designed for sets of binary labeled examples that are not necessarily separable by convex combinations of base hypotheses. Our algorithm achieves robustness by capping the distributions on the examples. Our update of the distribution is motivated by minimizing a relative entropy subject to the capping constraints and constraints on the edges of the obtained base hypotheses. The capping constraints imply a soft margin in the dual optimization problem. Our algorithm produces a convex combination of hypotheses whose soft margin is within δ of its maximum.
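To make the idea of a capped distribution concrete, here is a minimal sketch of the capping step in isolation: the relative-entropy projection of a weight vector onto the set of distributions whose entries are at most `cap`. It leaves out the edge constraints that the abstract says are also part of the SoftBoost update, so it is not the full algorithm. The function name `cap_distribution` and the parameter `cap` are hypothetical.

```python
def cap_distribution(weights, cap):
    """Project positive weights onto {d : sum(d) = 1, 0 <= d_i <= cap}.

    The relative-entropy projection caps the largest entries at `cap`
    and rescales the remaining entries by a common factor.
    """
    n = len(weights)
    if cap * n < 1.0:
        raise ValueError("cap is too small for a distribution over n items")
    total = sum(weights)
    probs = [w / total for w in weights]
    # Visit entries from largest to smallest.
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    capped = [0.0] * n
    remaining_mass = 1.0  # original mass of the entries not yet capped
    for k, i in enumerate(order):
        # Factor that makes the uncapped entries sum to 1 - k * cap.
        scale = (1.0 - k * cap) / remaining_mass
        if probs[i] * scale <= cap:
            for j in order[k:]:
                capped[j] = probs[j] * scale
            return capped
        capped[i] = cap
        remaining_mass -= probs[i]
    return capped

d = cap_distribution([0.7, 0.2, 0.1], cap=0.5)
```

In this example the first entry is held at the cap and the other two keep their 2:1 ratio, which illustrates how capping stops a single hard example from dominating the distribution.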

artificial intelligence, inductive learning, machine learning, (19 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.24)
North America > United States > California > Santa Cruz County (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)

Wang, Tao, Bowling, Michael, Schuurmans, Dale, Lizotte, Daniel J.

Stable Dual Dynamic Programming

Recently, we have introduced a novel approach to dynamic programming and reinforcement learning that is based on maintaining explicit representations of stationary distributions instead of value functions. In this paper, we investigate the convergence properties of these dual algorithms both theoretically and empirically, and show how they can be scaled up by incorporating function approximation.
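For readers unfamiliar with the object being represented, the sketch below computes the stationary distribution of a small Markov chain by plain power iteration. It illustrates only what a stationary distribution is. It is not the dual dynamic programming update studied in the paper. The function name and the transition matrix are made up for the example.

```python
def stationary_distribution(P, iterations=1000, tol=1e-12):
    """Approximate d with d = d P for a row-stochastic matrix P.

    Assumption: the chain is irreducible and aperiodic, so repeated
    multiplication converges from any starting distribution.
    """
    n = len(P)
    d = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        nxt = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, d)) < tol:
            return nxt
        d = nxt
    return d

# Hypothetical two-state chain induced by some fixed policy.
P = [[0.9, 0.1],
     [0.5, 0.5]]
d = stationary_distribution(P)
```

For this chain the balance condition 0.1 * d[0] = 0.5 * d[1] gives d = (5/6, 1/6), which the iteration approaches.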

approximation, machine learning, reinforcement learning, (19 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.73)

Teo, Choon H., Globerson, Amir, Roweis, Sam T., Smola, Alex J.

Convex Learning with Invariances

Incorporating invariances into a learning algorithm is a common problem in machine learning. We provide a convex formulation which can deal with arbitrary loss functions and arbitrary losses. In addition, it is a drop-in replacement for most optimization algorithms for kernels, including solvers of the SVMStruct family. The advantage of our setting is that it relies on column generation instead of modifying the underlying optimization problem directly.

artificial intelligence, invariance, machine learning, (16 more...)

Country:

Oceania > Australia (0.28)
North America > Canada > Ontario > Toronto (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Tassa, Yuval, Erez, Tom, Smart, William D.

Receding Horizon Differential Dynamic Programming

The control of high-dimensional, continuous, non-linear systems is a key problem in reinforcement learning and control. Local, trajectory-based methods, using techniques such as Differential Dynamic Programming (DDP), are not directly subject to the curse of dimensionality, but generate only local controllers. In this paper, we introduce Receding Horizon DDP (RH-DDP), an extension to the classic DDP algorithm, which allows us to construct stable and robust controllers based on a library of local-control trajectories. We demonstrate the effectiveness of our approach on a series of high-dimensional control problems using a simulated multi-link swimming robot. These experiments show that our approach effectively circumvents dimensionality issues, and is capable of dealing effectively with problems with (at least) 34 state and 14 action dimensions.

artificial intelligence, machine learning, trajectory, (18 more...)