coordinate ascent
Reviews: Asynchronous Parallel Coordinate Minimization for MAP Inference
Summary: This paper proposes an asynchronous parallel approximate algorithm for MAP inference in graphical models represented as factor graphs. The proposed method is based on dual decomposition, which breaks the model into overlapping pieces and adds penalty terms to enforce agreement between the overlapping portions. Whereas HOGWILD! performs asynchronous gradient updates at each factor, the proposed method performs a full coordinate-ascent update at each iteration. The main concern is that updates based on stale values will be invalid; however, the authors present results that bound the expected error of this type. They also provide methods for adaptively choosing the number of worker nodes to further reduce this error.
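To make the gradient-step vs. full-coordinate-step distinction concrete, here is a minimal sketch of lock-free workers that each apply an exact coordinate-maximization step while reading possibly stale values of the other coordinates. The objective is a toy concave quadratic, not the paper's MAP dual, and every name and constant here is illustrative.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for the dual: maximize the concave quadratic
#   g(lam) = b^T lam - 0.5 * lam^T A lam,  with A symmetric positive definite.
# Each worker repeatedly picks a coordinate i and applies an *exact*
# coordinate-ascent step, reading the other coordinates without locks,
# so the values it sees may be stale -- the regime the paper analyzes.
n = 8
rng = np.random.default_rng(0)
Mat = rng.standard_normal((n, n))
A = Mat @ Mat.T + n * np.eye(n)       # well-conditioned SPD matrix
b = rng.standard_normal(n)
lam = np.zeros(n)                     # shared iterate, updated lock-free

def worker(seed, num_updates=500):
    local_rng = np.random.default_rng(seed)
    for _ in range(num_updates):
        i = local_rng.integers(n)
        snap = lam.copy()             # possibly stale snapshot of the others
        # exact argmax of g over coordinate i, the rest fixed at the snapshot
        lam[i] = (b[i] - A[i] @ snap + A[i, i] * snap[i]) / A[i, i]

with ThreadPoolExecutor(max_workers=4) as pool:
    for s in range(4):
        pool.submit(worker, s)

print("async  solution:", np.round(lam, 4))
print("closed form    :", np.round(np.linalg.solve(A, b), 4))
```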
Solving Multi-Model MDPs by Coordinate Ascent and Dynamic Programming
The multi-model Markov decision process (MMDP) is a promising framework for computing policies that are robust to parameter uncertainty in MDPs. MMDPs aim to find a policy that maximizes the expected return over a distribution of MDP models. Because MMDPs are NP-hard to solve, most methods resort to approximations. In this paper, we derive the policy gradient of MMDPs and propose CADP, which combines a coordinate ascent method and a dynamic programming algorithm for solving MMDPs. The main innovation of CADP over earlier algorithms is its coordinate-ascent perspective: it adjusts model weights iteratively to guarantee monotone policy improvement toward a local maximum. A theoretical analysis of CADP proves that it never performs worse than previous dynamic programming algorithms like WSU. Our numerical results indicate that CADP substantially outperforms existing methods on several benchmark problems.
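For intuition, here is a hedged sketch of the weighted dynamic-programming backup that WSU-style methods, and CADP's inner loop, build on. The tensor shapes, the function name, and the omission of CADP's weight-update step are all assumptions of this illustration, not the paper's implementation.

```python
import numpy as np

# Weighted backward induction over K tabular finite-horizon MDP models.
# P[k][a] is the (S, S) transition matrix of model k under action a,
# R[k][a] its length-S reward vector, and w[k] the weight of model k.
# CADP's contribution -- re-deriving w between sweeps so that the weighted
# return improves monotonically -- is deliberately omitted here.
def weighted_backward_induction(P, R, w, horizon):
    K, A = len(P), len(P[0])
    S = P[0][0].shape[0]
    V = np.zeros((K, S))                     # per-model value at step t + 1
    policy = np.zeros((horizon, S), dtype=int)
    for t in reversed(range(horizon)):
        # per-model Q-values: Q[k, a, s] = reward + expected next value
        Q = np.array([[R[k][a] + P[k][a] @ V[k] for a in range(A)]
                      for k in range(K)])
        weighted_Q = np.einsum('k,kas->as', w, Q)   # mix models by weight
        policy[t] = weighted_Q.argmax(axis=0)       # greedy w.r.t. the mix
        for k in range(K):                          # evaluate chosen actions
            V[k] = Q[k, policy[t], np.arange(S)]
    return policy, w @ V                            # weighted initial values
```

Repeated calls to a backup like this, with the weights w re-derived from the policy gradient in between, would form the coordinate-ascent outer loop the abstract describes.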
Feature selection with optimal coordinate ascent (OCA)
Saltiel, David, Benhamou, Eric
In machine learning, feature selection (FS) is a major component of an efficient algorithm: it fuels the algorithm and is the starting block for prediction. In this paper, we present a new method, called Optimal Coordinate Ascent (OCA), that allows us to select features among block and individual features. OCA relies on coordinate ascent to find an optimal solution for the gradient boosting method's score (the number of correctly classified samples). OCA takes into account dependencies between variables by forming blocks in our optimization. The coordinate-ascent optimization addresses the original NP-hard problem, in which the number of combinations explodes so rapidly that a grid search is infeasible; it considerably reduces the number of iterations, turning this NP-hard problem into a polynomial search. OCA brings substantial differences and improvements compared with the previous coordinate-ascent feature-selection method: we group variables into blocks and individual variables instead of using a binary selection; our initial guess is based on the k best group variables, making our initial point more robust; and we introduce new stopping criteria, making our optimization faster. We compare the two methods on our data set and find that our method outperforms the original one. We also compare our method to the Recursive Feature Elimination (RFE) method and find that OCA leads to the minimal feature set with the highest score. This is a nice byproduct of our method, as it empirically provides the most compact data set with optimal performance.
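A simplified sketch of such a coordinate-ascent loop over feature blocks follows, using scikit-learn's GradientBoostingClassifier as the scored model. The arbitrary block boundaries, the synthetic data, and the omission of the k-best initialization and the authors' stopping criteria are all simplifications of this illustration, not OCA itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
blocks = np.array_split(np.arange(X.shape[1]), 5)   # 5 arbitrary blocks

def score(mask):
    """Cross-validated accuracy using only the blocks flagged in mask."""
    selected = [b for b, m in zip(blocks, mask) if m]
    if not selected:
        return 0.0
    cols = np.concatenate(selected)
    model = GradientBoostingClassifier(random_state=0)
    return cross_val_score(model, X[:, cols], y, cv=3).mean()

mask = [True] * len(blocks)          # start from all blocks selected
best = score(mask)
improved = True
while improved:                      # one coordinate = one block indicator
    improved = False
    for i in range(len(blocks)):
        mask[i] = not mask[i]        # flip block i, keep the rest fixed
        s = score(mask)
        if s > best:
            best, improved = s, True # keep the flip: a coordinate-ascent step
        else:
            mask[i] = not mask[i]    # revert the flip
print("selected blocks:", [i for i, m in enumerate(mask) if m], "score:", best)
```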
Analysis of boosting algorithms using the smooth margin function
Rudin, Cynthia, Schapire, Robert E., Daubechies, Ingrid
We introduce a useful tool for analyzing boosting algorithms called the "smooth margin function," a differentiable approximation of the usual margin for boosting algorithms. We present two boosting algorithms based on this smooth margin, "coordinate ascent boosting" and "approximate coordinate ascent boosting," which are similar to Freund and Schapire's AdaBoost algorithm and Breiman's arc-gv algorithm. We give convergence rates to the maximum margin solution for both of our algorithms and for arc-gv. We then study AdaBoost's convergence properties using the smooth margin function. We precisely bound the margin attained by AdaBoost when the edges of the weak classifiers fall within a specified range. This shows that a previous bound proved by Rätsch and Warmuth is exactly tight. Furthermore, we use the smooth margin to capture explicit properties of AdaBoost in cases where cyclic behavior occurs.
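As a numerical illustration, the snippet below computes the standard smooth-margin form G(lam) = -ln(sum_i exp(-(M lam)_i)) / sum_j lam_j, which lower-bounds the usual minimum margin and approaches it as the weights grow; the matrix and weights are synthetic, and the paper should be consulted for the precise definition it uses.

```python
import numpy as np
from scipy.special import logsumexp

# M[i, j] = y_i * h_j(x_i) collects the weak classifiers' outputs;
# lam holds the (nonnegative) coefficients of the combined classifier.
def smooth_margin(M, lam):
    # logsumexp keeps the computation stable for large ||lam||_1
    return -logsumexp(-(M @ lam)) / lam.sum()

rng = np.random.default_rng(0)
M = rng.choice([-1.0, 1.0], size=(50, 10))   # toy weak-classifier outputs
lam = rng.random(10)
for scale in (1.0, 10.0, 100.0):
    l = scale * lam
    # G stays below the usual (minimum) margin and tends to it from below
    print(f"scale={scale:>5}: smooth={smooth_margin(M, l):+.4f}  "
          f"min margin={(M @ l).min() / l.sum():+.4f}")
```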
On the Dynamics of Boosting
Rudin, Cynthia, Daubechies, Ingrid, Schapire, Robert E.
In order to understand AdaBoost's dynamics, especially its ability to maximize margins, we derive an associated simplified nonlinear iterated map and analyze its behavior in low-dimensional cases. We find stable cycles for these cases, which can be used explicitly to solve for AdaBoost's output. By considering AdaBoost as a dynamical system, we are able to prove Rätsch and Warmuth's conjecture that AdaBoost may fail to converge to a maximal-margin combined classifier when given a 'non-optimal' weak learning algorithm.
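The cyclic behavior is easy to reproduce by direct simulation. The sketch below is plain AdaBoost on a tiny hand-built matrix, not the paper's reduced iterated map: with three examples and three weak classifiers that each misclassify exactly one example, the example-weight vector in this run settles into a stable period-3 cycle instead of converging.

```python
import numpy as np

# M[i, j] = y_i * h_j(x_i): weak classifier j misclassifies only example j.
M = np.where(np.eye(3, dtype=bool), -1.0, 1.0)
d = np.array([0.2, 0.3, 0.5])                # example weights, perturbed start
for t in range(12):
    edges = d @ M                            # edge of each weak classifier
    j = edges.argmax()                       # the optimal weak learner's pick
    alpha = 0.5 * np.log((1 + edges[j]) / (1 - edges[j]))
    d = d * np.exp(-alpha * M[:, j])         # up-weight the mistake, down the rest
    d /= d.sum()
    print(t, j, np.round(d, 4))              # d visits a repeating 3-cycle
```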