Optimization
A General Greedy Approximation Algorithm with Applications
Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.
The Concave-Convex Procedure (CCCP)
Yuille, Alan L., Rangarajan, Anand
This paper describes a simple geometrical Concave-Convex procedure (CCCP) for constructing discrete time dynamical systems which can be guaranteed to decrease almost any global optimization/energy function (see technical conditions in section (2)). We prove that there is a relationship between CCCP and optimization techniques based on introducing auxiliary variables using Legendre transforms. We distinguish between Legendre min-max and Legendre minimization. In the former, see [6], the introduction of auxiliary variables converts the problem to a min-max problem where the goal is to find a saddle point. By contrast, in Legendre minimization, see [8], the problem remains a minimization one (and so it becomes easier to analyze convergence).
Probabilistic Abstraction Hierarchies
Segal, Eran, Koller, Daphne, Ormoneit, Dirk
Many domains are naturally organized in an abstraction hierarchy or taxonomy, where the instances in "nearby" classes in the taxonomy are similar. In this paper, we provide a general probabilistic framework for clustering data into a set of classes organized as a taxonomy, where each class is associated with a probabilistic model from which the data was generated. The clustering algorithm simultaneously optimizes three things: the assignment of data instances to clusters, the models associated with the clusters, and the structure of the abstraction hierarchy. A unique feature of our approach is that it utilizes global optimization algorithms for both of the last two steps, reducing the sensitivity to noise and the propensity to local maxima that are characteristic of algorithms such as hierarchical agglomerative clustering that only take local steps. We provide a theoretical analysis for our algorithm, showing that it converges to a local maximum of the joint likelihood of model and data.
Matching Free Trees with Replicator Equations
Motivated by our recent work on rooted tree matching, in this paper we provide a solution to the problem of matching two free (i.e., unrooted) trees by constructing an association graph whose maximal cliques are in one-to-one correspondence with maximal common subtrees. We then solve the problem using simple replicator dynamics from evolutionary game theory. Experiments on hundreds of uniformly random trees are presented. The results are impressive: despite the inherent inability of these simple dynamics to escape from local optima, they always returned a globally optimal solution.
Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM
We propose a framework based on a parametric quadratic programming (QP) technique to solve the support vector machine (SVM) training problem. This framework, can be specialized to obtain two SVM optimization methods. The first solves the fixed bias problem, while the second starts with an optimal solution for a fixed bias problem and adjusts the bias until the optimal value is found. The later method can be applied in conjunction with any other existing technique which obtains a fixed bias solution. Moreover, the second method can also be used independently to solve the complete SVM training problem. A combination of these two methods is more flexible than each individual method and, among other things, produces an incremental algorithm which exactly solve the 1-Norm Soft Margin SVM optimization problem. Applying Selective Sampling techniques may further boost convergence.
Approximate Dynamic Programming via Linear Programming
Farias, Daniela, Roy, Benjamin V.
The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of large-scale stochastic control problems. We study an efficient method based on linear programming for approximating solutions to such problems. The approach "fits" a linear combination of pre-selected basis functions to the dynamic programming cost-to- go function. We develop bounds on the approximation error and present experimental results in the domain of queueing network control, providing empirical support for the methodology.
On the Convergence of Leveraging
Rรคtsch, Gunnar, Mika, Sebastian, Warmuth, Manfred K. K.
We give an unified convergence analysis of ensemble learning methods including e.g. AdaBoost, Logistic Regression and the Least-Square- Boost algorithm for regression. These methods have in common that they iteratively call a base learning algorithm which returns hypotheses that are then linearly combined. We show that these methods are related to the Gauss-Southwell method known from numerical optimization and state non-asymptotical convergence results for all these methods. Our analysis includes -norm regularized cost functions leading to a clean and general way to regularize ensemble learning.
Constructing Distributed Representations Using Additive Clustering
If the promise of computational modeling is to be fully realized in higherlevel cognitive domains such as language processing, principled methods must be developed to construct the semantic representations used in such models. In this paper, we propose the use of an established formalism from mathematical psychology, additive clustering, as a means of automatically constructing binary representations for objects using only pairwise similarity data. However, existing methods for the unsupervised learning of additive clustering models do not scale well to large problems. We present a new algorithm for additive clustering, based on a novel heuristic technique for combinatorial optimization. The algorithm is simpler than previous formulations and makes fewer independence assumptions. Extensive empirical tests on both human and synthetic data suggest that it is more effective than previous methods and that it also scales better to larger problems. By making additive clustering practical, we take a significant step toward scaling connectionist models beyond hand-coded examples.
Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM
We propose a framework based on a parametric quadratic programming (QP)technique to solve the support vector machine (SVM) training problem. This framework, can be specialized to obtain two SVM optimization methods. The first solves the fixed bias problem, whilethe second starts with an optimal solution for a fixed bias problem and adjusts the bias until the optimal value is found. The later method can be applied in conjunction with any other existing techniquewhich obtains a fixed bias solution. Moreover, the second method can also be used independently to solve the complete SVMtraining problem. A combination of these two methods is more flexible than each individual method and, among other things, produces an incremental algorithm which exactly solve the 1-Norm Soft Margin SVM optimization problem. Applying Selective Samplingtechniques may further boost convergence.
Direct value-approximation for factored MDPs
Schuurmans, Dale, Patrascu, Relu
We present a simple approach for computing reasonable policies for factored Markov decision processes (MDPs), when the optimal valuefunction can be approximated by a compact linear form. Our method is based on solving a single linear program that approximates thebest linear fit to the optimal value function. By applying an efficient constraint generation procedure we obtain an iterative solution method that tackles concise linear programs. This direct linear programming approach experimentally yields a significant reductionin computation time over approximate value-and policy-iteration methods (sometimes reducing several hours to a few seconds). However, the quality of the solutions produced by linear programming is weaker-usually about twice the approximation errorfor the same approximating class. Nevertheless, the speed advantage allows one to use larger approximation classes to achieve similar error in reasonable time. 1 Introduction Markov decision processes (MDPs) form a foundation for control in uncertain and stochastic environments and reinforcement learning. Standard methods such as value-iteration, policy-iteration and linear programming can be used to produce optimal control policies for MDPs that are expressed in explicit form; that is, the policy, value function and state transition model are all represented in a tabular manner that explicitly enumerates the state space.