Optimization
Experiments with the Adaptive Graph Traverser
Donald Michie and Robert Ross
A formal description is given of GT4, a revised and extended version of the Graph Traverser. Methods are described whereby GT4 can improve its performance at run time (a) by automatic optimization of parameters used by the evaluation function and (b) by dynamic re-ordering of operators. Neither method depends upon there being any successful searches in the program's past experience of a given problem. The essential feasibility of both approaches has been validated in experimental tests using sliding block puzzles. Two planned extensions, 'local smoothing' and 'regionalization', are described.
INTRODUCTION
The Graph Traverser (Doran and Michie 1966), and subsequent work based on it, represents an attempt to adapt game-playing methods, particularly those of Samuel (1959), to automatic problem-solving. The design objective is not the simulation of human problem-solving as a study in psychology, but rather to provide an efficient general-purpose search procedure appropriate to non-numerical problem domains. There is a parallel with the development of direct search techniques for numerical function minimization, for example pattern search (Hooke and Jeeves 1961) and the simplex method (Spendley, Hext and Himsworth 1962; Nelder and Mead 1965).
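The kind of search the abstract refers to can be illustrated with a minimal best-first search on the 3x3 sliding block puzzle. This is only a sketch under stated assumptions, not GT4 itself: the evaluation function (a count of misplaced tiles), the names `misplaced`, `successors` and `best_first`, and the node limit are all hypothetical choices made for illustration, and no parameter optimization or operator re-ordering is attempted.

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the blank square


def misplaced(state):
    """Illustrative evaluation function: tiles not on their goal square."""
    return sum(1 for s, g in zip(state, GOAL) if s != 0 and s != g)


def successors(state):
    """Yield every state reachable by sliding one tile into the blank."""
    i = state.index(0)
    row, col = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            j = 3 * r + c
            nxt = list(state)
            nxt[i], nxt[j] = nxt[j], nxt[i]
            yield tuple(nxt)


def best_first(start, evaluate=misplaced, max_nodes=100000):
    """Repeatedly develop the frontier state with the lowest evaluation."""
    frontier = [(evaluate(start), start)]
    parent = {start: None}
    developed = 0
    while frontier and developed < max_nodes:
        _, state = heapq.heappop(frontier)
        developed += 1
        if state == GOAL:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:  # skip states already generated
                parent[nxt] = state
                heapq.heappush(frontier, (evaluate(nxt), nxt))
    return None  # node limit reached or state space exhausted


path = best_first((1, 2, 3, 4, 5, 6, 0, 7, 8))
```

Because the search is guided only by the evaluation function, the quality of that function determines how many nodes are developed, which is why tuning its parameters at run time is attractive.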
Neural Networks, Adaptive Optimization, and RNA Secondary Structure Prediction
The RNA secondary structure prediction problem (2° RNA) is a critical one in molecular biology. Secondary structure can be determined directly by x-ray diffraction, but this is difficult, slow, and expensive. Moreover, it is currently impossible to crystallize most RNAs. Mathematical models for prediction have therefore been developed and these have led to serial (and some parallel) computer algorithms, but these too are expensive in terms of computation time. The general solution has asymptotic running time exponential in N (i.e., proportional to 2^N), where N is the length of the RNA sequence. Serial approximation algorithms which employ heuristics and make strong assumptions are significantly faster, on the order of N^3 or N^4, but their predictive success rates are low, often less than 40 percent, and even these algorithms can run for days when processing very long (thousands of bases) RNA sequences. Neural network algorithms that perform a multiple constraint satisfaction search using a massively parallel network of simple processors may provide accurate and very fast solutions.
Proximal Quasi-Newton for Computationally Intensive L1-regularized M-estimators
Zhong, Kai, Yen, Ian E. H., Dhillon, Inderjit S., Ravikumar, Pradeep
We consider the class of optimization problems arising from computationally intensive L1-regularized M-estimators, where the function or gradient values are very expensive to compute. A particular instance of interest is the L1-regularized MLE for learning Conditional Random Fields (CRFs), which are a popular class of statistical models for varied structured prediction problems such as sequence labeling, alignment, and classification with label taxonomy. L1-regularized MLEs for CRFs are particularly expensive to optimize since computing the gradient values requires an expensive inference step. In this work, we propose the use of a carefully constructed proximal quasi-Newton algorithm for such computationally intensive M-estimation problems, where we employ an aggressive active set selection technique. In a key contribution of the paper, we show that the proximal quasi-Newton method is provably super-linearly convergent, even in the absence of strong convexity, by leveraging a restricted variant of strong convexity. In our experiments, the proposed algorithm converges considerably faster than current state-of-the-art on the problems of sequence labeling and hierarchical classification.
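To make the role of the L1 term concrete, the following sketch shows the proximal operator of the L1 norm (soft-thresholding) inside a plain proximal gradient loop. It is deliberately simpler than the method in the abstract: it uses a fixed step size instead of a quasi-Newton Hessian approximation, has no active set selection, and runs on a toy quadratic loss rather than a CRF likelihood. The names `soft_threshold` and `proximal_gradient` and all numeric values are assumptions chosen for illustration.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def proximal_gradient(grad, w0, lam, step, iters=200):
    """Minimise f(w) + lam * ||w||_1 using only the gradient of smooth f."""
    w = list(w0)
    for _ in range(iters):
        g = grad(w)  # in the CRF setting this is the expensive inference step
        w = [soft_threshold(wi - step * gi, step * lam)
             for wi, gi in zip(w, g)]
    return w


# Toy smooth loss f(w) = 0.5 * ||w - b||^2, whose gradient is w - b.
b = [3.0, 0.5, -2.0]


def grad(w):
    return [wi - bi for wi, bi in zip(w, b)]


w_hat = proximal_gradient(grad, [0.0, 0.0, 0.0], lam=1.0, step=1.0)
print(w_hat)  # the small middle coordinate is driven exactly to zero
```

Each iteration calls `grad` once, so when the gradient is costly the number of iterations dominates the running time; this is the motivation for methods that use curvature information to converge in fewer gradient evaluations.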
Bi-Objective Nonnegative Matrix Factorization: Linear Versus Kernel-Based Models
Nonnegative matrix factorization (NMF) is a powerful class of feature extraction techniques that has been successfully applied in many fields, notably in signal and image processing. Current NMF techniques have been limited to a single-objective problem in either its linear or nonlinear kernel-based formulation. In this paper, we propose to revisit NMF as a multi-objective problem, in particular a bi-objective one, where the objective functions defined in both the input and feature spaces are taken into account. By taking advantage of the sum-weighted method from the multi-objective optimization literature, the proposed bi-objective NMF determines a set of nondominated (Pareto optimal) solutions instead of a single optimal decomposition. Moreover, the corresponding Pareto front is studied and approximated. Experimental results on unmixing real hyperspectral images confirm the efficiency of the proposed bi-objective NMF compared with state-of-the-art methods.
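The weighted-sum idea can be sketched on a toy problem, assuming two scalar objectives in place of the input-space and feature-space NMF costs. The objectives `f1` and `f2`, the weight grid, and the helper names are hypothetical; the point is only that sweeping the weight yields a set of nondominated trade-offs rather than one solution.

```python
def f1(x):
    return x * x


def f2(x):
    return (x - 2.0) ** 2


def weighted_sum_front(alphas):
    """For each weight a, minimise a*f1(x) + (1-a)*f2(x) and record both costs."""
    front = []
    for a in alphas:
        # Setting the derivative 2*a*x + 2*(1-a)*(x-2) to zero gives x = 2*(1-a).
        x = 2.0 * (1.0 - a)
        front.append((f1(x), f2(x)))
    return front


def dominates(p, q):
    """p dominates q if it is no worse in every objective and better in one."""
    return (all(pi <= qi for pi, qi in zip(p, q))
            and any(pi < qi for pi, qi in zip(p, q)))


front = weighted_sum_front([0.0, 0.25, 0.5, 0.75, 1.0])
nondominated = [p for p in front
                if not any(dominates(q, p) for q in front if q != p)]
```

For these convex objectives every weight produces a Pareto optimal point, so `nondominated` contains all five entries of `front`. For non-convex objectives a weighted sum can miss parts of the front, which is one reason the front is usually approximated and studied rather than assumed.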
Noisy Sparse Subspace Clustering
This paper considers the problem of subspace clustering under noise. Specifically, we study the behavior of Sparse Subspace Clustering (SSC) when either adversarial or random noise is added to the unlabelled input data points, which are assumed to lie in a union of low-dimensional subspaces. We show that a modified version of SSC is provably effective in correctly identifying the underlying subspaces, even with noisy data. This extends the theoretical guarantees of this algorithm to more practical settings and provides justification for the success of SSC in a class of real applications.
Submodular relaxation for inference in Markov random fields
The problem of inference in a Markov random field (MRF) arises in many applied domains, e.g. in machine learning, computer vision, and natural language processing. In this paper we focus on one important type of inference: maximum a posteriori (MAP) inference, often referred to as MRF energy minimization. Inference of this type is a combinatorial optimization problem, i.e. an optimization problem with a finite domain. The most studied case of MRF energy minimization is the situation when the energy can be represented as a sum of terms (potentials) that depend on only one or two variables each (unary and pairwise potentials). In this setting the energy is said to be defined by a graph where the nodes correspond to the variables and the edges to the pairwise potentials. Minimization of energies defined on graphs is known to be NP-hard in general [8] but can be done exactly in polynomial time in a number of special cases, e.g. if the graph defining the energy is acyclic [36] or if the energy is submodular in the standard [28] or multi-label sense [10]. One way to go beyond pairwise potentials is to add higher-order summands to the energy. For example, Kohli et al. [23] and Ladický et al. [32] use high-order potentials based on superpixels (image regions) for semantic image segmentation; Delong et al. [11] use label cost potentials for geometric model fitting tasks. To be tractable, high-order potentials need to have a compact representation.
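The acyclic special case mentioned above can be illustrated on the simplest acyclic graph, a chain, where min-sum dynamic programming finds the exact minimum of a pairwise energy. This is a sketch under assumptions, not the submodular relaxation the title refers to: the function names, the shared pairwise table, and the example potentials are all hypothetical.

```python
def energy(labels, unary, pairwise):
    """Sum of unary terms plus pairwise terms over adjacent chain nodes."""
    e = sum(unary[i][k] for i, k in enumerate(labels))
    e += sum(pairwise[a][b] for a, b in zip(labels, labels[1:]))
    return e


def chain_map(unary, pairwise):
    """Exact energy minimisation on a chain by min-sum dynamic programming.

    unary[i][k]    : cost of giving node i the label k
    pairwise[a][b] : cost of adjacent nodes taking labels a and b
    """
    n, num_labels = len(unary), len(unary[0])
    cost = list(unary[0])  # best cost of a prefix ending in each label
    back = []              # back[i-1][b]: best label at node i-1 given b at node i
    for i in range(1, n):
        new_cost, ptr = [], []
        for b in range(num_labels):
            best_a = min(range(num_labels),
                         key=lambda a: cost[a] + pairwise[a][b])
            new_cost.append(cost[best_a] + pairwise[best_a][b] + unary[i][b])
            ptr.append(best_a)
        back.append(ptr)
        cost = new_cost
    label = min(range(num_labels), key=lambda k: cost[k])
    best_energy = cost[label]
    labels = [label]
    for ptr in reversed(back):  # trace the optimal labels backwards
        label = ptr[label]
        labels.append(label)
    return labels[::-1], best_energy


# Three nodes, two labels, Potts-style pairwise cost of 1.5 for disagreement.
unary = [[0, 2], [2, 0], [0, 2]]
pairwise = [[0, 1.5], [1.5, 0]]
labels, best = chain_map(unary, pairwise)
```

The running time is proportional to the number of nodes times the square of the number of labels, in contrast to the exponential cost of enumerating every labeling. In this example the smoothness cost outweighs the middle node's preference for label 1, so the minimiser assigns label 0 everywhere.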