Goto

Collaborating Authors

 Optimization


Dual Decomposition for Marginal Inference

AAAI Conferences

We present a dual decomposition approach to the tree-reweighted belief propagation objective. Each tree in the tree-reweighted bound yields one subproblem, which can be solved with the sum-product algorithm. The master problem is a simple differentiable optimization, to which a standard optimization method can be applied. Experimental results on 10x10 Ising models show the dual decomposition approach using L-BFGS is similar in settings where message-passing converges quickly, and one to two orders of magnitude faster in settings where message-passing requires many iterations, specifically high accuracy convergence, and strong interactions.


A POMDP Model of Eye-Hand Coordination

AAAI Conferences

This paper presents a generative model of eye-hand coordination. We use numerical optimization to solve for the joint behavior of an eye and two hands, deriving a predicted motion pattern from first principles, without imposing heuristics. We model the planar scene as a POMDP with 17 continuous state dimensions. Belief-space optimization is facilitated by using a nominal-belief heuristic, whereby we assume (during planning) that the maximum likelihood observation is always obtained. Since a globally-optimal solution for such a high-dimensional domain is computationally intractable, we employ local optimization in the belief domain. By solving for a locally-optimal plan through belief space, we generate a motion pattern of mutual coordination between hands and eye: the eye's saccades disambiguate the scene in a task-relevant manner, and the hands' motions anticipate the eye's saccades. Finally, the model is validated through a behavioral experiment, in which human subjects perform the same eye-hand coordination task. We show how simulation is congruent with the experimental results.


M-Unit EigenAnt: An Ant Algorithm to Find the M Best Solutions

AAAI Conferences

In this paper, we shed light on how powerful congestion control based on local interactions may be obtained. We show how ants can use repellent pheromones and incorporate the effect of crowding to avoid traffic congestion on the optimal path. Based on these interactions, we propose an ant algorithm, the M-unit EigenAnt algorithm, that leads to the selection of the M shortest paths. The ratio of selection of each of these paths is also optimal and regulated by an optimal amount of pheromone on each of them. To the best of our knowledge, the M -unit EigenAnt algorithm is the first antalgorithm that explicitly ensures the selection of the M shortest paths and regulates the amount of pheromone on them such that it is asymptotically optimal. In fact, it is in contrast with most ant algorithms that aim to discover just a single best path. We provide its convergence analysis and show that the steady state distribution of pheromone aligns with the eigenvectors of the cost matrix, and thus is related to its measure of quality. We also provide analysis to show that this property ensues even when the food is moved or path lengths change during foraging. We show that this behavior is robust in the presence of fluctuations and quickly reflects the change in the M optimal solutions. This makes it suitable for not only distributed applications butalso dynamic ones as well. Finally, we provide simulation results for the convergence to the optimal solution under different initial biases, dynamism in lengths of paths, and discovery of new paths.


Convex Sparse Coding, Subspace Learning, and Semi-Supervised Extensions

AAAI Conferences

Automated feature discovery is a fundamental problem in machine learning. Although classical feature discovery methods do not guarantee optimal solutions in general, it has been recently noted that certain subspace learning and sparse coding problems can be solved efficiently, provided the number of features is not restricted a priori. We provide an extended characterization of this optimality result and describe the nature of the solutions under an expanded set of practical contexts. In particular, we apply the framework to a semi-supervised learning problem, and demonstrate that feature discovery can co-occur with input reconstruction and supervised training while still admitting globally optimal solutions. A comparison to existing semi-supervised feature discovery methods shows improved generalization and efficiency.


Efficient Subspace Segmentation via Quadratic Programming

AAAI Conferences

We explore in this paper efficient algorithmic solutions to robustsubspace segmentation. We propose the SSQP, namely SubspaceSegmentation via Quadratic Programming, to partition data drawnfrom multiple subspaces into multiple clusters. The basic idea ofSSQP is to express each datum as the linear combination of otherdata regularized by an overall term targeting zero reconstructioncoefficients over vectors from different subspaces. The derivedcoefficient matrix by solving a quadratic programming problem istaken as an affinity matrix, upon which spectral clustering isapplied to obtain the ultimate segmentation result. Similar tosparse subspace clustering (SCC) and low-rank representation (LRR),SSQP is robust to data noises as validated by experiments on toydata. Experiments on Hopkins 155 database show that SSQP can achievecompetitive accuracy as SCC and LRR in segmenting affine subspaces,while experimental results on the Extended Yale Face Database Bdemonstrate SSQP's superiority over SCC and LRR. Beyond segmentationaccuracy, all experiments show that SSQP is much faster than bothSSC and LRR in the practice of subspace segmentation.


Learning Instance Specific Distance for Multi-Instance Classification

AAAI Conferences

Multi-Instance Learning (MIL) deals with problems where each training example is a bag, and each bag contains a set of instances. Multi-instance representation is useful in many real world applications, because it is able to capture more structural information than traditional flat single-instance representation. However, it also brings new challenges. Specifically, the distance between data objects in MIL is a set-to-set distance, which is harder to estimate than vector distances used in single-instance data. Moreover, because in MIL labels are assigned to bags instead of instances, although a bag belongs to a class, some, or even most, of its instances may not be truly related to the class. In order to address these difficulties, in this paper we propose a novel Instance Specific Distance (ISD) method for MIL, which computes the Class-to-Bag (C2B) distance by further considering the relevances of training instances with respect to their labeled classes. Taking into account the outliers caused by the weak label association in MIL, we learn ISD by solving an l0+-norm minimization problem. An efficient algorithm to solve the optimization problem is presented, together with the rigorous proof of its convergence. The promising results on five benchmark multi-instance data sets and two real world multi-instance applications validate the effectiveness of the proposed method.


Fast Newton-CG Method for Batch Learning of Conditional Random Fields

AAAI Conferences

We propose a fast batch learning method for linear-chain Conditional Random Fields (CRFs) based on Newton-CG methods. Newton-CG methods are a variant of Newton method for high-dimensional problems. They only require the Hessian-vector products instead of the full Hessian matrices. To speed up Newton-CG methods for the CRF learning, we derive a novel dynamic programming procedure for the Hessian-vector products of the CRF objective function. The proposed procedure can reuse the byproducts of the time-consuming gradient computation for the Hessian-vector products to drastically reduce the total computation time of the Newton-CG methods. In experiments with tasks in natural language processing, the proposed method outperforms a conventional quasi-Newton method. Remarkably, the proposed method is competitive with online learning algorithms that are fast but unstable.


Efficiently Learning a Distance Metric for Large Margin Nearest Neighbor Classification

AAAI Conferences

We concern the problem of learning a Mahalanobis distance metric for improving nearest neighbor classification. Our work is built upon the large margin nearest neighbor (LMNN) classification framework. Due to the semidefiniteness constraint in the optimization problem of LMNN, it is not scalable in terms of the dimensionality of the input data. The original LMNN solver partially alleviates this problem by adopting alternating projection methods instead of standard interior-point methods. Still, at each iteration, the computation complexity is at least O(D 3 ) (D is the dimension of input data). In this work, we propose a column generation based algorithm to solve the LMNN optimization problem much more efficiently. Our algorithm is much more scalable in tha tat each iteration, it does not need full eigen-decomposition. Instead, we only need to find the leading eigen value and its corresponding eigen vector, which is of O(D 2 ) complexity. Experiments show the efficiency and efficacy of our algorithms.


Size Adaptive Selection of Most Informative Features

AAAI Conferences

In this paper, we propose a novel method to select the most informativesubset of features, which has little redundancy andvery strong discriminating power. Our proposed approach automaticallydetermines the optimal number of features and selectsthe best subset accordingly by maximizing the averagepairwise informativeness, thus has obvious advantage overtraditional filter methods. By relaxing the essential combinatorialoptimization problem into the standard quadratic programmingproblem, the most informative feature subset canbe obtained efficiently, and a strategy to dynamically computethe redundancy between feature pairs further greatly acceleratesour method through avoiding unnecessary computationsof mutual information. As shown by the extensive experiments,the proposed method can successfully select the mostinformative subset of features, and the obtained classificationresults significantly outperform the state-of-the-art results onmost test datasets.


A Feasible Nonconvex Relaxation Approach to Feature Selection

AAAI Conferences

Variable selection problems are typically addressed under apenalized optimization framework. Nonconvex penalties such as the minimax concave plus (MCP) and smoothly clipped absolute deviation(SCAD), have been demonstrated to have the properties of sparsity practically and theoretically. In this paper we propose a new nonconvex penalty that we call exponential-type penalty. The exponential-type penalty is characterized by a positive parameter,which establishes a connection with the ell 0 and ell 1 penalties.We apply this new penalty to sparse supervised learning problems. To solve to resulting optimization problem, we resort to a reweighted ell 1 minimization method. Moreover, we devise an efficient method for the adaptive update of the tuning parameter. Our experimental results are encouraging. They show that the exponential-type penalty is competitive with MCP and SCAD.