AITopics | Optimization

Collaborating Authors

Optimization

News Overviews Instructional Materials AI-Alerts Classics

Piece-wise quadratic approximations of arbitrary error functions for fast and robust machine learning

Gorban, A. N., Mirkes, E. M., Zinovyev, A.

arXiv.org Machine LearningAug-21-2016

Most of machine learning approaches have stemmed from the application of minimizing the mean squared distance principle, based on the computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, the quadratic error functionals demonstrated many weaknesses including high sensitivity to contaminating factors and dimensionality curse. Therefore, a lot of recent applications in machine learning exploited properties of non-quadratic error functionals based on $L_1$ norm or even sub-linear potentials corresponding to quasinorms $L_p$ ($0

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1016/j.neunet.2016.08.007

1605.06276

Country:

North America > United States (0.46)
Europe > United Kingdom (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Iterative Views Agreement: An Iterative Low-Rank based Structured Optimization Method to Multi-View Spectral Clustering

Wang, Yang, Zhang, Wenjie, Wu, Lin, Lin, Xuemin, Fang, Meng, Pan, Shirui

arXiv.org Machine LearningAug-19-2016

Multi-view spectral clustering, which aims at yielding an agreement or consensus data objects grouping across multi-views with their graph laplacian matrices, is a fundamental clustering problem. Among the existing methods, Low-Rank Representation (LRR) based method is quite superior in terms of its effectiveness, intuitiveness and robustness to noise corruptions. However, it aggressively tries to learn a common low-dimensional subspace for multi-view data, while inattentively ignoring the local manifold structure in each view, which is critically important to the spectral clustering; worse still, the low-rank minimization is enforced to achieve the data correlation consensus among all views, failing to flexibly preserve the local manifold structure for each view. In this paper, 1) we propose a multi-graph laplacian regularized LRR with each graph laplacian corresponding to one view to characterize its local manifold structure. 2) Instead of directly enforcing the low-rank minimization among all views for correlation consensus, we separately impose low-rank constraint on each view, coupled with a mutual structural consensus constraint, where it is able to not only well preserve the local manifold structure but also serve as a constraint for that from other views, which iteratively makes the views more agreeable. Extensive experiments on real-world multi-view data sets demonstrate its superiority.

artificial intelligence, machine learning, spectral, (19 more...)

arXiv.org Machine Learning

1608.0556

Country: Oceania > Australia (0.68)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

A Tight Convex Upper Bound on the Likelihood of a Finite Mixture

Mezuman, Elad, Weiss, Yair

arXiv.org Machine LearningAug-18-2016

The likelihood function of a finite mixture model is a non-convex function with multiple local maxima and commonly used iterative algorithms such as EM will converge to different solutions depending on initial conditions. In this paper we ask: is it possible to assess how far we are from the global maximum of the likelihood? Since the likelihood of a finite mixture model can grow unboundedly by centering a Gaussian on a single datapoint and shrinking the covariance, we constrain the problem by assuming that the parameters of the individual models are members of a large discrete set (e.g. estimating a mixture of two Gaussians where the means and variances of both Gaussians are members of a set of a million possible means and variances). For this setting we show that a simple upper bound on the likelihood can be computed using convex optimization and we analyze conditions under which the bound is guaranteed to be tight. This bound can then be used to assess the quality of solutions found by EM (where the final result is projected on the discrete set) or any other mixture estimation algorithm. For any dataset our method allows us to find a finite mixture model together with a dataset-specific bound on how far the likelihood of this mixture is from the global optimum of the likelihood

artificial intelligence, likelihood, machine learning, (18 more...)

arXiv.org Machine Learning

1608.05275

Country: Asia > Middle East (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Local Network Community Detection with Continuous Optimization of Conductance and Weighted Kernel K-Means

van Laarhoven, Twan, Marchiori, Elena

arXiv.org Machine LearningAug-17-2016

Local network community detection is the task of finding a single community of nodes concentrated around few given seed nodes in a localized way. Conductance is a popular objective function used in many algorithms for local community detection. This paper studies a continuous relaxation of conductance. We show that continuous optimization of this objective still leads to discrete communities. We investigate the relation of conductance with weighted kernel k-means for a single community, which leads to the introduction of a new objective function, $\sigma$-conductance. Conductance is obtained by setting $\sigma$ to $0$. Two algorithms, EMc and PGDc, are proposed to locally optimize $\sigma$-conductance and automatically tune the parameter $\sigma$. They are based on expectation maximization and projected gradient descent, respectively. We prove locality and give performance guarantees for EMc and PGDc for a class of dense and well separated communities centered around the seeds. Experiments are conducted on networks with ground-truth communities, comparing to state-of-the-art graph diffusion algorithms for conductance optimization. On large graphs, results indicate that EMc and PGDc stay localized and produce communities most similar to the ground, while graph diffusion algorithms generate large communities of lower quality.

conductance, data mining, machine learning, (14 more...)

arXiv.org Machine Learning

1601.05775

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.67)
Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Lecture Notes on Spectral Graph Methods

Mahoney, Michael W.

arXiv.org Machine LearningAug-16-2016

These are lecture notes that are based on the lectures from a class I taught on the topic of Spectral Graph Methods at UC Berkeley during the Spring 2015 semester.

optimization problem, quality-of-approximation guarantee, survey article, (22 more...)

arXiv.org Machine Learning

1608.04845

Country: North America > United States > California (0.13)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Energy > Oil & Gas (0.92)
Education (0.87)
Health & Medicine (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(4 more...)

Add feedback

Fast Calculation of the Knowledge Gradient for Optimization of Deterministic Engineering Simulations

van der Herten, Joachim, Couckuyt, Ivo, Deschrijver, Dirk, Dhaene, Tom

arXiv.org Machine LearningAug-16-2016

A novel efficient method for computing the Knowledge-Gradient policy for Continuous Parameters (KGCP) for deterministic optimization is derived. The differences with Expected Improvement (EI), a popular choice for Bayesian optimization of deterministic engineering simulations, are explored. Both policies and the Upper Confidence Bound (UCB) policy are compared on a number of benchmark functions including a problem from structural dynamics. It is empirically shown that KGCP has similar performance as the EI policy for many problems, but has better convergence properties for complex (multi-modal) optimization problems as it emphasizes more on exploration when the model is confident about the shape of optimal regions. In addition, the relationship between Maximum Likelihood Estimation (MLE) and slice sampling for estimation of the hyperparameters of the underlying models, and the complexity of the problem at hand, is studied.

artificial intelligence, machine learning, optimization, (16 more...)

arXiv.org Machine Learning

1608.0455

Country:

North America > United States (0.28)
Europe > Belgium (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.86)

Add feedback

On the Online Frank-Wolfe Algorithms for Convex and Non-convex Optimizations

Lafond, Jean, Wai, Hoi-To, Moulines, Eric

arXiv.org Machine LearningAug-15-2016

In this paper, the online variants of the classical Frank-Wolfe algorithm are considered. We consider minimizing the regret with a stochastic cost. The online algorithms only require simple iterative updates and a non-adaptive step size rule, in contrast to the hybrid schemes commonly considered in the literature. Several new results are derived for convex and non-convex losses. With a strongly convex stochastic cost and when the optimal solution lies in the interior of the constraint set or the constraint set is a polytope, the regret bound and anytime optimality are shown to be ${\cal O}( \log^3 T / T )$ and ${\cal O}( \log^2 T / T)$, respectively, where $T$ is the number of rounds played. These results are based on an improved analysis on the stochastic Frank-Wolfe algorithms. Moreover, the online algorithms are shown to converge even when the loss is non-convex, i.e., the algorithms find a stationary point to the time-varying/stochastic loss at a rate of ${\cal O}(\sqrt{1/T})$. Numerical experiments on realistic data sets are presented to support our theoretical claims.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1510.01171

Country: Europe (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Robust Volume Minimization-Based Matrix Factorization for Remote Sensing and Document Clustering

Fu, Xiao, Huang, Kejun, Yang, Bo, Ma, Wing-Kin, Sidiropoulos, Nicholas D.

arXiv.org Machine LearningAug-15-2016

This paper considers \emph{volume minimization} (VolMin)-based structured matrix factorization (SMF). VolMin is a factorization criterion that decomposes a given data matrix into a basis matrix times a structured coefficient matrix via finding the minimum-volume simplex that encloses all the columns of the data matrix. Recent work showed that VolMin guarantees the identifiability of the factor matrices under mild conditions that are realistic in a wide variety of applications. This paper focuses on both theoretical and practical aspects of VolMin. On the theory side, exact equivalence of two independently developed sufficient conditions for VolMin identifiability is proven here, thereby providing a more comprehensive understanding of this aspect of VolMin. On the algorithm side, computational complexity and sensitivity to outliers are two key challenges associated with real-world applications of VolMin. These are addressed here via a new VolMin algorithm that handles volume regularization in a computationally simple way, and automatically detects and {iteratively downweights} outliers, simultaneously. Simulations and real-data experiments using a remotely sensed hyperspectral image and the Reuters document corpus are employed to showcase the effectiveness of the proposed algorithm.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2016.2602800

1608.0429

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Coordinate Friendly Structures, Algorithms and Applications

Peng, Zhimin, Wu, Tianyu, Xu, Yangyang, Yan, Ming, Yin, Wotao

arXiv.org Machine LearningAug-14-2016

This paper focuses on coordinate update methods, which are useful for solving problems involving large or high-dimensional datasets. They decompose a problem into simple subproblems, where each updates one, or a small block of, variables while fixing others. These methods can deal with linear and nonlinear mappings, smooth and nonsmooth functions, as well as convex and nonconvex problems. In addition, they are easy to parallelize. The great performance of coordinate update methods depends on solving simple sub-problems. To derive simple subproblems for several new classes of applications, this paper systematically studies coordinate-friendly operators that perform low-cost coordinate updates. Based on the discovered coordinate friendly operators, as well as operator splitting techniques, we obtain new coordinate update algorithms for a variety of problems in machine learning, image processing, as well as sub-areas of optimization. Several problems are treated with coordinate update for the first time in history. The obtained algorithms are scalable to large instances through parallel and even asynchronous computing. We present numerical examples to illustrate how effective these algorithms are.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.4310/AMSA.2016.v1.n1.a2

1601.00863

Country:

North America > United States > Minnesota (0.27)
North America > United States > Michigan (0.27)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Mini-Batch Spectral Clustering

Han, Yufei, Filippone, Maurizio

arXiv.org Machine LearningAug-12-2016

The cost of computing the spectrum of Laplacian matrices hinders the application of spectral clustering to large data sets. While approximations recover computational tractability, they can potentially affect clustering performance. This paper proposes a practical approach to learn spectral clustering based on adaptive stochastic gradient optimization. Crucially, the proposed approach recovers the exact spectrum of Laplacian matrices in the limit of the iterations, and the cost of each iteration is linear in the number of samples. Extensive experimental validation on data sets with up to half a million samples demonstrate its scalability and its ability to outperform state-of-the-art approximate methods to learn spectral clustering for a given computational budget.

artificial intelligence, machine learning, spectral, (15 more...)

arXiv.org Machine Learning

1607.02024

Country: North America > United States (0.94)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Add feedback