AITopics

1412.726

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Lu, Zhaosong, Chen, Xiaojun

Generalized Conjugate Gradient Methods for $\ell_1$ Regularized Convex Quadratic Programming with Finite Convergence

The conjugate gradient (CG) method is an efficient iterative method for solving large-scale strongly convex quadratic programming (QP). In this paper we propose some generalized CG (GCG) methods for solving the $\ell_1$-regularized (possibly not strongly) convex QP that terminate at an optimal solution in a finite number of iterations. At each iteration, our methods first identify a face of an orthant and then either perform an exact line search along the direction of the negative projected minimum-norm subgradient of the objective function or execute a CG subroutine that conducts a sequence of CG iterations until a CG iterate crosses the boundary of this face or an approximate minimizer of over this face or a subface is found. We determine which type of step should be taken by comparing the magnitude of some components of the minimum-norm subgradient of the objective function to that of its rest components. Our analysis on finite convergence of these methods makes use of an error bound result and some key properties of the aforementioned exact line search and the CG subroutine. We also show that the proposed methods are capable of finding an approximate solution of the problem by allowing some inexactness on the execution of the CG subroutine. The overall arithmetic operation cost of our GCG methods for finding an $\epsilon$-optimal solution depends on $\epsilon$ in $O(\log(1/\epsilon))$, which is superior to the accelerated proximal gradient method [2,23] that depends on $\epsilon$ in $O(1/\sqrt{\epsilon})$. In addition, our GCG methods can be extended straightforwardly to solve box-constrained convex QP with finite convergence. Numerical results demonstrate that our methods are very favorable for solving ill-conditioned problems.

artificial intelligence, optimization problem, terminate, (16 more...)

1511.07837

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms

Wang, Yu-Xiang, Sadhanala, Veeranjaneyulu, Dai, Wei, Neiswanger, Willie, Sra, Suvrit, Xing, Eric P.

We develop parallel and distributed Frank-Wolfe algorithms; the former on shared memory machines with mini-batching, and the latter in a delayed update framework. Whenever possible, we perform computations asynchronously, which helps attain speedups on multicore machines as well as in distributed environments. Moreover, instead of worst-case bounded delays, our methods only depend (mildly) on \emph{expected} delays, allowing them to be robust to stragglers and faulty worker threads. Our algorithms assume block-separable constraints, and subsume the recent Block-Coordinate Frank-Wolfe (BCFW) method~\citep{lacoste2013block}. Our analysis reveals problem-dependent quantities that govern the speedups of our methods over BCFW. We present experiments on structural SVM and Group Fused Lasso, obtaining significant speedups over competing state-of-the-art (and synchronous) methods.

algorithm, artificial intelligence, machine learning, (17 more...)

1409.6086

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Cvetkovski, Andrej, Crovella, Mark

Multidimensional Scaling in the Poincare Disk

Multidimensional scaling (MDS) is a class of projective algorithms traditionally used in Euclidean space to produce two- or three-dimensional visualizations of datasets of multidimensional points or point distances. More recently however, several authors have pointed out that for certain datasets, hyperbolic target space may provide a better fit than Euclidean space. In this paper we develop PD-MDS, a metric MDS algorithm designed specifically for the Poincare disk (PD) model of the hyperbolic plane. Emphasizing the importance of proceeding from first principles in spite of the availability of various black box optimizers, our construction is based on an elementary hyperbolic line search and reveals numerous particulars that need to be carefully addressed when implementing this as well as more sophisticated iterative optimization methods in a hyperbolic space model.

artificial intelligence, configuration, optimization problem, (16 more...)

1105.5332

Country: North America > United States (0.47)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Benidis, Konstantinos, Sun, Ying, Babu, Prabhu, Palomar, Daniel P.

Orthogonal Sparse PCA and Covariance Estimation via Procrustes Reformulation

The problem of estimating sparse eigenvectors of a symmetric matrix attracts a lot of attention in many applications, especially those with high dimensional data set. While classical eigenvectors can be obtained as the solution of a maximization problem, existing approaches formulate this problem by adding a penalty term into the objective function that encourages a sparse solution. However, the resulting methods achieve sparsity at the expense of sacrificing the orthogonality property. In this paper, we develop a new method to estimate dominant sparse eigenvectors without trading off their orthogonality. The problem is highly non-convex and hard to handle. We apply the MM framework where we iteratively maximize a tight lower bound (surrogate function) of the objective function over the Stiefel manifold. The inner maximization problem turns out to be a rectangular Procrustes problem, which has a closed form solution. In addition, we propose a method to improve the covariance estimation problem when its underlying eigenvectors are known to be sparse. We use the eigenvalue decomposition of the covariance matrix to formulate an optimization problem where we impose sparsity on the corresponding eigenvectors. Numerical experiments show that the proposed eigenvector extraction algorithm matches or outperforms existing algorithms in terms of support recovery and explained variance, while the covariance estimation algorithms improve significantly the sample covariance estimator.

artificial intelligence, eigenvector, machine learning, (14 more...)

doi: 10.1109/TSP.2016.2605073

1602.03992

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Rebeschini, Patrick, Tatikonda, Sekhar

Scale-free network optimization: foundations and algorithms

We investigate the fundamental principles that drive the development of scalable algorithms for network optimization. Despite the significant amount of work on parallel and decentralized algorithms in the optimization community, the methods that have been proposed typically rely on strict separability assumptions for objective function and constraints. Beside sparsity, these methods typically do not exploit the strength of the interaction between variables in the system. We propose a notion of correlation in constrained optimization that is based on the sensitivity of the optimal solution upon perturbations of the constraints. We develop a general theory of sensitivity of optimizers the extends beyond the infinitesimal setting. We present instances in network optimization where the correlation decays exponentially fast with respect to the natural distance in the network, and we design algorithms that can exploit this decay to yield dimension-free optimization. Our results are the first of their kind, and open new possibilities in the theory of local algorithms.

algorithm, artificial intelligence, optimization problem, (15 more...)

1602.04227

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Shibagaki, Atsushi, Karasuyama, Masayuki, Hatano, Kohei, Takeuchi, Ichiro

Simultaneous Safe Screening of Features and Samples in Doubly Sparse Modeling

arXiv.org Machine LearningFeb-8-2016

The problem of learning a sparse model is conceptually interpreted as the process of identifying active features/samples and then optimizing the model over them. Recently introduced safe screening allows us to identify a part of non-active features/samples. So far, safe screening has been individually studied either for feature screening or for sample screening. In this paper, we introduce a new approach for safely screening features and samples simultaneously by alternatively iterating feature and sample screening steps. A significant advantage of considering them simultaneously rather than individually is that they have a synergy effect in the sense that the results of the previous safe feature screening can be exploited for improving the next safe sample screening performances, and vice-versa. We first theoretically investigate the synergy effect, and then illustrate the practical advantage through intensive numerical experiments for problems with large numbers of features and samples.

artificial intelligence, machine learning, screening, (20 more...)

1602.02485

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Rakhlin, Alexander, Sridharan, Karthik

BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits

arXiv.org Machine LearningFeb-5-2016

We present efficient algorithms for the problem of contextual bandits with i.i.d. covariates, an arbitrary sequence of rewards, and an arbitrary class of policies. Our algorithm BISTRO requires d calls to the empirical risk minimization (ERM) oracle per round, where d is the number of actions. The method uses unlabeled data to make the problem computationally simple. When the ERM problem itself is computationally hard, we extend the approach by employing multiplicative approximation algorithms for the ERM. The integrality gap of the relaxation only enters in the regret bound rather than the benchmark. Finally, we show that the adversarial version of the contextual bandit problem is learnable (and efficient) whenever the full-information supervised online learning problem has a non-trivial regret guarantee (and efficient).

artificial intelligence, data mining, machine learning, (20 more...)

1602.02196

Genre: Research Report (0.82)

Industry: Education > Educational Setting > Online (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Raziperchikolaei, Ramin, Carreira-Perpiñán, Miguel Á.

Optimizing affinity-based binary hashing using auxiliary coordinates

arXiv.org Machine LearningFeb-4-2016

In supervised binary hashing, one wants to learn a function that maps a high-dimensional feature vector to a vector of binary codes, for application to fast image retrieval. This typically results in a difficult optimization problem, nonconvex and nonsmooth, because of the discrete variables involved. Much work has simply relaxed the problem during training, solving a continuous optimization, and truncating the codes a posteriori. This gives reasonable results but is quite suboptimal. Recent work has tried to optimize the objective directly over the binary codes and achieved better results, but the hash function was still learned a posteriori, which remains suboptimal. We propose a general framework for learning hash functions using affinity-based loss functions that uses auxiliary coordinates. This closes the loop and optimizes jointly over the hash functions and the binary codes so that they gradually match each other. The resulting algorithm can be seen as a corrected, iterated version of the procedure of optimizing first over the codes and then learning the hash function. Compared to this, our optimization is guaranteed to obtain better hash functions while being not much slower, as demonstrated experimentally in various supervised datasets. In addition, our framework facilitates the design of optimization algorithms for arbitrary types of loss and hash functions.

artificial intelligence, hash function, machine learning, (19 more...)

1501.05352

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Machine LearningFeb-4-2016

Correntropy Maximization via ADMM - Application to Robust Hyperspectral Unmixing

Zhu, Fei, Halimi, Abderrahim, Honeine, Paul, Chen, Badong, Zheng, Nanning

In hyperspectral images, some spectral bands suffer from low signal-to-noise ratio due to noisy acquisition and atmospheric effects, thus requiring robust techniques for the unmixing problem. This paper presents a robust supervised spectral unmixing approach for hyperspectral images. The robustness is achieved by writing the unmixing problem as the maximization of the correntropy criterion subject to the most commonly used constraints. Two unmixing problems are derived: the first problem considers the fully-constrained unmixing, with both the non-negativity and sum-to-one constraints, while the second one deals with the non-negativity and the sparsity-promoting of the abundances. The corresponding optimization problems are solved efficiently using an alternating direction method of multipliers (ADMM) approach. Experiments on synthetic and real hyperspectral images validate the performance of the proposed algorithms for different scenarios, demonstrating that the correntropy-based unmixing is robust to outlier bands.

artificial intelligence, constraint, optimization problem, (17 more...)

doi: 10.1109/TGRS.2017.2696262

1602.01729

Country: Asia > China (0.29)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.38)