AITopics | Optimization

Collaborating Authors

Optimization

News Overviews Instructional Materials AI-Alerts Classics

Non-stationary Stochastic Optimization with Local Spatial and Temporal Changes

arXiv.org Machine LearningAug-31-2017

We consider a non-stationary sequential stochastic optimization problem, in which the underlying cost functions change over time under a variation budget constraint. We propose an $L_{p,q}$-variation functional to quantify the change, which captures local spatial and temporal variations of the sequence of functions. Under the $L_{p,q}$-variation functional constraint, we derive both upper and matching lower regret bounds for smooth and strongly convex function sequences, which generalize previous results in (Besbes et al., 2015). Our results reveal some surprising phenomena under this general variation functional, such as the curse of dimensionality of the function domain. The key technical novelties in our analysis include an affinity lemma that characterizes the distance of the minimizers of two convex functions with bounded $L_p$ difference, and a cubic spline based construction that attains matching lower bounds.

artificial intelligence, machine learning, optimization, (18 more...)

arXiv.org Machine Learning

1708.0302

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

An inexact subsampled proximal Newton-type method for large-scale machine learning

Liu, Xuanqing, Hsieh, Cho-Jui, Lee, Jason D., Sun, Yuekai

arXiv.org Machine LearningAug-28-2017

We propose a fast proximal Newton-type algorithm for minimizing regularized finite sums that returns an $\epsilon$-suboptimal point in $\tilde{\mathcal{O}}(d(n + \sqrt{\kappa d})\log(\frac{1}{\epsilon}))$ FLOPS, where $n$ is number of samples, $d$ is feature dimension, and $\kappa$ is the condition number. As long as $n > d$, the proposed method is more efficient than state-of-the-art accelerated stochastic first-order methods for non-smooth regularizers which requires $\tilde{\mathcal{O}}(d(n + \sqrt{\kappa n})\log(\frac{1}{\epsilon}))$ FLOPS. The key idea is to form the subsampled Newton subproblem in a way that preserves the finite sum structure of the objective, thereby allowing us to leverage recent developments in stochastic first-order methods to solve the subproblem. Experimental results verify that the proposed algorithm outperforms previous algorithms for $\ell_1$-regularized logistic regression on real datasets.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1708.08552

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

ByRDiE: Byzantine-resilient distributed coordinate descent for decentralized learning

Yang, Zhixiong, Bajwa, Waheed U.

arXiv.org Machine LearningAug-27-2017

Distributed machine learning algorithms enable processing of datasets that are distributed over a network without gathering the data at a centralized location. While efficient distributed algorithms have been developed under the assumption of faultless networks, failures that can render these algorithms nonfunctional indeed happen in the real world. This paper focuses on the problem of Byzantine failures, which are the hardest to safeguard against in distributed algorithms. While Byzantine fault tolerance has a rich history, existing work does not translate into efficient and practical algorithms for high-dimensional distributed learning tasks. In this paper, two variants of an algorithm termed Byzantine-resilient distributed coordinate descent (ByRDiE) are developed and analyzed that solve distributed learning problems in the presence of Byzantine failures. Theoretical analysis as well as numerical experiments presented in the paper highlight the usefulness of ByRDiE for high-dimensional distributed learning in the presence of Byzantine failures.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1708.08155

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry:

Information Technology (0.46)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Add feedback

[R] New approach to Sparse Discriminant Analysis, looking for feedback! • r/MachineLearning

@machinelearnbotAug-25-2017, 12:30:15 GMT

I've been working (in collaboration with others) on this R package accSDA, (accelerated sparse discriminant analysis), which is now available on CRAN. The research behind this has mostly been application of alternative optimization approaches, which are considerably faster than the traditional approach (old version available as an R package sparseLDA). I am currently looking for feedback on the package. What problem does this package solve? This algorithm solves supervised classification problems with multiple classes, where the number of variables can be considerably larger than the number of samples, i.e. p n problems.

artificial intelligence, machine learning, social media, (9 more...)

@machinelearnbot

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.37)

Add feedback

Convolutional Dictionary Learning: Acceleration and Convergence

Chun, Il Yong, Fessler, Jeffrey A.

arXiv.org Artificial IntelligenceAug-25-2017

Convolutional dictionary learning (CDL or sparsifying CDL) has many applications in image processing and computer vision. There has been growing interest in developing efficient algorithms for CDL, mostly relying on the augmented Lagrangian (AL) method or the variant alternating direction method of multipliers (ADMM). When their parameters are properly tuned, AL methods have shown fast convergence in CDL. However, the parameter tuning process is not trivial due to its data dependence and, in practice, the convergence of AL methods depends on the AL parameters for nonconvex CDL problems. To moderate these problems, this paper proposes a new practically feasible and convergent Block Proximal Gradient method using a Majorizer (BPG-M) for CDL. The BPG-M-based CDL is investigated with different block updating schemes and majorization matrix designs, and further accelerated by incorporating some momentum coefficient formulas and restarting techniques. All of the methods investigated incorporate a boundary artifacts removal (or, more generally, sampling) operator in the learning model. Numerical experiments show that, without needing any parameter tuning process, the proposed BPG-M approach converges more stably to desirable solutions of lower objective values than the existing state-of-the-art ADMM algorithm and its memory-efficient variant do. Compared to the ADMM approaches, the BPG-M method using a multi-block updating scheme is particularly useful in single-threaded CDL algorithm handling large datasets, due to its lower memory requirement and no polynomial computational complexity. Image denoising experiments show that, for relatively strong additive white Gaussian noise, the filters learned by BPG-M-based CDL outperform those trained by the ADMM approach.

algorithm, convergence, matrix, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TIP.2017.2761545

1707.00389

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(13 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization

Zheng, Shun, Wang, Jialei, Xia, Fen, Xu, Wei, Zhang, Tong

arXiv.org Machine LearningAug-24-2017

In modern large-scale machine learning applications, the training data are often partitioned and stored on multiple machines. It is customary to employ the "data parallelism" approach, where the aggregated training loss is minimized without moving data across machines. In this paper, we introduce a novel distributed dual formulation for regularized loss minimization problems that can directly handle data parallelism in the distributed setting. This formulation allows us to systematically derive dual coordinate optimization procedures, which we refer to as Distributed Alternating Dual Maximization (DADM). The framework extends earlier studies described in (Boyd et al., 2011; Ma et al., 2017; Jaggi et al., 2014; Yang, 2013) and has rigorous theoretical analyses. Moreover with the help of the new formulation, we develop the accelerated version of DADM (Acc-DADM) by generalizing the acceleration technique from (Shalev-Shwartz and Zhang, 2014) to the distributed setting. We also provide theoretical results for the proposed accelerated version and the new result improves previous ones (Yang, 2013; Ma et al., 2017) whose iteration complexities grow linearly on the condition number. Our empirical studies validate our theory and show that our accelerated approach significantly improves the previous state-of-the-art distributed dual coordinate optimization algorithms.

artificial intelligence, machine learning, shalev-shwartz and zhang, (15 more...)

arXiv.org Machine Learning

1604.03763

Country:

North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Massively-Parallel Feature Selection for Big Data

Tsamardinos, Ioannis, Borboudakis, Giorgos, Katsogridakis, Pavlos, Pratikakis, Polyvios, Christophides, Vassilis

arXiv.org Machine LearningAug-23-2017

We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS PFBP partitions the data matrix both in terms of rows (samples, training examples) as well as columns (features). By employing the concepts of $p$-values of conditional independence tests and meta-analysis techniques PFBP manages to rely only on computations local to a partition while minimizing communication costs. Then, it employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, or Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size, linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1708.07178

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.99)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(2 more...)

Add feedback

A Deterministic Nonsmooth Frank Wolfe Algorithm with Coreset Guarantees

Ravi, Sathya N., Collins, Maxwell D., Singh, Vikas

arXiv.org Machine LearningAug-22-2017

We present a new Frank-Wolfe (FW) type algorithm that is applicable to minimization problems with a nonsmooth convex objective. We provide convergence bounds and show that the scheme yields so-called coreset results for various Machine Learning problems including 1-median, Balanced Development, Sparse PCA, Graph Cuts, and the $\ell_1$-norm-regularized Support Vector Machine (SVM) among others. This means that the algorithm provides approximate solutions to these problems in time complexity bounds that are not dependent on the size of the input problem. Our framework, motivated by a growing body of work on sublinear algorithms for various data analysis problems, is entirely deterministic and makes no use of smoothing or proximal operators. Apart from these theoretical results, we show experimentally that the algorithm is very practical and in some cases also offers significant computational advantages on large problem instances. We provide an open source implementation that can be adapted for other problems that fit the overall structure.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1708.06714

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.46)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

Add feedback

Flexible Low-Rank Statistical Modeling with Side Information

Fithian, William, Mazumder, Rahul

arXiv.org Machine LearningAug-22-2017

We propose a general framework for reduced-rank modeling of matrix-valued data. By applying a generalized nuclear norm penalty we can directly model low-dimensional latent variables associated with rows and columns. Our framework flexibly incorporates row and column features, smoothing kernels, and other sources of side information by penalizing deviations from the row and column models. Moreover, a large class of these models can be estimated scalably using convex optimization. The computational bottleneck in each case is one singular value decomposition per iteration of a large but easy-to-apply matrix. Our framework generalizes traditional convex matrix completion and multi-task learning methods as well as maximum a posteriori estimation under a large class of popular hierarchical Bayesian models.

artificial intelligence, machine learning, matrix completion, (18 more...)

arXiv.org Machine Learning

1308.4211

Country:

North America > United States > Massachusetts (0.28)
North America > United States > California (0.28)

Genre: Research Report (0.50)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

Solving Systems of Random Quadratic Equations via Truncated Amplitude Flow

Wang, Gang, Giannakis, Georgios B., Eldar, Yonina C.

arXiv.org Machine LearningAug-20-2017

This paper presents a new algorithm, termed \emph{truncated amplitude flow} (TAF), to recover an unknown vector $\bm{x}$ from a system of quadratic equations of the form $y_i=|\langle\bm{a}_i,\bm{x}\rangle|^2$, where $\bm{a}_i$'s are given random measurement vectors. This problem is known to be \emph{NP-hard} in general. We prove that as soon as the number of equations is on the order of the number of unknowns, TAF recovers the solution exactly (up to a global unimodular constant) with high probability and complexity growing linearly with both the number of unknowns and the number of equations. Our TAF approach adopts the \emph{amplitude-based} empirical loss function, and proceeds in two stages. In the first stage, we introduce an \emph{orthogonality-promoting} initialization that can be obtained with a few power iterations. Stage two refines the initial estimate by successive updates of scalable \emph{truncated generalized gradient iterations}, which are able to handle the rather challenging nonconvex and nonsmooth amplitude-based objective function. In particular, when vectors $\bm{x}$ and $\bm{a}_i$'s are real-valued, our gradient truncation rule provably eliminates erroneously estimated signs with high probability to markedly improve upon its untruncated version. Numerical tests using synthetic data and real images demonstrate that our initialization returns more accurate and robust estimates relative to spectral initializations. Furthermore, even under the same initialization, the proposed amplitude-based refinement outperforms existing Wirtinger flow variants, corroborating the superior performance of TAF over state-of-the-art algorithms.

artificial intelligence, machine learning, optimization problem, (19 more...)

arXiv.org Machine Learning

1605.08285

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)

Add feedback