AITopics | Optimization

Distributed learning of probabilistic models from multiple data repositories with minimum communication is increasingly important. We study a simple communication-efficient learning framework that first calculates the local maximum likelihood estimates (MLE) based on the data subsets, and then combines the local MLEs to achieve the best possible approximation to the global MLE given the whole dataset. We study this framework's statistical properties, showing that the efficiency loss compared to the global setting relates to how much the underlying distribution families deviate from full exponential families, drawing a connection to the theory of information loss by Fisher, Rao and Efron. We show that the "full-exponential-family-ness" represents the lower bound of the error rate of arbitrary combinations of local MLEs, and that this bound is achieved by a KL-divergence-based combination method but not by the more common linear combination method. We also study the empirical properties of both methods, showing that the KL method significantly outperforms linear combination in practical settings with issues such as model misspecification, non-convexity, and heterogeneous data partitions.
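The difference between the two combination rules can be seen on a toy case. The sketch below is an illustration only, not the paper's code: it assumes a one-dimensional Gaussian model, four equal-sized subsets with hypothetical shifted means, and the closed-form KL combination that holds for a Gaussian (moment matching). All function names are made up for the example.

```python
import random
import statistics


def local_mle(xs):
    """Gaussian MLE on one subset: sample mean and 1/n variance."""
    mu = statistics.fmean(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var


def linear_combination(mles):
    """Average the local parameter estimates coordinate by coordinate."""
    mu = statistics.fmean([m for m, _ in mles])
    var = statistics.fmean([v for _, v in mles])
    return mu, var


def kl_combination(mles):
    """Minimize sum_k KL(p(x | theta_k) || p(x | theta)) over theta.

    For a Gaussian the minimizer matches the averaged first and second
    moments of the local models, so it has a closed form.
    """
    mu = statistics.fmean([m for m, _ in mles])
    second_moment = statistics.fmean([v + m * m for m, v in mles])
    return mu, second_moment - mu * mu


# Hypothetical heterogeneous partition: four equal-sized subsets whose
# means differ (0.0, 0.5, 1.0, 1.5).
rng = random.Random(0)
subsets = [[rng.gauss(0.5 * k, 1.0) for _ in range(250)] for k in range(4)]

local_estimates = [local_mle(s) for s in subsets]
global_estimate = local_mle([x for s in subsets for x in s])
linear_estimate = linear_combination(local_estimates)
kl_estimate = kl_combination(local_estimates)

print("global MLE:", global_estimate)
print("linear    :", linear_estimate)
print("KL        :", kl_estimate)
```

In this toy setting the KL combination reproduces the global MLE up to rounding, because the Gaussian is a full exponential family and the subsets have equal size, while linear averaging underestimates the variance whenever the local means differ.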

artificial intelligence, machine learning, mle, (17 more...)

arXiv.org Machine Learning

1410.2653

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Methods and Models for Interpretable Linear Classification

Ustun, Berk; Rudin, Cynthia

arXiv.org Machine Learning, Oct-1-2014

We present an integer programming framework to build accurate and interpretable discrete linear classification models. Unlike existing approaches, our framework is designed to provide practitioners with the control and flexibility they need to tailor accurate and interpretable models for a domain of choice. To this end, our framework can produce models that are fully optimized for accuracy, by minimizing the 0-1 classification loss, and that address multiple aspects of interpretability, by incorporating a range of discrete constraints and penalty functions. We use our framework to produce models that are difficult to create with existing methods, such as scoring systems and M-of-N rule tables. In addition, we propose specially designed optimization methods to improve the scalability of our framework through decomposition and data reduction. We show that discrete linear classifiers can attain the training accuracy of any other linear classifier, and provide an Occam's Razor type argument as to why the use of small discrete coefficients can provide better generalization. We demonstrate the performance and flexibility of our framework through numerical experiments and a case study in which we construct a highly tailored clinical tool for sleep apnea diagnosis.
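To make the objective concrete, here is a minimal sketch of a scoring system with small integer coefficients, fitted by minimizing 0-1 loss plus a sparsity penalty. It is not the authors' method: brute-force enumeration stands in for the integer programming solver, and the data set, coefficient range, penalty weight, and function names are all assumptions chosen for the example.

```python
from itertools import product


def zero_one_loss(weights, bias, X, y):
    """Count misclassified points for labels in {-1, +1}."""
    errors = 0
    for features, label in zip(X, y):
        score = bias + sum(w * f for w, f in zip(weights, features))
        predicted = 1 if score > 0 else -1
        errors += predicted != label
    return errors


def fit_scoring_system(X, y, coef_range=range(-2, 3), l0_penalty=0.1):
    """Enumerate all small-integer models and keep the best objective.

    Objective = number of errors + l0_penalty * number of nonzero weights.
    Enumeration is only feasible for tiny problems; it replaces the
    integer program purely for illustration.
    """
    n_features = len(X[0])
    best = None
    for coefs in product(coef_range, repeat=n_features + 1):
        bias, weights = coefs[0], coefs[1:]
        sparsity = sum(w != 0 for w in weights)
        objective = zero_one_loss(weights, bias, X, y) + l0_penalty * sparsity
        if best is None or objective < best[0]:
            best = (objective, weights, bias)
    return best


# Toy data: three binary features, label +1 when at least two are set
# (a 2-of-3 rule).
X = [list(bits) for bits in product((0, 1), repeat=3)]
y = [1 if sum(bits) >= 2 else -1 for bits in X]

objective, weights, bias = fit_scoring_system(X, y)
print("weights:", weights, "bias:", bias, "objective:", objective)
```

Several integer models fit this rule equally well, so the sketch reports whichever one the enumeration meets first; an actual solver would need extra terms in the objective to prefer one over another.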

artificial intelligence, classifier, machine learning, (18 more...)

arXiv.org Machine Learning

1405.4047

Country:

North America > United States (0.27)
Europe > United Kingdom > England (0.27)
Europe > Austria (0.27)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Energy and Uncertainty: Models and Algorithms for Complex Energy Systems

Powell, Warren (Princeton University)

AI Magazine, Sep-29-2014

The problem of controlling energy systems (generation, transmission, storage, investment) introduces a number of optimization problems which need to be solved in the presence of different types of uncertainty. We highlight several of these applications, using a simple energy storage problem as a case application. Using this setting, we describe a modeling framework based around five fundamental dimensions which is more natural than the standard canonical form widely used in the reinforcement learning community. The framework focuses on finding the best policy, where we identify four fundamental classes of policies consisting of policy function approximations (PFAs), cost function approximations (CFAs), policies based on value function approximations (VFAs), and lookahead policies. This organization unifies a number of competing strategies under a common umbrella.
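As an illustration of the first policy class, a policy function approximation for the storage problem can be as simple as a buy-low, sell-high rule with two tunable thresholds. The sketch below rests on assumptions that are not in the article: a hypothetical mean-reverting price process, a fixed storage capacity and charge rate, no valuation of leftover inventory, and a coarse grid search over the thresholds.

```python
import random


def simulate(theta_buy, theta_sell, prices, capacity=10.0, rate=1.0):
    """Run the threshold rule over one price path and return the profit.

    Buy up to `rate` units when the price is at or below theta_buy,
    sell up to `rate` units when it is at or above theta_sell.
    Energy left in storage at the end is not valued.
    """
    level, profit = 0.0, 0.0
    for price in prices:
        if price <= theta_buy:
            amount = min(rate, capacity - level)
            level += amount
            profit -= price * amount
        elif price >= theta_sell:
            amount = min(rate, level)
            level -= amount
            profit += price * amount
    return profit


def price_path(rng, steps=200, mean=30.0, reversion=0.2, noise=5.0):
    """Hypothetical mean-reverting price process, clipped at zero."""
    price, path = mean, []
    for _ in range(steps):
        price += reversion * (mean - price) + rng.gauss(0.0, noise)
        path.append(max(price, 0.0))
    return path


rng = random.Random(0)
paths = [price_path(rng) for _ in range(20)]


def average_profit(theta):
    """Average profit of the rule with thresholds theta over all paths."""
    return sum(simulate(theta[0], theta[1], p) for p in paths) / len(paths)


# Policy search: pick the threshold pair with the best simulated average.
candidates = [(b, s) for b in range(10, 31, 5) for s in range(30, 51, 5) if b < s]
best = max(candidates, key=average_profit)
print("best thresholds:", best, "average profit:", average_profit(best))
```

Tuning the two thresholds by simulation is one simple form of policy search; the other three policy classes would replace the threshold rule with a parameterized cost model, a value function approximation, or an explicit lookahead.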

approximation, artificial intelligence, optimization problem, (18 more...)

AI Magazine

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > New York (0.04)
(5 more...)

Industry:

Energy > Renewable > Solar (1.00)
Energy > Power Industry (1.00)
Energy > Energy Storage (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Communication-Efficient Distributed Dual Coordinate Ascent

Jaggi, Martin; Smith, Virginia; Takáč, Martin; Terhorst, Jonathan; Krishnan, Sanjay; Hofmann, Thomas; Jordan, Michael I.

arXiv.org Machine Learning, Sep-29-2014

Communication remains the most significant bottleneck in the performance of distributed optimization algorithms for large-scale machine learning. In this paper, we propose a communication-efficient framework, CoCoA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication. We provide a strong convergence rate analysis for this class of algorithms, as well as experiments on real-world distributed datasets with implementations in Spark. In our experiments, we find that, compared to state-of-the-art mini-batch versions of SGD and SDCA algorithms, CoCoA converges to the same 0.001-accurate solution quality on average 25x as quickly.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1409.1458

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Semidefinite Programming Based Preconditioning for More Robust Near-Separable Nonnegative Matrix Factorization

Gillis, Nicolas; Vavasis, Stephen A.

arXiv.org Machine Learning, Sep-16-2014

Nonnegative matrix factorization (NMF) under the separability assumption can provably be solved efficiently, even in the presence of noise, and has been shown to be a powerful technique in document classification and hyperspectral unmixing. This problem is referred to as near-separable NMF and requires that there exists a cone spanned by a small subset of the columns of the input nonnegative matrix approximately containing all columns. In this paper, we propose a preconditioning based on semidefinite programming that makes the input matrix well-conditioned. This in turn can significantly improve the performance of near-separable NMF algorithms, which is illustrated on the popular successive projection algorithm (SPA). The new preconditioned SPA is provably more robust to noise, and outperforms SPA on several synthetic data sets. We also show how an active-set method allows us to apply the preconditioning to large-scale real-world hyperspectral images.
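For readers unfamiliar with SPA, the following is a minimal sketch of the plain, unpreconditioned algorithm as it is commonly described: repeatedly select the column with the largest norm, then project all columns onto the orthogonal complement of the selected one. The semidefinite-programming preconditioning proposed in the paper is not shown, and the small noiseless matrix is a made-up example.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def spa(columns, r):
    """Return the indices of r columns chosen by successive projection."""
    residual = [list(c) for c in columns]
    selected = []
    for _ in range(r):
        norms = [dot(c, c) for c in residual]
        j = max(range(len(residual)), key=norms.__getitem__)
        if norms[j] == 0.0:
            break  # nothing left to extract
        selected.append(j)
        u = residual[j]
        # Project every column onto the orthogonal complement of u.
        residual = [
            [ci - (dot(c, u) / norms[j]) * ui for ci, ui in zip(c, u)]
            for c in residual
        ]
    return selected


# Hypothetical noiseless separable input: columns 1, 3, and 4 are the
# generators; columns 0 and 2 are convex combinations of them.
columns = [
    [1.0, 0.75, 0.0],   # 0.5 * col1 + 0.5 * col3
    [2.0, 0.0, 0.0],
    [0.4, 0.45, 0.5],   # 0.2 * col1 + 0.3 * col3 + 0.5 * col4
    [0.0, 1.5, 0.0],
    [0.0, 0.0, 1.0],
]
selected = spa(columns, 3)
print("selected columns:", selected)
```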

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1137/130940670

1310.2273

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Industry:

Government > Regional Government > North America Government > United States Government (0.67)
Government > Space Agency (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Parallel Distributed Block Coordinate Descent Methods based on Pairwise Comparison Oracle

Matsui, Kota; Kumagai, Wataru; Kanamori, Takafumi

arXiv.org Machine Learning, Sep-13-2014

This paper provides a block coordinate descent algorithm to solve unconstrained optimization problems. Our algorithm does not require the computation of function values or gradients. Instead, it uses pairwise comparison of function values. The algorithm consists of two steps: the direction estimate step and the search step. Both steps require only pairwise comparison of function values, which tells us only the order of the function values at two points. In the direction estimate step, a Newton-type search direction is estimated, using a computation similar to block coordinate descent methods together with the pairwise comparison. In the search step, the numerical solution is updated along the estimated direction. The computation in the direction estimate step can be easily parallelized, and thus the algorithm works efficiently to find the minimizer of the objective function. We also show an upper bound on the convergence rate. In numerical experiments, we show that our method finds the optimal solution efficiently compared to some existing methods based on pairwise comparison.
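The oracle model can be illustrated with a much simpler scheme than the one in the paper. The sketch below is not the authors' algorithm (it has no Newton-type direction estimate and no parallelism): it is a plain coordinate search that only ever asks which of two points has the smaller function value. The function names, step schedule, and test objective are assumptions.

```python
def make_oracle(f):
    """Hide f behind a comparison: callers learn only which point is better."""
    def better(x, y):
        return f(x) < f(y)
    return better


def comparison_coordinate_search(better, x0, step=1.0, tol=1e-6, max_sweeps=10_000):
    """Minimize using only pairwise comparisons of function values.

    Each sweep tries a move of +step or -step along every coordinate and
    keeps it if the oracle prefers the trial point. If a full sweep makes
    no move, the step is halved.
    """
    x = list(x0)
    for _ in range(max_sweeps):
        if step < tol:
            break
        moved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                if better(trial, x):
                    x = trial
                    moved = True
                    break
        if not moved:
            step /= 2
    return x


# Hypothetical separable quadratic with minimizer (1.0, -0.5).
def objective(x):
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2


solution = comparison_coordinate_search(make_oracle(objective), [4.0, 3.0])
print("approximate minimizer:", solution)
```

The search never sees a function value or a gradient, which is the defining restriction of the setting; the objective here is separable, so coordinate moves alone are enough to reach the minimizer.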

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1409.3912

Country: Europe > Austria (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Simulating Non Stationary Operators in Search Algorithms

Goëffon, Adrien; Lardeux, Frédéric; Saubion, Frédéric

arXiv.org Artificial Intelligence, Sep-5-2014

In this paper, we propose a model for simulating search operators whose behaviour often changes continuously during the search. In these scenarios, the performance of the operators decreases when they are applied. This is motivated by the fact that operators for optimization problems are often roughly classified into exploitation operators and exploration operators. Our simulation model is used to compare the performance of operator selection policies and to clearly identify their ability to adapt to such specific operator behaviours. The experimental study provides interesting results on the respective behaviours of operator selection policies when faced with such non-stationary search scenarios.

Keywords: Island Models, Adaptive Operator Selection

1. Introduction

Selecting the most suitable operators in a search algorithm when solving optimization problems is an active research area (Eiben et al., 2007; Lobo et al., 2007). Given an optimization problem, a search algorithm mainly consists of applying basic solving operators (heuristics) in order to explore and exploit the search space for retrieving solutions.

algorithm, operator, wsize, (13 more...)

arXiv.org Artificial Intelligence

1409.1686

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > France (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.89)

Marginal Structured SVM with Hidden Variables

Ping, Wei; Liu, Qiang; Ihler, Alexander

arXiv.org Machine Learning, Sep-5-2014

In this work, we propose the marginal structured SVM (MSSVM) for structured prediction with hidden variables. MSSVM properly accounts for the uncertainty of hidden variables, and can significantly outperform the previously proposed latent structured SVM (LSSVM; Yu & Joachims (2009)) and other state-of-the-art methods, especially when that uncertainty is large. Our method also results in a smoother objective function, making gradient-based optimization of MSSVMs converge significantly faster than for LSSVMs. We also show that our method consistently outperforms hidden conditional random fields (HCRFs; Quattoni et al. (2007)) on both simulated and real-world datasets. Furthermore, we propose a unified framework that includes both our method and several other existing methods as special cases, and provides insights into the comparison of different models in practice.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Machine Learning

1409.132

Country: North America > United States (0.68)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
(2 more...)

Structured Low-Rank Matrix Factorization with Missing and Grossly Corrupted Observations

Shang, Fanhua; Liu, Yuanyuan; Tong, Hanghang; Cheng, James; Cheng, Hong

arXiv.org Machine Learning, Sep-3-2014

Recovering low-rank and sparse matrices from incomplete or corrupted observations is an important problem in machine learning, statistics, bioinformatics, computer vision, as well as signal and image processing. In theory, this problem can be solved by the natural convex joint/mixed relaxations (i.e., $\ell_1$-norm and trace norm) under certain conditions. However, all current provable algorithms suffer from superlinear per-iteration cost, which severely limits their applicability to large-scale problems. In this paper, we propose a scalable, provable structured low-rank matrix factorization method to recover low-rank and sparse matrices from missing and grossly corrupted data, i.e., robust matrix completion (RMC) problems, or from incomplete and grossly corrupted measurements, i.e., compressive principal component pursuit (CPCP) problems. Specifically, we first present two small-scale matrix trace norm regularized bilinear structured factorization models for RMC and CPCP problems, in which repeatedly calculating the SVD of a large-scale matrix is replaced by updating two much smaller factor matrices. Then, we apply the alternating direction method of multipliers (ADMM) to efficiently solve the RMC problems. Finally, we provide the convergence analysis of our algorithm, and extend it to address general CPCP problems. Experimental results verify both the efficiency and effectiveness of our method compared with the state-of-the-art methods.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

1409.1062

Country: North America > United States (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology: