AITopics

1706.0909

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.92)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Consumer Health (1.00)
Education (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Trofimov, Ilya, Genkin, Alexander

Distributed Coordinate Descent for Generalized Linear Models with Regularization

arXiv.org Machine LearningJun-26-2017

Generalized linear model with $L_1$ and $L_2$ regularization is a widely used technique for solving classification, class probability estimation and regression problems. With the numbers of both features and examples growing rapidly in the fields like text mining and clickstream data analysis parallelization and the use of cluster architectures becomes important. We present a novel algorithm for fitting regularized generalized linear models in the distributed environment. The algorithm splits data between nodes by features, uses coordinate descent on each node and line search to merge results globally. Convergence proof is provided. A modifications of the algorithm addresses slow node problem. For an important particular case of logistic regression we empirically compare our program with several state-of-the art approaches that rely on different algorithmic and data spitting methods. Experiments demonstrate that our approach is scalable and superior when training on large and sparse datasets.

algorithm, artificial intelligence, machine learning, (18 more...)

1611.02101

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

arXiv.org Machine LearningJun-25-2017

A Unified Analysis of Stochastic Optimization Methods Using Jump System Theory and Quadratic Constraints

Hu, Bin, Seiler, Peter, Rantzer, Anders

We develop a simple routine unifying the analysis of several important recently-developed stochastic optimization methods including SAGA, Finito, and stochastic dual coordinate ascent (SDCA). First, we show an intrinsic connection between stochastic optimization methods and dynamic jump systems, and propose a general jump system model for stochastic optimization methods. Our proposed model recovers SAGA, SDCA, Finito, and SAG as special cases. Then we combine jump system theory with several simple quadratic inequalities to derive sufficient conditions for convergence rate certifications of the proposed jump system model under various assumptions (with or without individual convexity, etc). The derived conditions are linear matrix inequalities (LMIs) whose sizes roughly scale with the size of the training set. We make use of the symmetry in the stochastic optimization methods and reduce these LMIs to some equivalent small LMIs whose sizes are at most 3 by 3. We solve these small LMIs to provide analytical proofs of new convergence rates for SAGA, Finito and SDCA (with or without individual convexity). We also explain why our proposed LMI fails in analyzing SAG. We reveal a key difference between SAG and other methods, and briefly discuss how to extend our LMI analysis for SAG. An advantage of our approach is that the proposed analysis can be automated for a large class of stochastic methods under various assumptions (with or without individual convexity, etc).

artificial intelligence, machine learning, optimization problem, (15 more...)

1706.08141

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > Sweden > Skåne County > Lund (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

#artificialintelligenceJun-22-2017, 02:35:14 GMT

The Use of Learning Curves for Machine Learning Diagnostic in Cloud Resource Optimization - FittedCloud

When we use the 1st, 2nd, and nth (with a large n) order polynomial regression, we may get the following results. As we can see, the 1st order model is too simple to fit the data, the 2nd order model has reasonable fitting, and the nth order model is very complicated and can fit every single sample very well. But the problem with the 3rd case is that machine learning is not about fitting the training data, but about generalizing beyond the training data. While it can achieve small training error, it will have large test error.

artificial intelligence, cloud resource optimization, machine learning diagnostic, (5 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.40)

Venkateswara, Hemanth, Lade, Prasanth, Lin, Binbin, Ye, Jieping, Panchanathan, Sethuraman

Efficient Approximate Solutions to Mutual Information Based Global Feature Selection

Mutual Information (MI) is often used for feature selection when developing classifier models. Estimating the MI for a subset of features is often intractable. We demonstrate, that under the assumptions of conditional independence, MI between a subset of features can be expressed as the Conditional Mutual Information (CMI) between pairs of features. But selecting features with the highest CMI turns out to be a hard combinatorial problem. In this work, we have applied two unique global methods, Truncated Power Method (TPower) and Low Rank Bilinear Approximation (LowRank), to solve the feature selection problem. These algorithms provide very good approximations to the NP-hard CMI based feature selection problem. We experimentally demonstrate the effectiveness of these procedures across multiple datasets and compare them with existing MI based global and iterative feature selection procedures.

artificial intelligence, feature selection, machine learning, (15 more...)

1706.07535

Country: North America > United States > Michigan (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Nowak, Alex, Villar, Soledad, Bandeira, Afonso S., Bruna, Joan

A Note on Learning Algorithms for Quadratic Assignment with Graph Neural Networks

Many inverse problems are formulated as optimization problems over certain appropriate input distributions. Recently, there has been a growing interest in understanding the computational hardness of these optimization problems, not only in the worst case, but in an average-complexity sense under this same input distribution. In this note, we are interested in studying another aspect of hardness, related to the ability to learn how to solve a problem by simply observing a collection of previously solved instances. These are used to supervise the training of an appropriate predictive model that parametrizes a broad class of algorithms, with the hope that the resulting "algorithm" will provide good accuracy-complexity tradeoffs in the average sense. We illustrate this setup on the Quadratic Assignment Problem, a fundamental problem in Network Science. We observe that data-driven models based on Graph Neural Networks offer intriguingly good performance, even in regimes where standard relaxation based techniques appear to suffer.

algorithm, artificial intelligence, machine learning, (13 more...)

1706.0745

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.90)

Frank-Wolfe Optimization for Symmetric-NMF under Simplicial Constraint

Zhao, Han, Gordon, Geoff

We propose a Frank-Wolfe (FW) solver to optimize the symmetric nonnegative matrix factorization problem under a simplicial constraint. Compared with existing solutions, this algorithm is extremely simple to implement, and has almost no hyperparameters to be tuned. Building on the recent advances of FW algorithms in nonconvex optimization, we prove an $O(1/\varepsilon^2)$ convergence rate to stationary points, via a tight bound $\Theta(n^2)$ on the curvature constant. Numerical results demonstrate the effectiveness of our algorithm. As a side contribution, we construct a simple nonsmooth convex problem where the FW algorithm fails to converge to the optimum. This result raises an interesting question about necessary conditions of the success of the FW algorithm on convex problems.

algorithm, artificial intelligence, machine learning, (16 more...)

1706.06348

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Bhadra, Anindya, Datta, Jyotishka, Polson, Nicholas G., Willard, Brandon

Horseshoe Regularization for Feature Subset Selection

Feature subset selection arises in many high-dimensional applications of statistics, such as compressed sensing and genomics. The $\ell_0$ penalty is ideal for this task, the caveat being it requires the NP-hard combinatorial evaluation of all models. A recent area of considerable interest is to develop efficient algorithms to fit models with a non-convex $\ell_\gamma$ penalty for $\gamma\in (0,1)$, which results in sparser models than the convex $\ell_1$ or lasso penalty, but is harder to fit. We propose an alternative, termed the horseshoe regularization penalty for feature subset selection, and demonstrate its theoretical and computational advantages. The distinguishing feature from existing non-convex optimization approaches is a full probabilistic representation of the penalty as the negative of the logarithm of a suitable prior, which in turn enables efficient expectation-maximization and local linear approximation algorithms for optimization and MCMC for uncertainty quantification. In synthetic and real data, the resulting algorithms provide better statistical performance, and the computation requires a fraction of time of state-of-the-art non-convex solvers.

artificial intelligence, bayesian inference, machine learning, (20 more...)

1702.074

Country: North America > United States > Arkansas (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.48)
Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.46)
Health & Medicine > Therapeutic Area > Hematology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Letham, Benjamin, Karrer, Brian, Ottoni, Guilherme, Bakshy, Eytan

Constrained Bayesian Optimization with Noisy Experiments

arXiv.org Machine LearningJun-21-2017

Randomized experiments are the gold standard for evaluating the effects of changes to real-world systems, including Internet services. Data in these tests may be difficult to collect and outcomes may have high variance, resulting in potentially large measurement error. Bayesian optimization is a promising technique for optimizing multiple continuous parameters for field experiments, but existing approaches degrade in performance when the noise level is high. We derive an exact expression for expected improvement under greedy batch optimization with noisy observations and noisy constraints, and develop a quasi-Monte Carlo approximation that allows it to be efficiently optimized. Experiments with synthetic functions show that optimization performance on noisy, constrained problems outperforms existing methods. We further demonstrate the effectiveness of the method with two real experiments conducted at Facebook: optimizing a production ranking system, and optimizing web server compiler flags.

artificial intelligence, machine learning, optimization, (19 more...)

1706.07094

Country: North America > United States (0.93)

Genre:

Research Report > Strength High (0.54)
Research Report > Experimental Study (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Social Media (0.69)

Liu, Sulin, Pan, Sinno Jialin, Ho, Qirong

Distributed Multi-Task Relationship Learning

arXiv.org Machine LearningJun-20-2017

Multi-task learning aims to learn multiple tasks jointly by exploiting their relatedness to improve the generalization performance for each task. Traditionally, to perform multi-task learning, one needs to centralize data from all the tasks to a single machine. However, in many real-world applications, data of different tasks may be geo-distributed over different local machines. Due to heavy communication caused by transmitting the data and the issue of data privacy and security, it is impossible to send data of different task to a master machine to perform multi-task learning. Therefore, in this paper, we propose a distributed multi-task learning framework that simultaneously learns predictive models for each task as well as task relationships between tasks alternatingly in the parameter server paradigm. In our framework, we first offer a general dual form for a family of regularized multi-task relationship learning methods. Subsequently, we propose a communication-efficient primal-dual distributed optimization algorithm to solve the dual problem by carefully designing local subproblems to make the dual problem decomposable. Moreover, we provide a theoretical convergence analysis for the proposed algorithm, which is specific for distributed multi-task relationship learning. We conduct extensive experiments on both synthetic and real-world datasets to evaluate our proposed framework in terms of effectiveness and convergence.

artificial intelligence, dmtrl, machine learning, (15 more...)

1612.04022

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)