AITopics | Regression

Regression

Homotopy Parametric Simplex Method for Sparse Learning

Pang, Haotian, Vanderbei, Robert, Liu, Han, Zhao, Tuo

arXiv.org Machine Learning | Nov-27-2017

High-dimensional sparse learning poses a great computational challenge for large-scale data analysis. In this paper, we are interested in a broad class of sparse learning approaches formulated as linear programs parametrized by a regularization factor, and solve them by the parametric simplex method (PSM). Our parametric simplex method offers significant advantages over other competing methods: (1) PSM naturally obtains the complete solution path for all values of the regularization parameter; (2) PSM provides a high-precision dual certificate stopping criterion; (3) PSM yields sparse solutions through very few iterations, and the solution sparsity significantly reduces the computational cost per iteration. In particular, we demonstrate the superiority of PSM over various sparse learning approaches, including Dantzig selector for sparse linear regression, LAD-Lasso for sparse robust linear regression, CLIME for sparse precision matrix estimation, sparse differential network estimation, and sparse Linear Programming Discriminant (LPD) analysis. We then provide sufficient conditions under which PSM always outputs sparse solutions such that its computational performance can be significantly boosted. Thorough numerical experiments are provided to demonstrate the outstanding performance of the PSM method.
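To make "linear programs parametrized by a regularization factor" concrete, here is the textbook form of the Dantzig selector and its standard rewriting as a linear program. This is a sketch of the usual formulation from the literature, not necessarily the exact scaling used in the paper; the symbols $X$ (design matrix), $y$ (response), $\lambda$ (regularization factor) and $\beta^{+}, \beta^{-}$ (positive and negative parts of $\beta$) are notation assumed here, since the abstract does not define them.

```latex
\begin{aligned}
&\text{Dantzig selector:} &&
\min_{\beta \in \mathbb{R}^d} \ \|\beta\|_1
\quad \text{subject to} \quad
\|X^\top (y - X\beta)\|_\infty \le \lambda, \\[4pt]
&\text{LP form } (\beta = \beta^{+} - \beta^{-}): &&
\min_{\beta^{+},\, \beta^{-} \ge 0} \ \mathbf{1}^\top (\beta^{+} + \beta^{-}) \\
& && \text{subject to} \quad
-\lambda \mathbf{1} \ \le\ X^\top y - X^\top X (\beta^{+} - \beta^{-}) \ \le\ \lambda \mathbf{1}.
\end{aligned}
```

In this form $\lambda$ enters only through the right-hand side of the constraints, which is what allows a parametric method to trace the solution over a range of $\lambda$ values.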

artificial intelligence, machine learning, simplex method, (18 more...)

arXiv.org Machine Learning

1704.01079

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

The 5 Phases of Every Machine Learning Project – Blog

#artificialintelligence | Nov-26-2017, 22:05:06 GMT

Machine learning and predictive analytics are pervasive in our lives today. AI impacts nearly everything we do and interact with, including retail and wholesale pricing, consumer habits and behaviors, marketing and advertising, politics, entertainment, sports, medicine, business logistics and planning, fraud and risk detection, airline and truck route planning, pricing strategy, gaming, AI speech recognition, AI image recognition, self-driving cars, and robotics. Yet whether you are creating a self-driving car, predicting customer churn, or creating a product recommendation system, all machine learning projects follow the same process and the same five basic phases. Data is the new oil. It is quickly becoming the most valuable commodity in the world. Data is like oil because it fuels machine learning projects. Without data, there is no machine learning and no predictive analytics. And just like grades of oil, there are grades of data. Supreme data is like rocket fuel for machine learning models, and buyers pay a premium for it.

artificial intelligence, machine learning, prediction, (17 more...)

#artificialintelligence

Industry:

Leisure & Entertainment (0.90)
Media > Film (0.70)
Transportation > Ground > Road (0.69)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.30)

Getting started with Machine Learning in MS Excel using XLMiner

@machinelearnbot | Nov-24-2017, 14:00:11 GMT

Machine learning is nothing but building a 'machine' which 'learns' from its experience and becomes better with experience, just as humans learn from theirs. Companies like Google, Facebook, and Microsoft are using machine learning techniques at a large scale. However, one common misconception people have is that they need to learn coding to start machine learning.

artificial intelligence, machine learning, xlminer, (13 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Computing the quality of the Laplace approximation

Dehaene, Guillaume P.

arXiv.org Machine Learning | Nov-24-2017

Bayesian inference requires approximation methods to become computable, but for most of them it is impossible to quantify how close the approximation is to the true posterior. In this work, we present a theorem upper-bounding the KL divergence between a log-concave target density $f\left(\boldsymbol{\theta}\right)$ and its Laplace approximation $g\left(\boldsymbol{\theta}\right)$. The bound we present is computable: on the classical logistic regression model, we find our bound to be almost exact as long as the dimensionality of the parameter space is high. The approach we followed in this work can be extended to other Gaussian approximations, as we will do in an extended version of this work, to be submitted to the Annals of Statistics. It will then become a critical tool for characterizing whether, for a given problem, a given Gaussian approximation is suitable, or whether a more precise alternative method should be used instead.
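As an illustration of what a Laplace approximation is, the following minimal sketch fits a Gaussian to a one-dimensional log-concave density by locating the mode with Newton's method and using the curvature of the log-density there. It does not compute the KL bound from the paper. The function name `laplace_approximation` and the Gamma-shaped example density are assumptions chosen for illustration.

```python
def laplace_approximation(grad, hess, theta0, iters=50):
    """Return (mean, variance) of the Gaussian Laplace approximation.

    grad and hess are the first and second derivatives of the
    log-density log f(theta); hess must be negative (log-concavity).
    """
    theta = theta0
    for _ in range(iters):
        # Newton step towards the mode of log f.
        theta = theta - grad(theta) / hess(theta)
    # Gaussian centred at the mode, variance = -1 / (log f)''(mode).
    return theta, -1.0 / hess(theta)


# Hypothetical example: log f(theta) = (a - 1) * log(theta) - b * theta
# with a = 5, b = 2 (an unnormalized Gamma density, log-concave for a >= 1).
a, b = 5.0, 2.0
mean, var = laplace_approximation(
    grad=lambda t: (a - 1.0) / t - b,
    hess=lambda t: -(a - 1.0) / t ** 2,
    theta0=1.0,
)
```

For this density the mode is (a - 1) / b and the curvature at the mode is -b^2 / (a - 1), so the approximation is a Gaussian with mean 2 and variance 1.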

approximation, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1711.08911

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)

Global optimization for low-dimensional switching linear regression and bounded-error estimation

Lauer, Fabien

arXiv.org Machine Learning | Nov-23-2017

The paper provides global optimization algorithms for two particularly difficult nonconvex problems raised by hybrid system identification: switching linear regression and bounded-error estimation. While most works focus on local optimization heuristics without global optimality guarantees or with guarantees valid only under restrictive conditions, the proposed approach always yields a solution with a certificate of global optimality. This approach relies on a branch-and-bound strategy for which we devise lower bounds that can be efficiently computed. In order to obtain scalable algorithms with respect to the number of data, we directly optimize the model parameters in a continuous optimization setting without involving integer variables. Numerical experiments show that the proposed algorithms offer a higher accuracy than convex relaxations with a reasonable computational burden for hybrid system identification. In addition, we discuss how bounded-error estimation is related to robust estimation in the presence of outliers and exact recovery under sparse noise, for which we also obtain promising numerical results.

artificial intelligence, identification, machine learning, (19 more...)

arXiv.org Machine Learning

1707.05533

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.61)

Causal nearest neighbor rules for optimal treatment regimes

Zhou, Xin, Kosorok, Michael R.

arXiv.org Machine Learning | Nov-22-2017

The estimation of optimal treatment regimes is of considerable interest to precision medicine. In this work, we propose a causal $k$-nearest neighbor method to estimate the optimal treatment regime. The method is rooted in the framework of causal inference, and estimates the causal treatment effects within the nearest neighborhood. Although the method is simple, it possesses nice theoretical properties. We show that the causal $k$-nearest neighbor regime is universally consistent. That is, the causal $k$-nearest neighbor regime will eventually learn the optimal treatment regime as the sample size increases. We also establish its convergence rate. However, the causal $k$-nearest neighbor regime may suffer from the curse of dimensionality, i.e., performance deteriorates as dimensionality increases. To alleviate this problem, we develop an adaptive causal $k$-nearest neighbor method to perform metric selection and variable selection simultaneously. The performance of the proposed methods is illustrated in simulation studies and in an analysis of a chronic depression clinical trial.
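The idea of estimating a treatment effect within a nearest neighborhood can be sketched as follows. This is a simplified illustration, not the authors' estimator: it assumes a binary treatment assigned at random, Euclidean distance, and at least one observation in each arm, and it compares the mean outcome of the k nearest treated and k nearest control observations. The name `knn_treatment_rule` and the toy data are hypothetical.

```python
def knn_treatment_rule(X, A, Y, x0, k=3):
    """Recommend treatment 1 or 0 at covariate x0.

    X: list of covariate vectors, A: treatments (0 or 1), Y: outcomes
    (larger is better). Returns (recommended_arm, estimated_effect).
    """
    def mean_of_nearest(arm):
        # Squared Euclidean distance to x0 for every observation in this arm.
        scored = sorted(
            (sum((u - v) ** 2 for u, v in zip(xi, x0)), yi)
            for xi, ai, yi in zip(X, A, Y)
            if ai == arm
        )
        nearest = scored[:k]
        return sum(yi for _, yi in nearest) / len(nearest)

    # Local effect estimate: treated mean minus control mean near x0.
    effect = mean_of_nearest(1) - mean_of_nearest(0)
    return (1 if effect > 0 else 0), effect


# Toy data: treatment helps for small x and hurts for large x.
X = [[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [1.3]]
A = [1, 0, 1, 0, 1, 0, 1, 0]
Y = [5.0, 1.0, 5.0, 1.0, 1.0, 5.0, 1.0, 5.0]

low = knn_treatment_rule(X, A, Y, [0.1], k=2)
high = knn_treatment_rule(X, A, Y, [1.2], k=2)
```

On this toy data the rule recommends treatment near x = 0.1 and no treatment near x = 1.2, since the sign of the local effect estimate flips between the two regions.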

artificial intelligence, machine learning, regime, (9 more...)

arXiv.org Machine Learning

1711.08451

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.42)

Streaming Weak Submodularity: Interpreting Neural Networks on the Fly

Elenberg, Ethan R., Dimakis, Alexandros G., Feldman, Moran, Karbasi, Amin

arXiv.org Machine Learning | Nov-22-2017

In many machine learning applications, it is important to explain the predictions of a black-box classifier. For example, why does a deep neural network assign an image to a particular class? We cast interpretability of black-box classifiers as a combinatorial maximization problem and propose an efficient streaming algorithm to solve it subject to cardinality constraints. By extending ideas from Badanidiyuru et al. [2014], we provide a constant factor approximation guarantee for our algorithm in the case of random stream order and a weakly submodular objective function. This is the first such theoretical guarantee for this general class of functions, and we also show that no such algorithm exists for a worst case stream order. Our algorithm obtains similar explanations of Inception V3 predictions $10$ times faster than the state-of-the-art LIME framework of Ribeiro et al. [2016].

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Machine Learning

1703.02647

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

On Faster Convergence of Cyclic Block Coordinate Descent-type Methods for Strongly Convex Minimization

Li, Xingguo, Zhao, Tuo, Arora, Raman, Liu, Han, Hong, Mingyi

arXiv.org Machine Learning | Nov-22-2017

The cyclic block coordinate descent-type (CBCD-type) methods, which perform iterative updates for a few coordinates (a block) simultaneously throughout the procedure, have shown remarkable computational performance for solving strongly convex minimization problems. Typical applications include many popular statistical machine learning methods such as elastic-net regression, ridge penalized logistic regression, and sparse additive regression. Existing optimization literature has shown that for strongly convex minimization, the CBCD-type methods attain iteration complexity of $\mathcal{O}(p\log(1/\epsilon))$, where $\epsilon$ is a pre-specified accuracy of the objective value, and $p$ is the number of blocks. However, such iteration complexity explicitly depends on $p$, and therefore is at least $p$ times worse than the complexity $\mathcal{O}(\log(1/\epsilon))$ of gradient descent (GD) methods. To bridge this theoretical gap, we propose an improved convergence analysis for the CBCD-type methods. In particular, we first show that for a family of quadratic minimization problems, the iteration complexity $\mathcal{O}(\log^2(p)\cdot\log(1/\epsilon))$ of the CBCD-type methods matches that of the GD methods in terms of dependency on $p$, up to a $\log^2 p$ factor. Thus our complexity bounds are sharper than the existing bounds by at least a factor of $p/\log^2(p)$. We also provide a lower bound to confirm that our improved complexity bounds are tight (up to a $\log^2 (p)$ factor), under the assumption that the largest and smallest eigenvalues of the Hessian matrix do not scale with $p$. Finally, we generalize our analysis to other strongly convex minimization problems beyond quadratic ones.
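For readers unfamiliar with the method class, here is a minimal sketch of cyclic coordinate descent on a strongly convex quadratic, the simplest case in which every block is a single coordinate. It illustrates the update pattern only and says nothing about the complexity bounds in the paper. The function name and the 2-by-2 example are assumptions made for illustration.

```python
def cyclic_coordinate_descent(A, b, sweeps=100):
    """Minimize f(x) = 0.5 * x^T A x - b^T x for symmetric positive definite A.

    Each sweep visits the coordinates in a fixed cyclic order and
    minimizes f exactly over one coordinate with the others held fixed.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Setting df/dx_i = 0 gives
            # x_i = (b_i - sum_{j != i} A_ij * x_j) / A_ii.
            off_diagonal = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - off_diagonal) / A[i][i]
    return x


# Small example; the minimizer solves A x = b.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = cyclic_coordinate_descent(A, b)
```

The minimizer of this quadratic is the solution of A x = b, here x = (1/11, 7/11), which the iterates approach after a few sweeps.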

artificial intelligence, iteration complexity, machine learning, (13 more...)

arXiv.org Machine Learning

1607.02793

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Proximal Gradient Method with Extrapolation and Line Search for a Class of Nonconvex and Nonsmooth Problems

Yang, Lei

arXiv.org Machine Learning | Nov-21-2017

In this paper, we consider a class of possibly nonconvex, nonsmooth and non-Lipschitz optimization problems arising in many contemporary applications such as machine learning, variable selection and image processing. To solve this class of problems, we propose a proximal gradient method with extrapolation and line search (PGels). This method is developed based on a special potential function and successfully incorporates both extrapolation and non-monotone line search, which are two simple and efficient accelerating techniques for the proximal gradient method. Thanks to the line search, this method allows more flexibility in choosing the extrapolation parameters and updates them adaptively at each iteration if a certain line search criterion is not satisfied. Moreover, with proper choices of parameters, our PGels reduces to many existing algorithms. We also show that, under some mild conditions, our line search criterion is well defined and any cluster point of the sequence generated by PGels is a stationary point of our problem. In addition, by assuming the Kurdyka-Łojasiewicz exponent of the objective in our problem, we further analyze the local convergence rate of two special cases of PGels, including the widely used non-monotone proximal gradient method as one case. Finally, we conduct some numerical experiments for solving the $\ell_1$ regularized logistic regression problem and the $\ell_{1\text{-}2}$ regularized least squares problem. Our numerical results illustrate the efficiency of PGels and show the potential advantage of combining two accelerating techniques.
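The extrapolation ingredient can be illustrated with a plain proximal gradient iteration on an $\ell_1$ regularized least squares problem. This sketch is not PGels: it uses a fixed step size and a fixed extrapolation parameter and has no line search. The function names, the choice `beta=0.5`, and the small diagonal test problem are assumptions made for illustration.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def prox_grad_extrapolated(A, b, lam, step, beta=0.5, iters=200):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1.

    step should be at most 1 / L, where L is the largest eigenvalue
    of A^T A; beta is a fixed extrapolation (momentum) parameter.
    """
    m, n = len(A), len(A[0])
    x_prev = [0.0] * n
    x = [0.0] * n
    for _ in range(iters):
        # Extrapolated point y = x + beta * (x - x_prev).
        y = [x[j] + beta * (x[j] - x_prev[j]) for j in range(n)]
        # Gradient of the smooth part at y: A^T (A y - b).
        r = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Proximal (soft-thresholding) step taken from y.
        x_prev = x
        x = [soft_threshold(y[j] - step * g[j], step * lam) for j in range(n)]
    return x


# Diagonal example, so L = 4 and the solution can be checked by hand.
A = [[2.0, 0.0], [0.0, 1.0]]
b = [4.0, 0.3]
x = prox_grad_extrapolated(A, b, lam=1.0, step=0.25)
```

Because this example is separable, the minimizer can be derived coordinate by coordinate: the first coordinate satisfies 2(2x - 4) + 1 = 0, giving x = 1.75, and the second is thresholded to zero because |0.3| is below the regularization level.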

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

1711.06831

Country:

Asia > China (0.28)
North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Crash Course in Machine Learning – IoT For All – Medium

#artificialintelligence | Nov-20-2017, 08:35:33 GMT

When you type 'machine learning' into Google News, the first link you see is a Forbes Magazine piece called "What's The Difference Between Machine Learning And Artificial Intelligence?" This article contained so many flowery, grandiose descriptions about ML and AI technology that I couldn't help but laugh. With all the nonsense used to describe machine learning (ML) and artificial intelligence (AI), it's time we do a deep dive into what these technologies actually do. First, we need to learn the difference between AI and ML. Fortunately, a fellow writer has already written an excellent explanation here.

artificial intelligence, default, machine learning, (17 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.40)

Industry: Marketing (0.56)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)
