AITopics | Regression

Regression

Some methods for heterogeneous treatment effect estimation in high-dimensions

Powers, Scott, Qian, Junyang, Jung, Kenneth, Schuler, Alejandro, Shah, Nigam H., Hastie, Trevor, Tibshirani, Robert

arXiv.org Machine Learning, Jul-1-2017

When devising a course of treatment for a patient, doctors often have little quantitative evidence on which to base their decisions, beyond their medical education and published clinical trials. Stanford Health Care alone has millions of electronic medical records (EMRs) that are only recently being leveraged to inform better treatment recommendations. These data present a unique challenge because they are high-dimensional and observational. Our goal is to make personalized treatment recommendations based on the outcomes for past patients similar to a new patient. We propose and analyze three methods for estimating heterogeneous treatment effects using observational data. Our methods perform well in simulations using a wide variety of treatment effect functions, and we present results of applying the two most promising methods to data from The SPRINT Data Analysis Challenge, from a large randomized trial of a treatment for high blood pressure.

artificial intelligence, machine learning, treatment effect, (14 more...)

arXiv.org Machine Learning

1707.00102

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball

Powers, Scott, Hastie, Trevor, Tibshirani, Robert

arXiv.org Machine Learning, Jun-30-2017

We propose the nuclear norm penalty as an alternative to the ridge penalty for regularized multinomial regression. This convex relaxation of reduced-rank multinomial regression has the advantage of leveraging underlying structure among the response categories to make better predictions. We apply our method, nuclear penalized multinomial regression (NPMR), to Major League Baseball play-by-play data to predict outcome probabilities based on batter-pitcher matchups. The interpretation of the results meshes well with subject-area expertise and also suggests a novel understanding of what differentiates players.
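To make the contrast between the two penalties concrete, the objectives can be sketched as follows. The notation is an assumption introduced here, not taken from the abstract: $B$ is the $p \times K$ coefficient matrix (features by response categories), $\ell(B)$ is the multinomial log-likelihood, $\lambda \ge 0$ is the tuning parameter, and $\sigma_i(B)$ are the singular values of $B$. Intercepts and any scaling conventions are omitted.

```latex
% Nuclear norm penalty: sum of singular values, which encourages low rank
\hat{B}_{\text{nuclear}} = \arg\min_{B \in \mathbb{R}^{p \times K}}
  \; -\ell(B) + \lambda \, \|B\|_{*},
\qquad \|B\|_{*} = \sum_{i} \sigma_i(B)

% Ridge penalty: sum of squared singular values (squared Frobenius norm)
\hat{B}_{\text{ridge}} = \arg\min_{B \in \mathbb{R}^{p \times K}}
  \; -\ell(B) + \lambda \, \|B\|_{F}^{2},
\qquad \|B\|_{F}^{2} = \sum_{i} \sigma_i(B)^{2}
```

Because the nuclear norm acts like an $L_1$ penalty on the singular values, it can set some of them exactly to zero, which is one way to read the abstract's description of it as a convex relaxation of reduced-rank regression.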

artificial intelligence, machine learning, regression, (18 more...)

arXiv.org Machine Learning

1706.10272

Country:

North America (0.28)
Europe > Austria (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports > Baseball (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Collaborative-controlled LASSO for Constructing Propensity Score-based Estimators in High-Dimensional Data

Ju, Cheng, Wyss, Richard, Franklin, Jessica M., Schneeweiss, Sebastian, Häggström, Jenny, van der Laan, Mark J.

arXiv.org Machine Learning, Jun-30-2017

Propensity score (PS) based estimators are increasingly used for causal inference in observational studies. However, model selection for PS estimation in high-dimensional data has received little attention. In these settings, PS models have traditionally been selected based on the goodness-of-fit for the treatment mechanism itself, without consideration of the causal parameter of interest. Collaborative minimum loss-based estimation (C-TMLE) is a novel methodology for causal inference that takes into account information on the causal parameter of interest when selecting a PS model. This "collaborative learning" considers variable associations with both treatment and outcome when selecting a PS model in order to minimize a bias-variance trade-off in the estimated treatment effect. In this study, we introduce a novel approach for collaborative model selection when using the LASSO estimator for PS estimation in high-dimensional covariate settings. To demonstrate the importance of selecting the PS model collaboratively, we designed quasi-experiments based on a real electronic healthcare database, where only the potential outcomes were manually generated, and the treatment and baseline covariates remained unchanged. Results showed that the C-TMLE algorithm outperformed other competing estimators for both point estimation and confidence interval coverage. In addition, the PS model selected by C-TMLE could be applied to other PS-based estimators, which also resulted in substantive improvement for both point estimation and confidence interval coverage. We illustrate the discussed concepts through an empirical example comparing the effects of non-selective nonsteroidal anti-inflammatory drugs with selective COX-2 inhibitors on gastrointestinal complications in a population of Medicare beneficiaries.

artificial intelligence, estimator, machine learning, (17 more...)

arXiv.org Machine Learning

1706.10029

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)
Health & Medicine > Health Care Providers & Services > Reimbursement (0.54)
Health & Medicine > Health Care Technology > Medical Record (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Likelihood Inflating Sampling Algorithm

Entezari, Reihaneh, Craiu, Radu V., Rosenthal, Jeffrey S.

arXiv.org Machine Learning, Jun-30-2017

Markov Chain Monte Carlo (MCMC) sampling from a posterior distribution corresponding to a massive data set can be computationally prohibitive since producing one sample requires a number of operations that is linear in the data size. In this paper, we introduce a new communication-free parallel method, the Likelihood Inflating Sampling Algorithm (LISA), that significantly reduces computational costs by randomly splitting the dataset into smaller subsets and running MCMC methods independently in parallel on each subset using different processors. Each processor will be used to run an MCMC chain that samples sub-posterior distributions which are defined using an "inflated" likelihood function. We develop a strategy for combining the draws from different sub-posteriors to study the full posterior of the Bayesian Additive Regression Trees (BART) model. The performance of the method is tested using both simulated and real data.

artificial intelligence, machine learning, modlisa, (18 more...)

arXiv.org Machine Learning

1605.02113

Country:

North America > Canada (0.28)
North America > United States (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Getting Started with Orange 18: Text Classification

#artificialintelligence, Jun-28-2017, 15:40:48 GMT

How to visualize a logistic regression model, build a classification workflow for text, and predict the tale type of unclassified tales.

machine learning, social media, text classification, (2 more...)

#artificialintelligence

Country: Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.25)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.95)
Information Technology > Communications > Social Media (0.76)

Asymptotic Confidence Regions for High-dimensional Structured Sparsity

Stucky, Benjamin, van de Geer, Sara

arXiv.org Machine Learning, Jun-28-2017

In the setting of high-dimensional linear regression models, we propose two frameworks for constructing pointwise and group confidence sets for penalized estimators which incorporate prior knowledge about the organization of the non-zero coefficients. This is done by desparsifying the estimator as in van de Geer et al. [18] and van de Geer and Stucky [17], then using an appropriate estimator for the precision matrix $\Theta$. To estimate the precision matrix, a corresponding structured matrix norm penalty has to be introduced. After normalization, the result is an asymptotic pivot. The asymptotic behavior is studied, and simulations are added to study the differences between the two schemes.

artificial intelligence, estimator, machine learning, (17 more...)

arXiv.org Machine Learning

1706.09231

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

What Top Firms Ask: 100 Data Science Interview Questions

@machinelearnbot, Jun-27-2017, 19:45:21 GMT

A fresh scrape from Glassdoor gives us a good idea about what applicants are asked during a data scientist interview at some of the top companies. Unfortunately for us, almost every company has their interviewees sign NDAs. Since Glassdoor allows anonymity, a few brave souls have given us some fantastic examples of what they were asked during the interview process at top companies like Facebook, Google, and Microsoft. If you find yourself unable to answer some of the questions below, consider checking out a course or a book on the subject. If you'd like to share your answer(s) to any of the questions, leave a comment and I'll add the top ones to the post.

artificial intelligence, data mining, machine learning, (16 more...)

@machinelearnbot

Country: North America > United States > New York (0.05)

Genre:

Personal > Interview (0.49)
Research Report (0.31)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)

Neural Networks as a Corporation Chain of Command

#artificialintelligence, Jun-27-2017, 03:40:13 GMT

Neural networks are considered complicated, and they are always explained using neurons and brain function. But we do not need to learn how the brain works to understand the structure of neural networks and how they operate. We can look at something people encounter more often in everyday life, like a corporation hierarchy. Let us start with logistic regression. Logistic regression yields values from 0 to 1, and we can consider the process as making an evaluation.
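The point about logistic regression yielding values between 0 and 1 can be illustrated with a minimal sketch. The function names, weights, and inputs below are hypothetical and chosen only for illustration; the article's own example is not reproduced here.

```python
import math


def sigmoid(z):
    """Squash a real number into the range 0 to 1 (stable for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_predict(weights, bias, features):
    """Weighted sum of the inputs, passed through the sigmoid.

    Read as an "evaluation": each feature contributes evidence, the weights
    say how much that evidence counts, and the output is a score in [0, 1].
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights and inputs, for illustration only.
score = logistic_predict(weights=[0.8, -0.4], bias=0.1, features=[1.0, 2.0])
```

In the corporate analogy, one such unit is a single evaluator; a neural network stacks many of them so that the scores from one level become the inputs to the next.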

artificial intelligence, corporation chain, machine learning, (3 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.81)

Distributed Coordinate Descent for Generalized Linear Models with Regularization

Trofimov, Ilya, Genkin, Alexander

arXiv.org Machine Learning, Jun-26-2017

A generalized linear model with $L_1$ and $L_2$ regularization is a widely used technique for solving classification, class probability estimation, and regression problems. With the numbers of both features and examples growing rapidly in fields like text mining and clickstream data analysis, parallelization and the use of cluster architectures become important. We present a novel algorithm for fitting regularized generalized linear models in a distributed environment. The algorithm splits data between nodes by features, uses coordinate descent on each node, and uses line search to merge results globally. A convergence proof is provided. A modification of the algorithm addresses the slow node problem. For the important particular case of logistic regression, we empirically compare our program with several state-of-the-art approaches that rely on different algorithmic and data splitting methods. Experiments demonstrate that our approach is scalable and superior when training on large and sparse datasets.
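As a rough illustration of the coordinate descent building block mentioned above, here is a single-machine sketch for the simplest case: squared-error loss with an $L_1$ penalty. This is not the authors' distributed algorithm; it omits the feature-wise split across nodes, the line search, the $L_2$ term, and the logistic loss. The function names and the $1/(2n)$ scaling of the loss are assumptions made for this example.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t: the closed-form solution of the 1-D L1 problem."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent.

    X is a list of n rows, each a list of p floats; y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, since w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature cannot change the fit
            # Correlation of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Toy data, for illustration only: two non-overlapping indicator features.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y_toy = [2.0, 0.1, 2.0, 0.1]
w_toy = lasso_coordinate_descent(X_toy, y_toy, lam=0.1)
```

Each update touches only one feature column and the shared residual, which is what makes a split of the data by features across nodes a natural fit for this kind of method.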

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1611.02101

Genre: Research Report > Promising Solution (0.34)

Technology: