Goto

Collaborating Authors

 Regression


A Literature Review on Length of Stay Prediction for Stroke Patients using Machine Learning and Statistical Approaches

arXiv.org Artificial Intelligence

Hospital length of stay (LOS) is one of the most essential healthcare metrics that reflects the hospital quality of service and helps improve hospital scheduling and management. LOS prediction helps in cost management because patients who remain in hospitals usually do so in hospital units where resources are severely limited. In this study, we reviewed papers on LOS prediction using machine learning and statistical approaches. Our literature review considers research studies that focus on LOS prediction for stroke patients. Some of the surveyed studies revealed that authors reached contradicting conclusions. For example, the age of the patient was considered an important predictor of LOS for stroke patients in some studies, while other studies concluded that age was not a significant factor. Therefore, additional research is required in this domain to further understand the predictors of LOS for stroke patients. There are many challenges associated with growth in the healthcare sector, including increased pressure on the limited resources of hospitals. This issue has motivated researchers to conduct further research related to hospital resource optimization. Since hospitalization constitutes a significant cost of patient care, many researchers have been investigating the problem of patient Length of Stay (LOS) prediction.


Multivariate Trend Filtering for Lattice Data

arXiv.org Machine Learning

We study a multivariate version of trend filtering, called Kronecker trend filtering or KTF, for the case in which the design points form a lattice in $d$ dimensions. KTF is a natural extension of univariate trend filtering (Steidl et al., 2006; Kim et al., 2009; Tibshirani, 2014), and is defined by minimizing a penalized least squares problem whose penalty term sums the absolute (higher-order) differences of the parameter to be estimated along each of the coordinate directions. The corresponding penalty operator can be written in terms of Kronecker products of univariate trend filtering penalty operators, hence the name Kronecker trend filtering. Equivalently, one can view KTF in terms of an $\ell_1$-penalized basis regression problem where the basis functions are tensor products of falling factorial functions, a piecewise polynomial (discrete spline) basis that underlies univariate trend filtering. This paper is a unification and extension of the results in Sadhanala et al. (2016, 2017). We develop a complete set of theoretical results that describe the behavior of $k^{\mathrm{th}}$ order Kronecker trend filtering in $d$ dimensions, for every $k \geq 0$ and $d \geq 1$. This reveals a number of interesting phenomena, including the dominance of KTF over linear smoothers in estimating heterogeneously smooth functions, and a phase transition at $d=2(k+1)$, a boundary past which (on the high dimension-to-smoothness side) linear smoothers fail to be consistent entirely. We also leverage recent results on discrete splines from Tibshirani (2020), in particular, discrete spline interpolation results that enable us to extend the KTF estimate to any off-lattice location in constant-time (independent of the size of the lattice $n$).


Omitted Variable Bias in Machine Learned Causal Models

arXiv.org Machine Learning

We derive general, yet simple, sharp bounds on the size of the omitted variable bias for a broad class of causal parameters that can be identified as linear functionals of the conditional expectation function of the outcome. Such functionals encompass many of the traditional targets of investigation in causal inference studies, such as, for example, (weighted) average of potential outcomes, average treatment effects (including subgroup effects, such as the effect on the treated), (weighted) average derivatives, and policy effects from shifts in covariate distribution -- all for general, nonparametric causal models. Our construction relies on the Riesz-Frechet representation of the target functional. Specifically, we show how the bound on the bias depends only on the additional variation that the latent variables create both in the outcome and in the Riesz representer for the parameter of interest. Moreover, in many important cases (e.g, average treatment effects in partially linear models, or in nonseparable models with a binary treatment) the bound is shown to depend on two easily interpretable quantities: the nonparametric partial $R^2$ (Pearson's "correlation ratio") of the unobserved variables with the treatment and with the outcome. Therefore, simple plausibility judgments on the maximum explanatory power of omitted variables (in explaining treatment and outcome variation) are sufficient to place overall bounds on the size of the bias. Finally, leveraging debiased machine learning, we provide flexible and efficient statistical inference methods to estimate the components of the bounds that are identifiable from the observed distribution.


Bounding Wasserstein distance with couplings

arXiv.org Machine Learning

Markov chain Monte Carlo (MCMC) provides asymptotically consistent estimates of intractable posterior expectations as the number of iterations tends to infinity. However, in large data applications, MCMC can be computationally expensive per iteration. This has catalyzed interest in sampling methods such as approximate MCMC, which trade off asymptotic consistency for improved computational speed. In this article, we propose estimators based on couplings of Markov chains to assess the quality of such asymptotically biased sampling methods. The estimators give empirical upper bounds of the Wassertein distance between the limiting distribution of the asymptotically biased sampling method and the original target distribution of interest. We establish theoretical guarantees for our upper bounds and show that our estimators can remain effective in high dimensions. We apply our quality measures to stochastic gradient MCMC, variational Bayes, and Laplace approximations for tall data and to approximate MCMC for Bayesian logistic regression in 4500 dimensions and Bayesian linear regression in 50000 dimensions.


Learning Across Bandits in High Dimension via Robust Statistics

arXiv.org Machine Learning

Decision-makers often face the "many bandits" problem, where one must simultaneously learn across related but heterogeneous contextual bandit instances. For instance, a large retailer may wish to dynamically learn product demand across many stores to solve pricing or inventory problems, making it desirable to learn jointly for stores serving similar customers; alternatively, a hospital network may wish to dynamically learn patient risk across many providers to allocate personalized interventions, making it desirable to learn jointly for hospitals serving similar patient populations. We study the setting where the unknown parameter in each bandit instance can be decomposed into a global parameter plus a sparse instance-specific term. Then, we propose a novel two-stage estimator that exploits this structure in a sample-efficient way by using a combination of robust statistics (to learn across similar instances) and LASSO regression (to debias the results). We embed this estimator within a bandit algorithm, and prove that it improves asymptotic regret bounds in the context dimension $d$; this improvement is exponential for data-poor instances. We further demonstrate how our results depend on the underlying network structure of bandit instances.


Improving Nonparametric Classification via Local Radial Regression with an Application to Stock Prediction

arXiv.org Artificial Intelligence

For supervised classification problems, this paper considers estimating the query's label probability through local regression using observed covariates. Well-known nonparametric kernel smoother and $k$-nearest neighbor ($k$-NN) estimator, which take label average over a ball around the query, are consistent but asymptotically biased particularly for a large radius of the ball. To eradicate such bias, local polynomial regression (LPoR) and multiscale $k$-NN (MS-$k$-NN) learn the bias term by local regression around the query and extrapolate it to the query itself. However, their theoretical optimality has been shown for the limit of the infinite number of training samples. For correcting the asymptotic bias with fewer observations, this paper proposes a local radial regression (LRR) and its logistic regression variant called local radial logistic regression (LRLR), by combining the advantages of LPoR and MS-$k$-NN. The idea is simple: we fit the local regression to observed labels by taking the radial distance as the explanatory variable and then extrapolate the estimated label probability to zero distance. Our numerical experiments, including real-world datasets of daily stock indices, demonstrate that LRLR outperforms LPoR and MS-$k$-NN.


#003A Logistic Regression โ€“ Cost Function Optimization - Master Data Science

#artificialintelligence

First, to train parameters \(w \) and \(b \) of a logistic regression model we need to define a cost function. Given a training set of \(m\) training examples, we want to find parameters \(w\) and \(b \), so that \(\hat{y}\) is as close to \(y \) (ground truth). Here, we will use \((i) \) superscript to index different training examples. Henceforth, we will use loss (error) function \(\mathcal{L}\) to measure how well our algorithm is doing. In logistic regression squared error loss function is not an optimal choice.


[100%OFF] Machine Learning & Deep Learning in Python & R

#artificialintelligence

Learn how to solve real life problem using the Machine learning techniques Machine Learning models such as Linear Regression, Logistic Regression, KNN etc. Advanced Machine Learning models such as Decision trees, XGBoost, Random Forest, SVM etc. Understanding of basics of statistics and concepts of Machine Learning How to do basic statistical operations and run ML models in Python Indepth knowledge of data collection and data preprocessing for Machine Learning problem How to convert business problem into a Machine learning problem Can I get a certificate after completing the course? Are there any other coupons available for this course? Note: 100% OFF Udemy coupon codes are valid for maximum 3 days only. Look for "ENROLL NOW" button at the end of the post. Disclosure: This post may contain affiliate links and we may get small commission if you make a purchase.


10 Best Statistics Courses on Coursera

#artificialintelligence

This specialization program is especially dedicated to statistics. In this program, you will learn basic and intermediate concepts of statistical analysis using the Python programming language. In this program, you will learn the following topics- where data come from, what types of data can be collected, study data design, data management, and how to effectively carry out data exploration and visualization. Along with that, you will work on a variety of assignments that will help you to check your knowledge and ability. This specialization program is a 3-course series. Let's see the details of the courses-


#006A Fast Logistic Regression - Master Data Science

#artificialintelligence

When we are programming Logistic Regression or Neural Networks we should avoid explicit \(for \) loops. It's not always possible, but when we can, we should use built-in functions or find some other ways to compute it. Vectorizing the implementation of Logistic Regression makes the code highly efficient. In this post we will see how we can use this technique to compute gradient descent without using even a single \(for \) loop. This code was non-vectorized and highly inefficent so we need to transform it.