Regression


Gradient and Newton Boosting for Classification and Regression

arXiv.org Machine Learning

Boosting refers to a class of classification and regression algorithms that are widely used because of their excellent predictive accuracy on a wide range of datasets. The first boosting algorithms for classification, including the well-known AdaBoost algorithm, were introduced by Schapire [1990], Freund and Schapire [1995], and Freund et al. [1996]. Later, several authors [Breiman, 1998, 1999, Friedman et al., 2000, Mason et al., 2000, Friedman, 2001] introduced the statistical view of boosting as a stagewise optimization approach. In particular, Friedman et al. [2000] first introduced boosting algorithms that iteratively optimize Bernoulli and multinomial likelihoods for binary and multiclass classification using Newton updates. Further, Friedman [2001] presented gradient-descent-based boosting algorithms for both regression and classification tasks with general loss functions.
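To make the distinction between gradient and Newton updates concrete, here is a minimal sketch for the Bernoulli (logistic) loss. It assumes the simplest possible base learner, a single constant added to every score, whereas real boosting implementations fit a tree at each stage. The function names and toy labels are illustrative assumptions and are not taken from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_hess(y, f):
    """Gradient and Hessian of the Bernoulli negative log-likelihood w.r.t. the score f."""
    p = sigmoid(f)
    return p - y, p * (1.0 - p)

def gradient_step(ys, fs, learning_rate=1.0):
    """Constant update along the average negative gradient (first-order)."""
    grads = [grad_hess(y, f)[0] for y, f in zip(ys, fs)]
    return -learning_rate * sum(grads) / len(grads)

def newton_step(ys, fs):
    """Constant update from a second-order approximation: -sum(g) / sum(h)."""
    pairs = [grad_hess(y, f) for y, f in zip(ys, fs)]
    return -sum(g for g, _ in pairs) / sum(h for _, h in pairs)

def log_loss(ys, fs):
    """Average Bernoulli negative log-likelihood of scores fs for labels ys."""
    total = 0.0
    for y, f in zip(ys, fs):
        p = sigmoid(f)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(ys)

# Toy labels (assumed for illustration), with all scores starting at zero.
ys = [1, 1, 1, 0]
fs = [0.0, 0.0, 0.0, 0.0]

g_step = gradient_step(ys, fs)
n_step = newton_step(ys, fs)
loss_start = log_loss(ys, fs)
loss_gradient = log_loss(ys, [f + g_step for f in fs])
loss_newton = log_loss(ys, [f + n_step for f in fs])
```

On this toy data both steps lower the loss. The Newton step is larger because it rescales the gradient by the local curvature, which is the extra ingredient that Newton boosting adds over gradient boosting.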


BooST: Boosting Smooth Trees for Partial Effect Estimation in Nonlinear Regressions

arXiv.org Machine Learning

In this paper we introduce a new machine learning (ML) model for nonlinear regression called Boosting Smooth Transition Regression Tree (BooST). The main advantage of the BooST is that it estimates the derivatives (partial effects) of very general nonlinear models, providing more interpretation than other tree based models concerning the mapping between the covariates and the dependent variable. We provide some asymptotic theory that shows consistency of the partial derivatives and we present some examples on simulated and empirical data.


Opioid prescribing decreases after learning of a patient's fatal overdose

Science

This database provided a comprehensive record of opioids dispensed at California pharmacies to civilian, non–U.S. Department of Veterans Affairs, and non-institutionalized patients treated by clinicians in our sample. Descriptive and inferential statistics were carried out with the Stata software (6). The cmp command in Stata was used to compute a difference-in-differences estimator within a mixed-model two-part linear regression analysis (7). The difference-in-differences estimator compared the average change over time in milligram morphine equivalents (MMEs) dispensed for prescribers in the intervention group with the average change over time for prescribers in the control group.
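The comparison described above can be illustrated with the basic two-group, two-period difference-in-differences calculation. This is only a sketch of the underlying idea. It does not reproduce the study's mixed-model two-part regression or Stata's cmp command, and the MME values are invented for illustration.

```python
from statistics import mean

def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change over time in the intervention group minus change over time in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical MMEs dispensed per prescriber (not data from the study).
treated_pre = [100.0, 120.0, 110.0]
treated_post = [90.0, 100.0, 95.0]
control_pre = [105.0, 115.0, 110.0]
control_post = [104.0, 112.0, 108.0]

effect = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
```

A negative value means that dispensing fell more in the intervention group than in the control group over the same period.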


Using Apache Ignite's Machine Learning for Fraud Detection at Scale - DZone AI

#artificialintelligence

Our initial results look promising, but there is room for improvement. We made a number of choices and assumptions for our initial analysis. Our next steps would be to go back and evaluate these to determine what changes we can make to tune our classifier. If we plan to use this classifier for a real-time credit card fraud detection system, we want to ensure that we can catch all the fraudulent transactions and also keep our customers happy by correctly identifying non-fraudulent transactions. Once we have a good classifier, we can use it directly with transactions arriving into Ignite in real-time. Additionally, with Ignite's continuous learning capabilities, we can refine and tune our classifier further with new data, as the data arrive. Finally, using Ignite as the basis for a real-time fraud detection system enables us to obtain many advantages, such as the ability to scale ML processing beyond a single node, the storage and manipulation of massive quantities of data, and zero ETL.


Data-driven polynomial chaos expansion for machine learning regression

arXiv.org Machine Learning

We present a regression technique for data driven problems based on polynomial chaos expansion (PCE). PCE is a popular technique in the field of uncertainty quantification (UQ), where it is typically used to replace a runnable but expensive computational model subject to random inputs with an inexpensive-to-evaluate polynomial function. The metamodel obtained enables a reliable estimation of the statistics of the output, provided that a suitable probabilistic model of the input is available. In classical machine learning (ML) regression settings, however, the system is only known through observations of its inputs and output, and the interest lies in obtaining accurate pointwise predictions of the latter. Here, we show that a PCE metamodel purely trained on data can yield pointwise predictions whose accuracy is comparable to that of other ML regression models, such as neural networks and support vector machines. The comparisons are performed on benchmark datasets available from the literature. The methodology also enables the quantification of the output uncertainties and is robust to noise. Furthermore, it enjoys additional desirable properties, such as good performance for small training sets and simplicity of construction, with only little parameter tuning required. In the presence of statistically dependent inputs, we investigate two ways to build the PCE, and show through simulations that one approach is superior to the other in the stated settings.


Active Learning for Regression Using Greedy Sampling

arXiv.org Machine Learning

Regression problems are pervasive in real-world applications. Generally, a substantial number of labeled samples is needed to build a regression model with good generalization ability. However, it is often relatively easy to collect a large number of unlabeled samples, but time-consuming or expensive to label them. Active learning for regression (ALR) is a methodology to reduce the number of labeled samples, by selecting the most beneficial ones to label, instead of random selection. This paper proposes two new ALR approaches based on greedy sampling (GS). The first approach (GSy) selects new samples to increase the diversity in the output space, and the second (iGS) selects new samples to increase the diversity in both input and output spaces. Extensive experiments on 12 UCI and CMU StatLib datasets from various domains, and on 15 subjects on EEG-based driver drowsiness estimation, verified their effectiveness and robustness.
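One plausible reading of the two selection rules is sketched below. The abstract does not spell out the scoring functions, so the details here are assumptions: GSy scores each candidate by the distance from its predicted output to the nearest labeled output, and iGS uses the product of input and output distances. The `predict` argument is a hypothetical stand-in for whatever regression model has been trained on the labeled samples so far.

```python
import math

def select_gsy(labeled_y, unlabeled_x, predict):
    """Index of the candidate whose predicted output is farthest from its nearest labeled output."""
    def score(x):
        y_hat = predict(x)
        return min(abs(y_hat - y) for y in labeled_y)
    return max(range(len(unlabeled_x)), key=lambda i: score(unlabeled_x[i]))

def select_igs(labeled_x, labeled_y, unlabeled_x, predict):
    """Index of the candidate maximizing the smallest (input distance * output distance) to a labeled sample."""
    def score(x):
        y_hat = predict(x)
        return min(math.dist(x, xl) * abs(y_hat - yl)
                   for xl, yl in zip(labeled_x, labeled_y))
    return max(range(len(unlabeled_x)), key=lambda i: score(unlabeled_x[i]))

# Toy one-dimensional pool; the "model" is assumed to predict y = x.
labeled_x = [(0.0,), (1.0,)]
labeled_y = [0.0, 1.0]
unlabeled_x = [(0.5,), (0.9,), (3.0,)]

def predict(x):
    return x[0]

gsy_choice = select_gsy(labeled_y, unlabeled_x, predict)
igs_choice = select_igs(labeled_x, labeled_y, unlabeled_x, predict)
```

In an active learning loop, the chosen sample would be labeled, moved to the labeled set, and the model retrained before the next selection.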


Affect Estimation in 3D Space Using Multi-Task Active Learning for Regression

arXiv.org Machine Learning

Acquisition of labeled training samples for affective computing is usually costly and time-consuming, as affects are intrinsically subjective, subtle and uncertain, and hence multiple human assessors are needed to evaluate each affective sample. Particularly, for affect estimation in the 3D space of valence, arousal and dominance, each assessor has to perform the evaluations in three dimensions, which makes the labeling problem even more challenging. Many sophisticated machine learning approaches have been proposed to reduce the data labeling requirement in various other domains, but so far few have considered affective computing. This paper proposes two multi-task active learning for regression approaches, which select the most beneficial samples to label, by considering the three affect primitives simultaneously. Experimental results on the VAM corpus demonstrated that our optimal sample selection approaches can result in better estimation performance than random selection and several traditional single-task active learning approaches. Thus, they can help alleviate the data labeling problem in affective computing, i.e., better estimation performance can be obtained from fewer labeling queries.


Optimal stopping via deeply boosted backward regression

arXiv.org Machine Learning

In this note we propose a new approach to numerically solving optimal stopping problems via boosted regression-based Monte Carlo algorithms. The main idea of the method is to boost standard linear regression algorithms in each backward induction step by adding new basis functions based on previously estimated continuation values. The proposed methodology is illustrated by several numerical examples from finance.


Unbiased Implicit Variational Inference

arXiv.org Machine Learning

We develop unbiased implicit variational inference (UIVI), a method that expands the applicability of variational inference by defining an expressive variational family. UIVI considers an implicit variational distribution obtained in a hierarchical manner using a simple reparameterizable distribution whose variational parameters are defined by arbitrarily flexible deep neural networks. Unlike previous works, UIVI directly optimizes the evidence lower bound (ELBO) rather than an approximation to the ELBO. We demonstrate UIVI on several models, including Bayesian multinomial logistic regression and variational autoencoders, and show that UIVI achieves both tighter ELBO and better predictive performance than existing approaches at a similar computational cost.


Improved survival of cancer patients admitted to the ICU between 2002 and 2011 at a U.S. teaching hospital

arXiv.org Machine Learning

Over the past decades, both critical care and cancer care have improved substantially. Due to increased cancer-specific survival, we hypothesized that both the number of cancer patients admitted to the ICU and overall survival have increased since the millennium change. MIMIC-III, a freely accessible critical care database of Beth Israel Deaconess Medical Center, Boston, USA was used to retrospectively study trends and outcomes of cancer patients admitted to the ICU between 2002 and 2011. Multiple logistic regression analysis was performed to adjust for confounders of 28-day and 1-year mortality. Out of 41,468 unique ICU admissions, 1,100 hemato-oncologic, 3,953 oncologic and 49 patients with both a hematological and solid malignancy were analyzed. Hematological patients had higher critical illness scores than non-cancer patients, while oncologic patients had similar APACHE-III and SOFA-scores compared to non-cancer patients. In the univariate analysis, cancer was strongly associated with mortality (OR= 2.74, 95%CI: 2.56, 2.94). Over the 10-year study period, 28-day mortality of cancer patients decreased by 30%. This trend persisted after adjustment for covariates, with cancer patients having significantly higher mortality (OR=2.63, 95%CI: 2.38, 2.88). Between 2002 and 2011, both the adjusted odds of 28-day mortality and the adjusted odds of 1-year mortality for cancer patients decreased by 6% (95%CI: 4%, 9%). Having cancer was the strongest single predictor of 1-year mortality in the multivariate model (OR=4.47, 95%CI: 4.11, 4.84).