 Performance Analysis


Machine learning using longitudinal prescription and medical claims for the detection of nonalcoholic steatohepatitis (NASH)

arXiv.org Machine Learning

Objectives: To develop and evaluate machine learning models to detect suspected undiagnosed nonalcoholic steatohepatitis (NASH) patients for diagnostic screening and clinical management. Methods: In this retrospective observational noninterventional study using administrative medical claims data from 1,463,089 patients, gradient-boosted decision trees were trained to detect likely NASH patients from an at-risk patient population with a history of obesity, type 2 diabetes mellitus (T2DM), metabolic disorder, or nonalcoholic fatty liver (NAFL). Models were trained to detect likely NASH in all at-risk patients or in the subset without a prior NAFL diagnosis (non-NAFL at-risk patients). Models were trained and validated using retrospective medical claims data and assessed using the areas under the precision-recall and receiver operating characteristic curves (AUPRCs, AUROCs). Results: The 6-month incidence of NASH in claims data was 1 per 1,437 at-risk patients and 1 per 2,127 non-NAFL at-risk patients. The model trained to detect NASH in all at-risk patients had an AUPRC of 0.0107 (95% CI 0.0104 - 0.011) and an AUROC of 0.84. At 10% recall, model precision was 4.3%, which is about 60x above NASH incidence. The model trained to detect NASH in non-NAFL patients had an AUPRC of 0.003 (95% CI 0.0029 - 0.0031) and an AUROC of 0.78. At 10% recall, model precision was 1%, which is about 20x above NASH incidence. Conclusion: The low incidence of NASH in medical claims data corroborates the pattern of NASH underdiagnosis in clinical practice. Claims-based machine learning could facilitate the detection of probable NASH patients for diagnostic testing and disease management.
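The "60x" and "20x" figures in this abstract can be checked by dividing the reported precision by the reported incidence. The snippet below is only an arithmetic sketch: the function name `lift_over_incidence` is a hypothetical helper, not something from the paper, and the inputs are the rounded values quoted in the abstract.

```python
def lift_over_incidence(precision, patients_per_case):
    """Ratio of model precision to baseline incidence (1 case per N patients)."""
    incidence = 1.0 / patients_per_case
    return precision / incidence


# Values quoted in the abstract (rounded as reported there).
lift_all_at_risk = lift_over_incidence(0.043, 1437)  # 4.3% precision, 1 per 1,437
lift_non_nafl = lift_over_incidence(0.01, 2127)      # 1% precision, 1 per 2,127

print(f"All at-risk patients: {lift_all_at_risk:.1f}x above incidence")
print(f"Non-NAFL at-risk patients: {lift_non_nafl:.1f}x above incidence")
```

Both ratios land slightly above the rounded multiples stated in the abstract, which is consistent with the authors reporting them to one significant figure.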


AI Software for Fracture Detection Gets FDA Clearance

#artificialintelligence

Emerging artificial intelligence (AI) software that reportedly reduces false negative rates for fractures by 29 percent has received FDA clearance. BoneView AI detects fractures on X-rays, highlights regions of interest and submits them to radiologists for confirmation, according to Gleamer, the French company that developed it. The company said the algorithm was designed to aid a variety of physicians who read X-rays in clinical practice. Noting that traumatic injuries account for one-third of visits to emergency rooms (ERs), Gleamer said that errors in fracture interpretation, which are common during evening hours, can represent up to 24 percent of harmful diagnostic errors in the ER. The company said these errors may result from fatigue and non-expert reading of X-rays.


Cross-Validation in Machine Learning: How to Do It Right - neptune.ai

#artificialintelligence

In machine learning (ML), generalization usually refers to the ability of an algorithm to be effective across various inputs. It means that the ML model does not suffer performance degradation on new inputs drawn from the same distribution as the training data. For human beings, generalization is the most natural thing possible. We can classify on the fly. For example, we would definitely recognize a dog even if we had never seen its breed before. Nevertheless, it might be quite a challenge for an ML model.


Joint Probability Estimation Using Tensor Decomposition and Dictionaries

arXiv.org Machine Learning

In this work, we study non-parametric estimation of joint probabilities of a given set of discrete and continuous random variables from their (empirically estimated) 2D marginals, under the assumption that the joint probability could be decomposed and approximated by a mixture of product densities/mass functions. The problem of estimating the joint probability density function (PDF) using semi-parametric techniques such as Gaussian Mixture Models (GMMs) is widely studied. However, such techniques yield poor results when the underlying densities are mixtures of various other families of distributions such as Laplacian or generalized Gaussian, uniform, Cauchy, etc. Further, GMMs are not the best choice to estimate joint distributions which are hybrid in nature, i.e., some random variables are discrete while others are continuous. We present a novel approach for estimating the PDF using ideas from dictionary representations in signal processing coupled with low rank tensor decompositions. To the best of our knowledge, this is the first work on estimating joint PDFs employing dictionaries alongside tensor decompositions. We create a dictionary of various families of distributions by inspecting the data, and use it to approximate each decomposed factor of the product in the mixture. Our approach can naturally handle hybrid $N$-dimensional distributions. We test our approach on a variety of synthetic and real datasets to demonstrate its effectiveness in terms of better classification rates and lower error rates, when compared to state-of-the-art estimators.


Accelerated SGD for Non-Strongly-Convex Least Squares

arXiv.org Machine Learning

We consider stochastic approximation for the least squares regression problem in the non-strongly convex setting. We present the first practical algorithm that achieves the optimal prediction error rates in terms of dependence on the noise of the problem, as $O(d/t)$ while accelerating the forgetting of the initial conditions to $O(d/t^2)$. Our new algorithm is based on a simple modification of the accelerated gradient descent. We provide convergence results for both the averaged and the last iterate of the algorithm. In order to describe the tightness of these new bounds, we present a matching lower bound in the noiseless setting and thus show the optimality of our algorithm.


Local Constraint-Based Causal Discovery under Selection Bias

arXiv.org Machine Learning

We consider the problem of discovering causal relations from independence constraints when selection bias in addition to confounding is present. While the seminal FCI algorithm is sound and complete in this setup, no criterion for the causal interpretation of its output under selection bias is presently known. We focus instead on local patterns of independence relations, where we find no sound method for only three variables that can include background knowledge. Y-Structure patterns (Mani et al., 2006; Mooij and Cremers, 2015) are shown to be sound in predicting causal relations from data under selection bias, where cycles may be present. We introduce a finite-sample scoring rule for Y-Structures that is shown to successfully predict causal relations in simulation experiments that include selection mechanisms. On real-world microarray data, we show that a Y-Structure variant performs well across different datasets, potentially circumventing spurious correlations due to selection bias.


0-1 Loss Function explanation

#artificialintelligence

You have correctly summarized the 0-1 loss function as effectively looking at accuracy. Your 1's become indicators for misclassified items, regardless of how they were misclassified. Since you have three 1's out of 10 items, your classification accuracy is 70%. If you change the weighting on the loss function, this interpretation doesn't apply anymore. For example, in disease classification, it might be more costly to miss a positive case of disease (false negative) than to falsely diagnose disease (false positive).
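The relationship between the 0-1 loss and accuracy, and what changes once errors are weighted, can be sketched in a few lines. The labels below are made-up illustration data chosen to give three errors out of ten, and the cost values (`fn_cost=5.0`, `fp_cost=1.0`) are arbitrary assumptions, not figures from the original answer.

```python
def zero_one_losses(y_true, y_pred):
    """Per-item 0-1 loss: 1 if misclassified, 0 if correct."""
    return [int(t != p) for t, p in zip(y_true, y_pred)]


def weighted_losses(y_true, y_pred, fn_cost=5.0, fp_cost=1.0):
    """Per-item loss where false negatives and false positives cost differently."""
    losses = []
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            losses.append(fn_cost)  # missed positive case (false negative)
        elif t == 0 and p == 1:
            losses.append(fp_cost)  # false alarm (false positive)
        else:
            losses.append(0.0)      # correct prediction
    return losses


# Hypothetical labels: 10 items, 3 misclassified (2 false negatives, 1 false positive).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

errors = sum(zero_one_losses(y_true, y_pred))
accuracy = (len(y_true) - errors) / len(y_true)
mean_weighted = sum(weighted_losses(y_true, y_pred)) / len(y_true)

print(f"errors={errors}, accuracy={accuracy:.0%}, mean weighted loss={mean_weighted}")
```

With unit weights the mean loss is simply the error rate, so one minus it is accuracy. Once the weights differ, the mean loss no longer maps to accuracy, which is the point made above.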


Cross Validation for Beginners

#artificialintelligence

While attempting to solve an ML problem, we do a train_test split. If this split is done randomly, then it is possible that some part of the data ends up entirely in the test set and absent from the training set, or vice versa. This reduces the accuracy of the model. This is where cross-validation comes into the picture. Cross-validation is a step in the process of building a machine learning model that helps us ensure that our models fit the data accurately and that we do not overfit. Cross-validation involves dividing the training data into a few parts.
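As a minimal sketch of "dividing training data into a few parts", the generator below produces k-fold train/test index splits using only the standard library. The function name `k_fold_indices` and its parameters are illustrative assumptions; in practice a library routine such as scikit-learn's `KFold` would normally be used instead.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_samples=10, k=5))
for fold_number, (train_idx, test_idx) in enumerate(splits):
    print(f"fold {fold_number}: {len(train_idx)} train, test indices {sorted(test_idx)}")
```

Every sample appears in exactly one test fold, so each data point is used for both training and evaluation across the k rounds.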


Model-agnostic out-of-distribution detection using combined statistical tests

arXiv.org Machine Learning

We present simple methods for out-of-distribution detection using a trained generative model. These techniques, based on classical statistical tests, are model-agnostic in the sense that they can be applied to any differentiable generative model. The idea is to combine a classical parametric test (Rao's score test) with the recently introduced typicality test. These two test statistics are both theoretically well-founded and exploit different sources of information based on the likelihood for the typicality test and its gradient for the score test. We show that combining them using Fisher's method overall leads to a more accurate out-of-distribution test. We also discuss the benefits of casting out-of-distribution detection as a statistical testing problem, noting in particular that false positive rate control can be valuable for practical out-of-distribution detection. Despite their simplicity and generality, these methods can be competitive with model-specific out-of-distribution detection algorithms without any assumptions on the out-distribution.
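Fisher's method, which the abstract uses to combine the two tests, can be sketched as follows. This is a generic illustration rather than the authors' implementation: it assumes the p-values are independent under the null hypothesis (the standard condition for Fisher's method), and the example p-values are made up. The function name `fisher_combined_p_value` is hypothetical.

```python
import math


def fisher_combined_p_value(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p). The statistic -2 * sum(ln p_i) follows a
    chi-square distribution with 2k degrees of freedom under the null, whose
    upper tail has a closed form because the degrees of freedom are even.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Chi-square survival function for 2k degrees of freedom.
    combined_p = math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))
    return statistic, combined_p


# Hypothetical p-values, e.g. one from a typicality test and one from a score test.
statistic, combined_p = fisher_combined_p_value([0.05, 0.05])
print(f"statistic={statistic:.3f}, combined p-value={combined_p:.4f}")
```

For two tests the combined p-value reduces to `p1 * p2 * (1 - ln(p1 * p2))`, so two individually borderline results can yield stronger joint evidence than either alone.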


Now that computers connect us all, for better and worse, what's next?

#artificialintelligence

This article was written, edited and designed on laptop computers. Such foldable, transportable devices would have astounded computer scientists just a few decades ago, and seemed like sheer magic before that. The machines contain billions of tiny computing elements, running millions of lines of software instructions, collectively written by countless people across the globe. You click or tap or type or speak, and the result seamlessly appears on the screen. Computers were once so large they filled rooms. Now they're everywhere and invisible, embedded in watches, car engines, cameras, televisions and toys. They manage electrical grids, analyze scientific data and predict the weather. The modern world would be impossible without them. Scientists aim to make computers faster and programs more intelligent, while deploying technology in an ethical manner. Their efforts build on more than a century of innovation. In 1833, English mathematician Charles Babbage conceived a programmable machine that presaged today's computing architecture, featuring a "store" for holding numbers, a "mill" for operating on them, an instruction reader and a printer. This Analytical Engine also had logical functions like branching (if X, then Y).