 Accuracy


Improved prediction rule ensembling through model-based data generation

arXiv.org Machine Learning

Prediction rule ensembles (PRE) provide interpretable prediction models with relatively high accuracy. PRE obtain a large set of decision rules from a (boosted) decision tree ensemble, and achieve sparsity through application of Lasso-penalized regression. This article examines the use of surrogate models to improve performance of PRE, wherein the Lasso regression is trained with the help of a massive dataset generated by the (boosted) decision tree ensemble. This use of model-based data generation may improve the stability and consistency of the Lasso step, thus leading to improved overall performance. We propose two surrogacy approaches, and evaluate them on simulated and existing datasets, in terms of sparsity and predictive accuracy. The results indicate that the use of surrogacy models can substantially improve the sparsity of PRE, while retaining predictive accuracy, especially through the use of a nested surrogacy approach.
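
The surrogate idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `teacher_predict` is a hypothetical stand-in for a fitted tree ensemble, the four `RULES` are invented for illustration, inputs are sampled uniformly (the abstract does not say how the generated data are drawn), and the Lasso is a plain coordinate-descent solver written for this example.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; small values become exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso(columns, targets, penalty, sweeps=100):
    """Cyclic coordinate descent for (1/2n)*||y - Xb||^2 + penalty*||b||_1.

    `columns` holds one list per feature. Columns and targets are centred
    internally, so no intercept is needed during the fit.
    """
    n = len(targets)
    target_mean = sum(targets) / n
    residuals = [y - target_mean for y in targets]
    centred = []
    for column in columns:
        mean = sum(column) / n
        centred.append([x - mean for x in column])
    coefs = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, column in enumerate(centred):
            scale = sum(x * x for x in column) / n
            if scale == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(x * r for x, r in zip(column, residuals)) / n + coefs[j] * scale
            updated = soft_threshold(rho, penalty) / scale
            delta = updated - coefs[j]
            if delta:
                residuals = [r - delta * x for r, x in zip(residuals, column)]
                coefs[j] = updated
    return coefs


# Hypothetical decision rules, as if harvested from the tree ensemble.
RULES = [
    lambda x: x[0] > 0.5,
    lambda x: x[1] > 0.5,
    lambda x: x[2] > 0.5,  # not used by the teacher below
    lambda x: x[3] > 0.3,  # not used by the teacher below
]


def teacher_predict(x):
    """Hypothetical stand-in for a fitted (boosted) tree ensemble."""
    return 2.0 * (x[0] > 0.5) + 1.0 * (x[1] > 0.5)


def fit_surrogate_pre(n_generated=2000, penalty=0.05, seed=0):
    """Generate data from the teacher, then fit a Lasso on rule indicators."""
    rng = random.Random(seed)
    inputs = [[rng.random() for _ in range(4)] for _ in range(n_generated)]
    targets = [teacher_predict(x) for x in inputs]  # model-generated labels
    columns = [[1.0 if rule(x) else 0.0 for x in inputs] for rule in RULES]
    return lasso(columns, targets, penalty)
```

Calling `fit_surrogate_pre()` returns one coefficient per rule. In this toy setup the two rules the teacher never uses are shrunk to exactly zero, which is the sparsity effect the Lasso step is meant to deliver.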


4 Ways That Your Accurate Model May Not Be Good Enough

#artificialintelligence

When we were in school and were given a problem to solve, we usually stopped working on the problem as soon as we found the answer and we recorded that answer on our paper. This might be a fair approach for elementary school assignments, but that approach is not good in higher education or in life. Unfortunately, many people continue this learned behavior into adulthood, at the university and/or on their jobs. Consequently, these people miss new opportunities for learning, discovery, recognition, and advancement. In data science, we are trained to keep searching (at least, I hope that this is true for all of us) even after we find that first model from our data that appears to answer our business question accurately.


Bayesian Nonparametric Dimensionality Reduction of Categorical Data for Predicting Severity of COVID-19 in Pregnant Women

arXiv.org Artificial Intelligence

The coronavirus disease (COVID-19) has rapidly spread throughout the world and while pregnant women present the same adverse outcome rates, they are underrepresented in clinical research. We collected clinical data of 155 test-positive COVID-19 pregnant women at Stony Brook University Hospital. Many of these collected data are of multivariate categorical type, where the number of possible outcomes grows exponentially as the dimension of data increases. We modeled the data within the unsupervised Bayesian framework and mapped them into a lower-dimensional space using latent Gaussian processes. The latent features in the lower dimensional space were further used for predicting if a pregnant woman would be admitted to a hospital due to COVID-19 or would remain with mild symptoms. We compared the prediction accuracy with the dummy/one-hot encoding of categorical data and found that the latent Gaussian process had better accuracy.
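
To make the baseline in this abstract concrete, here is a small sketch of dummy/one-hot encoding and of how quickly the number of joint outcomes grows. The variable names and levels in `CATEGORIES` are hypothetical placeholders, not the study's clinical variables.

```python
from math import prod

# Hypothetical categorical variables and their levels.
CATEGORIES = {
    "var_a": ["low", "medium", "high"],
    "var_b": ["first", "second", "third"],
    "var_c": ["no", "yes"],
}


def one_hot(record, categories):
    """Encode one categorical record as a flat 0/1 vector."""
    vector = []
    for name, levels in categories.items():
        vector.extend(1 if record[name] == level else 0 for level in levels)
    return vector


def joint_outcomes(categories):
    """Number of distinct value combinations across all variables."""
    return prod(len(levels) for levels in categories.values())


record = {"var_a": "medium", "var_b": "third", "var_c": "no"}
encoded = one_hot(record, CATEGORIES)
```

The encoded vector has one slot per level (8 here), while the number of joint outcomes is the product of the level counts (18 here), so each added variable multiplies the outcome space.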


DOODLER: Determining Out-Of-Distribution Likelihood from Encoder Reconstructions

arXiv.org Machine Learning

Deep Learning models possess two key traits that, in combination, make their use in the real world a risky prospect. One, they do not typically generalize well outside of the distribution for which they were trained, and two, they tend to exhibit confident behavior regardless of whether or not they are producing meaningful outputs. While Deep Learning possesses immense power to solve realistic, high-dimensional problems, these traits in concert make it difficult to have confidence in their real-world applications. To overcome this difficulty, the task of Out-Of-Distribution (OOD) Detection has been defined, to determine when a model has received an input from outside of the distribution for which it is trained to operate. This paper introduces and examines a novel methodology, DOODLER, for OOD Detection, which directly leverages the traits which result in its necessity. By training a Variational Auto-Encoder (VAE) on the same data as another Deep Learning model, the VAE learns to accurately reconstruct In-Distribution (ID) inputs, but not to reconstruct OOD inputs, meaning that its failure state can be used to perform OOD Detection. Unlike other work in the area, DOODLER requires only very weak assumptions about the existence of an OOD dataset, allowing for more realistic application. DOODLER also enables pixel-wise segmentations of input images by OOD likelihood, and experimental results show that it matches or outperforms methodologies that operate under the same constraints.
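
The general idea of using reconstruction failure as an OOD signal can be sketched as follows. This is a generic reconstruction-error recipe, not DOODLER's actual scoring: the reconstructions are assumed to come from some trained autoencoder that is not shown, and the squared-error score and quantile threshold are illustrative choices.

```python
def pixel_error_map(image, reconstruction):
    """Per-pixel squared error between an image and its reconstruction."""
    return [
        [(a - b) ** 2 for a, b in zip(row, rec_row)]
        for row, rec_row in zip(image, reconstruction)
    ]


def image_score(error_map):
    """Mean per-pixel error, used as a single OOD score for the image."""
    flat = [e for row in error_map for e in row]
    return sum(flat) / len(flat)


def threshold_from_id_scores(id_scores, quantile=0.95):
    """Pick a threshold from scores measured on held-out in-distribution data."""
    ordered = sorted(id_scores)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_ood(score, threshold):
    """Flag an input whose reconstruction error exceeds the ID threshold."""
    return score > threshold
```

The per-pixel map is what would support a segmentation-style output, since regions with high error are the ones the reconstruction model failed on.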


Introspective Robot Perception using Smoothed Predictions from Bayesian Neural Networks

arXiv.org Artificial Intelligence

This work focuses on improving uncertainty estimation in the field of object classification from RGB images and demonstrates its benefits in two robotic applications. We employ a Bayesian Neural Network (BNN), and evaluate two practical inference techniques to obtain better uncertainty estimates, namely Concrete Dropout (CDP) and Kronecker-factored Laplace Approximation (LAP). We show a performance increase using more reliable uncertainty estimates as unary potentials within a Conditional Random Field (CRF), which is able to incorporate contextual information as well. Furthermore, the obtained uncertainties are exploited to achieve domain adaptation in a semi-supervised manner, which requires less manual effort in annotating data. We evaluate our approach on two public benchmark datasets that are relevant for robot perception tasks.
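
One common way to turn a Bayesian neural network's stochastic forward passes into an uncertainty estimate is to average the sampled class probabilities and take the entropy of the result. The sketch below assumes the sampled probabilities already exist (for example from repeated dropout passes); it is a generic illustration, not the paper's CDP or LAP procedure.

```python
import math


def mean_prediction(sampled_probs):
    """Average class probabilities over several stochastic forward passes."""
    n = len(sampled_probs)
    n_classes = len(sampled_probs[0])
    return [sum(sample[c] for sample in sampled_probs) / n for c in range(n_classes)]


def predictive_entropy(probs):
    """Entropy (in nats) of a class distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

A prediction such as `[0.5, 0.5]` has entropy `log(2)`, the maximum for two classes, while a fully confident `[1.0, 0.0]` has entropy zero.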


Assessing clinical utility of Machine Learning and Artificial Intelligence approaches to analyze speech recordings in Multiple Sclerosis: A Pilot Study

arXiv.org Artificial Intelligence

Background: An early diagnosis together with an accurate disease progression monitoring of multiple sclerosis is an important component of successful disease management. Prior studies have established that multiple sclerosis is correlated with speech discrepancies. Early research using objective acoustic measurements has discovered measurable dysarthria. Objective: To determine the potential clinical utility of machine learning and deep learning/AI approaches for the aiding of diagnosis, biomarker extraction and progression monitoring of multiple sclerosis using speech recordings. Methods: A corpus of 65 MS-positive and 66 healthy individuals reading the same text aloud was used for targeted acoustic feature extraction utilizing automatic phoneme segmentation. A series of binary classification models was trained, tuned, and evaluated regarding their Accuracy and area-under-curve. Results: The Random Forest model performed best, achieving an Accuracy of 0.82 on the validation dataset and an area-under-curve of 0.76 across 5 k-fold cycles on the training dataset. 5 out of 7 acoustic features were statistically significant. Conclusion: Machine learning and artificial intelligence in automatic analyses of voice recordings for aiding MS diagnosis and progression tracking seem promising. Further clinical validation of these methods and their mapping onto multiple sclerosis progression is needed, as well as a validating utility for English-speaking populations.
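
The two metrics reported in this abstract measure different things, which a short sketch makes visible. The labels and scores below are made-up toy values, not data from the study, and the AUC is computed with the plain pairwise definition (probability that a random positive is scored above a random negative, with ties counting half).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = positive class, 0 = negative class.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
```

Here accuracy at the 0.5 cut-off is 0.5 while the AUC is 0.75: accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds.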


Drama at 'The View': COVID tests were 'false positives,' co-host reveals

FOX News

The 'Outnumbered' panel reacts to Sunny Hostin and Ana Navarro being pulled from the set moments before the vice president was set to arrive. Ana Navarro, one of two co-hosts who were pulled from ABC's "The View" live on air Friday due to positive COVID-19 tests, has since revealed the results that caused the chaos were false positives. Producers informed Navarro and Sunny Hostin in their earpieces halfway through Friday's broadcast that they would have to leave the Hot Topics table, leaving Joy Behar and Sara Haines to conduct the rest of the show on their own. The remaining hosts often struggled to kill time, at one point taking questions from the audience, but often not being able to hear the questions that were muffled by their masks. Friday's drama was even more pronounced considering Navarro and Hostin were pulled just as Vice President Kamala Harris was on her way to the studio for an in-person interview. Even though Harris made it to the building, producers explained her appearance would end up taking place remotely from a separate room out of precaution.


Johns Hopkins has developed a lung cancer blood test

#artificialintelligence

Powered by artificial intelligence, a new lung cancer blood test developed at Johns Hopkins, combined with other metrics, correctly identified 94% of cancer cases in almost 800 patients. The lung cancer blood test, published in Nature Communications, searches for tiny fragments of DNA released by the tumor cells. The AI looks for patterns in this shattered DNA, rather than looking for specific pieces of cancer DNA like other blood tests in development, New Atlas explained. Lung cancer kills the most people in the world, the authors note, "largely due to the late stage at diagnosis where treatments are less effective than at earlier stages" -- and lung cancer rates are increasing, worldwide. "We believe that a blood test, or 'liquid biopsy,' for lung cancer could be a good way to enhance screening efforts, because it would be easy to do, broadly accessible, and cost-effective," study first author Dimitrios Mathios said. The DNA difference: Blood tests for cancer typically focus on finding pieces of mutated tumor DNA.


Anomalous Edge Detection in Edge Exchangeable Social Network Models

arXiv.org Machine Learning

This paper studies detecting anomalous edges in directed graphs that model social networks. We exploit edge exchangeability as a criterion for distinguishing anomalous edges from normal edges. Then we present an anomaly detector based on conformal prediction theory; this detector has a guaranteed upper bound for false positive rate. In numerical experiments, we show that the proposed algorithm achieves superior performance to baseline methods.
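
The false-positive guarantee mentioned here comes from the standard conformal p-value construction, sketched below. The nonconformity scores are assumed to be given (how the paper scores edges is not stated in the abstract), and the function names are chosen for this example.

```python
def conformal_p_value(calibration_scores, test_score):
    """Conformal p-value: share of scores at least as extreme as the test score.

    Scores are nonconformity scores (larger = stranger). If the test item is
    exchangeable with the calibration items, P(p_value <= alpha) <= alpha.
    """
    n = len(calibration_scores)
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (n + 1)


def is_anomalous(calibration_scores, test_score, alpha):
    """Flag the item when its conformal p-value falls at or below alpha."""
    return conformal_p_value(calibration_scores, test_score) <= alpha
```

With calibration scores `[0.1, 0.2, 0.3, 0.4]`, a test score of `0.35` gets p-value `2/5`, while a test score of `0.9` gets `1/5` and would be flagged at `alpha = 0.25`. The choice of `alpha` is what caps the false positive rate under the exchangeability assumption.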


Intersectional Group Fairness in Machine Learning

#artificialintelligence

At the ML Fairness Summit, we welcomed Fiddler Data Scientist Léa Genuit to discuss intersectional group fairness. As more companies adopt AI, more people question the impact AI has on society, especially regarding algorithmic fairness. Instead, they hold a binary view of fairness, e.g., protected vs. unprotected groups. In the blog below, Léa covers the latest research on intersectional group fairness. Before explaining why, the first question should be how do you detect and mitigate bias in European models to avoid a bad experience?
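
A small sketch shows why a binary, one-attribute-at-a-time view of fairness can miss problems that appear only at intersections. The attribute names, group labels, and counts are invented for illustration and do not come from the talk.

```python
from collections import defaultdict


def positive_rates(records, attributes):
    """Rate of positive predictions for each combination of the given attributes."""
    counts = defaultdict(lambda: [0, 0])  # key -> [positives, total]
    for record in records:
        key = tuple(record[name] for name in attributes)
        counts[key][0] += record["predicted_positive"]
        counts[key][1] += 1
    return {key: positives / total for key, (positives, total) in counts.items()}


def make_group(attr_1, attr_2, positives, total):
    """Build `total` toy records for one subgroup, `positives` of them positive."""
    return [
        {"attr_1": attr_1, "attr_2": attr_2, "predicted_positive": int(i < positives)}
        for i in range(total)
    ]


# Hypothetical data: each marginal group has a 50% rate, the intersections do not.
records = (
    make_group("A", "X", 8, 10)
    + make_group("A", "Y", 2, 10)
    + make_group("B", "X", 2, 10)
    + make_group("B", "Y", 8, 10)
)

by_attr_1 = positive_rates(records, ["attr_1"])
by_attr_2 = positive_rates(records, ["attr_2"])
by_intersection = positive_rates(records, ["attr_1", "attr_2"])
```

In this toy data every single-attribute group receives positive predictions at a rate of 0.5, yet the intersectional subgroups range from 0.2 to 0.8, so a check on either attribute alone would report no disparity.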