Goto

Collaborating Authors

 Accuracy


Igeood: An Information Geometry Approach to Out-of-Distribution Detection

arXiv.org Machine Learning

Reliable out-of-distribution (OOD) detection is fundamental to implementing safer modern machine learning (ML) systems. By building on the geodesic (Fisher-Rao) distance between the underlying data distributions, our discriminator can combine confidence scores from the logits outputs and the learned features of a deep neural network. Deep neural networks (DNNs) reach the state-of-the-art in several classification tasks as they are known to generalize well on data with a distribution close to the training set. Whereas, in many practical applications, the training set does not reflect well enough the real-life environment (Quionero-Candela et al., 2009) which is often non-stationary and sometimes with unpredictable events. Therefore, matching the training scenario to reality can be impossible or too complex. The inability of machine learning (ML) models to adapt to non-stationary distributions could limit their adoption in mission-critical systems (e.g., autonomous devices, healthcare applications). Out-of-Distribution (OOD) or novelty detection is one of the main objectives in conceiving reliable ML systems (Amodei et al., 2016). A typical application is monitoring ML-based online services for periodically shifting distributions. However, tracking changes in the underlying data distribution is challenging as they contain unusual (irregular or unexpected) events and have large dimensions. For instance, relying on the intrinsic properties of ML models and their statistical behavior in the presence of in-distribution data is essential to identify OOD samples. Classic approaches to OOD detection consist of deriving metrics for detecting those abnormalities from the lens of ML models (e.g., softmax output, latent representations across layers), provided that often only a single test example is available. Furthermore, these metrics are subject to potential limitations inherent in practical scenarios depending on the level of access to information in the ML model, e.g., having access only to the last layer or to all intermediate layers. The baseline approach for OOD detection relies on the predictive uncertainty of DNNs.


Bootstrapping Labels via ___ Supervision & Human-In-The-Loop

#artificialintelligence

Most machine learning tutorials and papers assume the availability of training labels. This includes benchmark datasets such as OpenImages or SuperGLUE, or customer interaction behavior such as clicks or purchases. But what if labeled datasets are not available? We would have to collect them. Collecting training labels is a seldom discussed art.


The Yield Curve as a Recession Leading Indicator. An Application for Gradient Boosting and Random Forest

arXiv.org Machine Learning

Most representative decision tree ensemble methods have been used to examine the variable importance of Treasury term spreads to predict US economic recessions with a balance of generating rules for US economic recession detection. A strategy is proposed for training the classifiers with Treasury term spreads data and the results are compared in order to select the best model for interpretability. We also discuss the use of SHapley Additive exPlanations (SHAP) framework to understand US recession forecasts by analyzing feature importance. Consistently with the existing literature we find the most relevant Treasury term spreads for predicting US economic recession and a methodology for detecting relevant rules for economic recession detection. In this case, the most relevant term spread found is 3 month to 6 month, which is proposed to be monitored by economic authorities. Finally, the methodology detected rules with high lift on predicting economic recession that can be used by these entities for this propose. This latter result stands in contrast to a growing body of literature demonstrating that machine learning methods are useful for interpretation comparing many alternative algorithms and we discuss the interpretation for our result and propose further research lines aligned with this work.


More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize

arXiv.org Machine Learning

Of theories for why large-scale machine learning models generalize despite being vastly overparameterized, which of their assumptions are needed to capture the qualitative phenomena of generalization in the real world? On one hand, we find that most theoretical analyses fall short of capturing these qualitative phenomena even for kernel regression, when applied to kernels derived from large-scale neural networks (e.g., ResNet-50) and real data (e.g., CIFAR-100). On the other hand, we find that the classical GCV estimator (Craven and Wahba, 1978) accurately predicts generalization risk even in such overparameterized settings. To bolster this empirical finding, we prove that the GCV estimator converges to the generalization risk whenever a local random matrix law holds. Finally, we apply this random matrix theory lens to explain why pretrained representations generalize better as well as what factors govern scaling laws for kernel regression. Our findings suggest that random matrix theory, rather than just being a toy model, may be central to understanding the properties of neural representations in practice.


Classification from Positive and Biased Negative Data with Skewed Labeled Posterior Probability

arXiv.org Machine Learning

The binary classification problem has a situation where only biased data are observed in one of the classes. In this paper, we propose a new method to approach the positive and biased negative (PbN) classification problem, which is a weakly supervised learning method to learn a binary classifier from positive data and negative data with biased observations. We incorporate a method to correct the negative impact due to skewed confidence, which represents the posterior probability that the observed data are positive. This reduces the distortion of the posterior probability that the data are labeled, which is necessary for the empirical risk minimization of the PbN classification problem. We verified the effectiveness of the proposed method by numerical experiments and real data analysis.


Machine Learning Classification Bootcamp in Python

#artificialintelligence

Apply advanced machine learning models to perform sentiment analysis and classify customer reviews such as Amazon Alexa products reviews Understand the theory and intuition behind several machine learning algorithms such as K-Nearest Neighbors, Support Vector Machines (SVM), Decision Trees, Random Forest, Naive Bayes, and Logistic Regression Implement classification algorithms in Scikit-Learn for K-Nearest Neighbors, Support Vector Machines (SVM), Decision Trees, Random Forest, Naive Bayes, and Logistic Regression Build an e-mail spam classifier using Naive Bayes classification Technique Apply machine learning models to Healthcare applications such as Cancer and Kyphosis diseases classification Develop Models to predict customer behavior towards targeted Facebook Ads Classify data using K-Nearest Neighbors, Support Vector Machines (SVM), Decision Trees, Random Forest, Naive Bayes, and Logistic Regression Build an in-store feature to predict customer's size using their features Develop a fraud detection classifier using Machine Learning Techniques Master Python Seaborn library for statistical plots Understand the difference between Machine Learning, Deep Learning and Artificial Intelligence Perform feature engineering and clean your training and testing data to remove outliers Master Python and Scikit-Learn for Data Science and Machine Learning Learn to use Python Matplotlib library for data Plotting Build an in-store feature to predict customer's size using their features Are you ready to master Machine Learning techniques and Kick-off your career as a Data Scientist?! You came to the right place! Machine Learning skill is one of the top skills to acquire in 2019 with an average salary of over $114,000 in the United States according to PayScale! The total number of ML jobs over the past two years has grown around 600 percent and expected to grow even more by 2020. In this course, we are going to provide students with knowledge of key aspects of state-of-the-art classification techniques.


Filter Drug-induced Liver Injury Literature with Natural Language Processing and Ensemble Learning

arXiv.org Artificial Intelligence

Drug-induced liver injury (DILI) describes the adverse effects of drugs that damage liver. Life-threatening results including liver failure or death were also reported in severe DILI cases. Therefore, DILI-related events are strictly monitored for all approved drugs and the liver toxicity became important assessments for new drug candidates. These DILI-related reports are documented in hospital records, in clinical trial results, and also in research papers that contain preliminary in vitro and in vivo experiments. Conventionally, data extraction from previous publications relies heavily on resource-demanding manual labelling, which considerably decreased the efficiency of the information extraction process. The recent development of artificial intelligence, particularly, the rise of natural language processing (NLP) techniques, enabled the automatic processing of biomedical texts. In this study, based on around 28,000 papers (titles and abstracts) provided by the Critical Assessment of Massive Data Analysis (CAMDA) challenge, we benchmarked model performances on filtering out DILI literature. Among four word vectorization techniques, the model using term frequency-inverse document frequency (TF-IDF) and logistic regression outperformed others with an accuracy of 0.957 with our in-house test set. Furthermore, an ensemble model with similar overall performances was implemented and was fine-tuned to lower the false-negative cases to avoid neglecting potential DILI reports. The ensemble model achieved a high accuracy of 0.954 and an F1 score of 0.955 in the hold-out validation data provided by the CAMDA committee. Moreover, important words in positive/negative predictions were identified via model interpretation. Overall, the ensemble model reached satisfactory classification results, which can be further used by researchers to rapidly filter DILI-related literature.


Machine learning using longitudinal prescription and medical claims for the detection of nonalcoholic steatohepatitis (NASH)

arXiv.org Machine Learning

Objectives To develop and evaluate machine learning models to detect suspected undiagnosed nonalcoholic steatohepatitis (NASH) patients for diagnostic screening and clinical management. Methods In this retrospective observational noninterventional study using administrative medical claims data from 1,463,089 patients, gradient-boosted decision trees were trained to detect likely NASH patients from an at-risk patient population with a history of obesity, type 2 diabetes mellitus (T2DM), metabolic disorder, or nonalcoholic fatty liver (NAFL). Models were trained to detect likely NASH in all at-risk patients or in the subset without a prior NAFL diagnosis (non-NAFL at-risk patients). Models were trained and validated using retrospective medical claims data and assessed using area under precision recall and receiver operating characteristic curves (AUPRCs, AUROCs). Results The 6-month incidence of NASH in claims data was 1 per 1,437 at-risk patients and 1 per 2,127 non-NAFL at-risk patients. The model trained to detect NASH in all at-risk patients had an AUPRC of 0.0107 (95% CI 0.0104 - 0.011) and an AUROC of 0.84. At 10% recall, model precision was 4.3%, which is 60x above NASH incidence. The model trained to detect NASH in non-NAFL patients had an AUPRC of 0.003 (95% CI 0.0029 - 0.0031) and an AUROC of 0.78. At 10% recall, model precision was 1%, which is 20x above NASH incidence. Conclusion The low incidence of NASH in medical claims data corroborates the pattern of NASH underdiagnosis in clinical practice. Claims-based machine learning could facilitate the detection of probable NASH patients for diagnostic testing and disease management.


AI Software for Fracture Detection Gets FDA Clearance

#artificialintelligence

An emerging artificial intelligence (AI) software that reportedly reduces false negative rates for fractures by 29 percent has received FDA clearance. BoneView AI (Gleamer) detects fractures on X-rays, highlights regions of interest and submits them to radiologists for confirmation, according to the French company Gleamer. The company said the algorithm was designed to aid a variety of physicians who read X-rays in clinical practice. Noting that traumatic injuries account for one-third of visits to emergency rooms (ERs), Gleamer noted that errors with fracture interpretation, which are common during evening hours, can represent up to 24 percent of harmful diagnostic errors in the ER. The company said these errors may result from fatigue and non-expert reading of X-rays.


Joint Probability Estimation Using Tensor Decomposition and Dictionaries

arXiv.org Machine Learning

In this work, we study non-parametric estimation of joint probabilities of a given set of discrete and continuous random variables from their (empirically estimated) 2D marginals, under the assumption that the joint probability could be decomposed and approximated by a mixture of product densities/mass functions. The problem of estimating the joint probability density function (PDF) using semi-parametric techniques such as Gaussian Mixture Models (GMMs) is widely studied. However such techniques yield poor results when the underlying densities are mixtures of various other families of distributions such as Laplacian or generalized Gaussian, uniform, Cauchy, etc. Further, GMMs are not the best choice to estimate joint distributions which are hybrid in nature, i.e., some random variables are discrete while others are continuous. We present a novel approach for estimating the PDF using ideas from dictionary representations in signal processing coupled with low rank tensor decompositions. To the best our knowledge, this is the first work on estimating joint PDFs employing dictionaries alongside tensor decompositions. We create a dictionary of various families of distributions by inspecting the data, and use it to approximate each decomposed factor of the product in the mixture. Our approach can naturally handle hybrid $N$-dimensional distributions. We test our approach on a variety of synthetic and real datasets to demonstrate its effectiveness in terms of better classification rates and lower error rates, when compared to state of the art estimators.