 Performance Analysis


High dimensional change-point detection: a complete graph approach

arXiv.org Machine Learning

The aim of online change-point detection is the accurate, timely discovery of structural breaks. As the data dimension outgrows the number of observations, online detection becomes challenging. Existing methods typically test only for a change of mean, which omits the practical aspect of a change of variance. We propose a complete graph-based change-point detection algorithm to detect changes of mean and variance in low- to high-dimensional online data with a variable scanning window. Inspired by the complete graph structure, we introduce graph-spanning ratios to map high-dimensional data into metrics, and then test statistically whether a change of mean or a change of variance occurs. Theoretical study shows that our approach has the desirable pivotal property and is powerful with prescribed error probabilities. We demonstrate that this framework outperforms other methods in terms of detection power. Our approach has high detection power with small and multiple scanning windows, which allows timely detection of change-points in the online setting. Finally, we applied the method to financial data to detect change-points in S&P 500 stocks.


Machine Learning Reimagines the Building Blocks of Computing

#artificialintelligence

Like tiny gears inside a watch, algorithms execute well-defined tasks within more complicated programs. They're ubiquitous, and in part because of this, they've been painstakingly optimized over time. When a programmer needs to sort a list, for example, they'll reach for a standard "sort" algorithm that's been used for decades. Now researchers are taking a fresh look at traditional algorithms, using the branch of artificial intelligence known as machine learning. Their approach, called algorithms with predictions, takes advantage of the insights machine learning tools can provide into the data that traditional algorithms handle.
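As a rough illustration of the idea (not taken from the article), consider looking up a value in a sorted list when some model supplies a guess of where the value sits. The function name `search_with_prediction` and the `predicted_index` argument are hypothetical; the sketch assumes the prediction comes from elsewhere and only shows how a classical search can exploit it while staying correct when the guess is poor.

```python
from bisect import bisect_left


def search_with_prediction(sorted_items, target, predicted_index):
    """Return an index of target in sorted_items, or -1 if it is absent.

    The search starts at predicted_index and widens the range in doubling
    steps, so a good prediction needs few comparisons and a bad one still
    costs only logarithmic time.
    """
    n = len(sorted_items)
    if n == 0:
        return -1

    # Clamp the (possibly wild) prediction into the valid index range.
    pos = min(max(predicted_index, 0), n - 1)
    lo = hi = pos
    step = 1

    if sorted_items[pos] < target:
        # Target, if present, lies to the right: grow the upper bound.
        while hi < n - 1 and sorted_items[hi] < target:
            lo = hi
            hi = min(n - 1, hi + step)
            step *= 2
    else:
        # Target, if present, lies at or to the left: grow the lower bound.
        while lo > 0 and sorted_items[lo] > target:
            hi = lo
            lo = max(0, lo - step)
            step *= 2

    # Ordinary binary search inside the bracket [lo, hi].
    i = bisect_left(sorted_items, target, lo, hi + 1)
    return i if i < n and sorted_items[i] == target else -1
```

With an accurate prediction the bracket stays small; with an inaccurate one the doubling steps fall back to roughly the cost of a plain binary search.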


Igeood: An Information Geometry Approach to Out-of-Distribution Detection

arXiv.org Machine Learning

Reliable out-of-distribution (OOD) detection is fundamental to implementing safer modern machine learning (ML) systems. By building on the geodesic (Fisher-Rao) distance between the underlying data distributions, our discriminator can combine confidence scores from the logits outputs and the learned features of a deep neural network. Deep neural networks (DNNs) reach the state-of-the-art in several classification tasks as they are known to generalize well on data with a distribution close to the training set. However, in many practical applications, the training set does not reflect well enough the real-life environment (Quiñonero-Candela et al., 2009), which is often non-stationary and sometimes subject to unpredictable events. Therefore, matching the training scenario to reality can be impossible or too complex. The inability of machine learning (ML) models to adapt to non-stationary distributions could limit their adoption in mission-critical systems (e.g., autonomous devices, healthcare applications). Out-of-Distribution (OOD) or novelty detection is one of the main objectives in conceiving reliable ML systems (Amodei et al., 2016). A typical application is monitoring ML-based online services for periodically shifting distributions. However, tracking changes in the underlying data distribution is challenging as they contain unusual (irregular or unexpected) events and have large dimensions. For instance, relying on the intrinsic properties of ML models and their statistical behavior in the presence of in-distribution data is essential to identify OOD samples. Classic approaches to OOD detection consist of deriving metrics for detecting those abnormalities from the lens of ML models (e.g., softmax output, latent representations across layers), provided that often only a single test example is available.
Furthermore, these metrics are subject to potential limitations inherent in practical scenarios depending on the level of access to information in the ML model, e.g., having access only to the last layer or to all intermediate layers. The baseline approach for OOD detection relies on the predictive uncertainty of DNNs.
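The excerpt does not say which uncertainty score the baseline uses. The sketch below assumes the maximum softmax probability, a common choice for such baselines, and it is not the Igeood method itself. The function names and the threshold value are hypothetical; in practice the threshold would be tuned on held-out in-distribution data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_score(logits):
    """Confidence score: the largest class probability."""
    return max(softmax(logits))


def is_ood(logits, threshold=0.5):
    """Flag the input as out-of-distribution when confidence is low.

    threshold is an illustrative value, not one taken from the paper.
    """
    return max_softmax_score(logits) < threshold
```

This score only needs access to the last layer, which matches the restricted-access scenario the abstract mentions.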


Bootstrapping Labels via ___ Supervision & Human-In-The-Loop

#artificialintelligence

Most machine learning tutorials and papers assume the availability of training labels. This includes benchmark datasets such as OpenImages or SuperGLUE, or customer interaction behavior such as clicks or purchases. But what if labeled datasets are not available? We would have to collect them. Collecting training labels is a seldom discussed art.


The Yield Curve as a Recession Leading Indicator. An Application for Gradient Boosting and Random Forest

arXiv.org Machine Learning

The most representative decision tree ensemble methods have been used to examine the variable importance of Treasury term spreads for predicting US economic recessions, while also generating rules for US economic recession detection. A strategy is proposed for training the classifiers with Treasury term spreads data, and the results are compared in order to select the best model for interpretability. We also discuss the use of the SHapley Additive exPlanations (SHAP) framework to understand US recession forecasts by analyzing feature importance. Consistent with the existing literature, we find the most relevant Treasury term spreads for predicting US economic recession and a methodology for detecting relevant rules for economic recession detection. In this case, the most relevant term spread found is the 3-month to 6-month spread, which we propose that economic authorities monitor. Finally, the methodology detected rules with high lift in predicting economic recession that can be used by these entities for this purpose. This latter result stands in contrast to a growing body of literature demonstrating that machine learning methods are useful for interpretation comparing many alternative algorithms, and we discuss the interpretation of our result and propose further lines of research aligned with this work.


More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize

arXiv.org Machine Learning

Of theories for why large-scale machine learning models generalize despite being vastly overparameterized, which of their assumptions are needed to capture the qualitative phenomena of generalization in the real world? On one hand, we find that most theoretical analyses fall short of capturing these qualitative phenomena even for kernel regression, when applied to kernels derived from large-scale neural networks (e.g., ResNet-50) and real data (e.g., CIFAR-100). On the other hand, we find that the classical GCV estimator (Craven and Wahba, 1978) accurately predicts generalization risk even in such overparameterized settings. To bolster this empirical finding, we prove that the GCV estimator converges to the generalization risk whenever a local random matrix law holds. Finally, we apply this random matrix theory lens to explain why pretrained representations generalize better as well as what factors govern scaling laws for kernel regression. Our findings suggest that random matrix theory, rather than just being a toy model, may be central to understanding the properties of neural representations in practice.
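For readers unfamiliar with the GCV estimator, here is a minimal sketch for kernel ridge regression. It uses the standard form for a linear smoother with hat matrix S, namely the mean squared residual divided by (1 - tr(S)/n)^2. The function name and the regularization convention S = K (K + n·lam·I)^(-1) are assumptions made for illustration and may differ from the paper's setup.

```python
import numpy as np


def gcv_kernel_ridge(K, y, lam):
    """Generalized cross-validation estimate for kernel ridge regression.

    K   : (n, n) symmetric positive semi-definite kernel matrix
    y   : (n,) training targets
    lam : ridge penalty (assumed convention: K + n * lam * I)
    """
    n = K.shape[0]
    # Hat matrix S maps targets to fitted values: y_hat = S @ y.
    # K and (K + n*lam*I)^(-1) commute, so solving from the left is fine.
    S = np.linalg.solve(K + n * lam * np.eye(n), K)
    residual = y - S @ y
    # Training error inflated by the effective degrees of freedom tr(S).
    return float(np.mean(residual ** 2) / (1.0 - np.trace(S) / n) ** 2)
```

The estimate uses only training data, which is why it is attractive as a predictor of generalization risk.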


Classification from Positive and Biased Negative Data with Skewed Labeled Posterior Probability

arXiv.org Machine Learning

In some binary classification problems, only biased data are observed in one of the classes. In this paper, we propose a new method to approach the positive and biased negative (PbN) classification problem, which is a weakly supervised learning method to learn a binary classifier from positive data and negative data with biased observations. We incorporate a method to correct the negative impact due to skewed confidence, which represents the posterior probability that the observed data are positive. This reduces the distortion of the posterior probability that the data are labeled, which is necessary for the empirical risk minimization of the PbN classification problem. We verified the effectiveness of the proposed method by numerical experiments and real data analysis.


Machine Learning Classification Bootcamp in Python

#artificialintelligence

Apply advanced machine learning models to perform sentiment analysis and classify customer reviews such as Amazon Alexa product reviews
Understand the theory and intuition behind several machine learning algorithms such as K-Nearest Neighbors, Support Vector Machines (SVM), Decision Trees, Random Forest, Naive Bayes, and Logistic Regression
Implement classification algorithms in Scikit-Learn for K-Nearest Neighbors, Support Vector Machines (SVM), Decision Trees, Random Forest, Naive Bayes, and Logistic Regression
Build an e-mail spam classifier using the Naive Bayes classification technique
Apply machine learning models to healthcare applications such as cancer and kyphosis disease classification
Develop models to predict customer behavior towards targeted Facebook Ads
Classify data using K-Nearest Neighbors, Support Vector Machines (SVM), Decision Trees, Random Forest, Naive Bayes, and Logistic Regression
Build an in-store feature to predict customer's size using their features
Develop a fraud detection classifier using machine learning techniques
Master the Python Seaborn library for statistical plots
Understand the difference between Machine Learning, Deep Learning and Artificial Intelligence
Perform feature engineering and clean your training and testing data to remove outliers
Master Python and Scikit-Learn for Data Science and Machine Learning
Learn to use the Python Matplotlib library for data plotting

Are you ready to master Machine Learning techniques and kick off your career as a Data Scientist? You came to the right place! Machine Learning is one of the top skills to acquire in 2019, with an average salary of over $114,000 in the United States according to PayScale! The total number of ML jobs has grown around 600 percent over the past two years and is expected to grow even more by 2020. In this course, we are going to provide students with knowledge of key aspects of state-of-the-art classification techniques.


Filter Drug-induced Liver Injury Literature with Natural Language Processing and Ensemble Learning

arXiv.org Artificial Intelligence

Drug-induced liver injury (DILI) describes the adverse effects of drugs that damage the liver. Life-threatening outcomes, including liver failure and death, have also been reported in severe DILI cases. Therefore, DILI-related events are strictly monitored for all approved drugs, and liver toxicity has become an important assessment for new drug candidates. These DILI-related reports are documented in hospital records, in clinical trial results, and also in research papers that contain preliminary in vitro and in vivo experiments. Conventionally, data extraction from previous publications relies heavily on resource-demanding manual labelling, which considerably decreases the efficiency of the information extraction process. The recent development of artificial intelligence, particularly the rise of natural language processing (NLP) techniques, has enabled the automatic processing of biomedical texts. In this study, based on around 28,000 papers (titles and abstracts) provided by the Critical Assessment of Massive Data Analysis (CAMDA) challenge, we benchmarked model performance on filtering DILI literature. Among four word vectorization techniques, the model using term frequency-inverse document frequency (TF-IDF) and logistic regression outperformed the others, with an accuracy of 0.957 on our in-house test set. Furthermore, an ensemble model with similar overall performance was implemented and fine-tuned to reduce false negatives, so as to avoid neglecting potential DILI reports. The ensemble model achieved a high accuracy of 0.954 and an F1 score of 0.955 on the hold-out validation data provided by the CAMDA committee. Moreover, important words in positive/negative predictions were identified via model interpretation. Overall, the ensemble model reached satisfactory classification results and can be used by researchers to rapidly filter DILI-related literature.
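To make the TF-IDF step concrete, the following is a minimal sketch of one textbook weighting variant: term frequency normalized by document length, multiplied by log(N / document frequency). The abstract does not state which variant or library the authors used, so the function name `tf_idf` and the toy tokens are purely illustrative. The resulting weights would then serve as features for a classifier such as logistic regression.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    docs is a list of token lists. Returns one {term: weight} dict per
    document, using tf = count / doc_length and idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Terms that occur in every document receive weight zero under this variant, which is what pushes the representation toward discriminative vocabulary.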


Wasserstein-based fairness interpretability framework for machine learning models

arXiv.org Artificial Intelligence

The objective of this article is to introduce a fairness interpretability framework for measuring and explaining the bias in classification and regression models at the level of a distribution. In our work, we measure the model bias across sub-population distributions in the model output using the Wasserstein metric. To properly quantify the contributions of predictors, we take into account the favorability of both the model and predictors with respect to the non-protected class. The quantification is accomplished by the use of transport theory, which gives rise to the decomposition of the model bias and bias explanations to positive and negative contributions. To gain more insight into the role of favorability and allow for additivity of bias explanations, we adapt techniques from cooperative game theory.
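As a small illustration of measuring bias at the level of distributions, the sketch below computes the order-1 Wasserstein distance between the empirical distributions of model outputs for two groups. The abstract only says "Wasserstein metric", so the choice of order 1 is an assumption, and the function name is hypothetical. The sketch covers only the overall bias measurement, not the paper's decomposition into predictor contributions.

```python
from bisect import bisect_right


def wasserstein_1d(a, b):
    """Order-1 Wasserstein distance between two 1-D empirical samples.

    Uses the identity W1 = integral of |F_a(x) - F_b(x)| dx, where F_a and
    F_b are the empirical CDFs. The samples may have different sizes.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    points = sorted(set(a_sorted) | set(b_sorted))

    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on the interval [left, right).
        cdf_a = bisect_right(a_sorted, left) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, left) / len(b_sorted)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total
```

Calling it on the scores a model assigns to the protected and non-protected groups gives a single number for how far apart the two output distributions are; identical distributions give zero.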