Goto

Collaborating Authors

 Accuracy


Training OOD Detectors in their Natural Habitats

arXiv.org Artificial Intelligence

Out-of-distribution (OOD) detection is important for machine learning models deployed in the wild. Recent methods use auxiliary outlier data to regularize the model for improved OOD detection. However, these approaches make a strong distributional assumption that the auxiliary outlier data is completely separable from the in-distribution (ID) data. In this paper, we propose a novel framework that leverages wild mixture data -- that naturally consists of both ID and OOD samples. Such wild data is abundant and arises freely upon deploying a machine learning classifier in their \emph{natural habitats}. Our key idea is to formulate a constrained optimization problem and to show how to tractably solve it. Our learning objective maximizes the OOD detection rate, subject to constraints on the classification error of ID data and on the OOD error rate of ID examples. We extensively evaluate our approach on common OOD detection tasks and demonstrate superior performance.


Evaluation of Runtime Monitoring for UAV Emergency Landing

arXiv.org Artificial Intelligence

To certify UAV operations in populated areas, risk mitigation strategies -- such as Emergency Landing (EL) -- must be in place to account for potential failures. EL aims at reducing ground risk by finding safe landing areas using on-board sensors. The first contribution of this paper is to present a new EL approach, in line with safety requirements introduced in recent research. In particular, the proposed EL pipeline includes mechanisms to monitor learning based components during execution. This way, another contribution is to study the behavior of Machine Learning Runtime Monitoring (MLRM) approaches within the context of a real-world critical system. A new evaluation methodology is introduced, and applied to assess the practical safety benefits of three MLRM mechanisms. The proposed approach is compared to a default mitigation strategy (open a parachute when a failure is detected), and appears to be much safer.


Mental Stress Detection using Data from Wearable and Non-wearable Sensors: A Review

arXiv.org Artificial Intelligence

This paper presents a comprehensive review of methods covering significant subjective and objective human stress detection techniques available in the literature. The methods for measuring human stress responses could include subjective questionnaires (developed by psychologists) and objective markers observed using data from wearable and non-wearable sensors. In particular, wearable sensor-based methods commonly use data from electroencephalography, electrocardiogram, galvanic skin response, electromyography, electrodermal activity, heart rate, heart rate variability, and photoplethysmography both individually and in multimodal fusion strategies. Whereas, methods based on non-wearable sensors include strategies such as analyzing pupil dilation and speech, smartphone data, eye movement, body posture, and thermal imaging. Whenever a stressful situation is encountered by an individual, physiological, physical, or behavioral changes are induced which help in coping with the challenge at hand. A wide range of studies has attempted to establish a relationship between these stressful situations and the response of human beings by using different kinds of psychological, physiological, physical, and behavioral measures. Inspired by the lack of availability of a definitive verdict about the relationship of human stress with these different kinds of markers, a detailed survey about human stress detection methods is conducted in this paper. In particular, we explore how stress detection methods can benefit from artificial intelligence utilizing relevant data from various sources. This review will prove to be a reference document that would provide guidelines for future research enabling effective detection of human stress conditions.


Introducing explainable supervised machine learning into interactive feedback loops for statistical production system

arXiv.org Machine Learning

Statistical production systems cover multiple steps from the collection, aggregation, and integration of data to tasks like data quality assurance and dissemination. While the context of data quality assurance is one of the most promising fields for applying machine learning, the lack of curated and labeled training data is often a limiting factor. The statistical production system for the Centralised Securities Database features an interactive feedback loop between data collected by the European Central Bank and data quality assurance performed by data quality managers at National Central Banks. The quality assurance feedback loop is based on a set of rule-based checks for raising exceptions, upon which the user either confirms the data or corrects an actual error. In this paper we use the information received from this feedback loop to optimize the exceptions presented to the National Central Banks thereby improving the quality of exceptions generated and the time consumed on the system by the users authenticating those exceptions. For this approach we make use of explainable supervised machine learning to (a) identify the types of exceptions and (b) to prioritize which exceptions are more likely to require an intervention or correction by the NCBs. Furthermore, we provide an explainable AI taxonomy aiming to identify the different explainable AI needs that arose during the project.


Evaluation Methods and Measures for Causal Learning Algorithms

arXiv.org Artificial Intelligence

The convenient access to copious multi-faceted data has encouraged machine learning researchers to reconsider correlation-based learning and embrace the opportunity of causality-based learning, i.e., causal machine learning (causal learning). Recent years have therefore witnessed great effort in developing causal learning algorithms aiming to help AI achieve human-level intelligence. Due to the lack-of ground-truth data, one of the biggest challenges in current causal learning research is algorithm evaluations. This largely impedes the cross-pollination of AI and causal inference, and hinders the two fields to benefit from the advances of the other. To bridge from conventional causal inference (i.e., based on statistical methods) to causal learning with big data (i.e., the intersection of causal inference and machine learning), in this survey, we review commonly-used datasets, evaluation methods, and measures for causal learning using an evaluation pipeline similar to conventional machine learning. We focus on the two fundamental causal-inference tasks and causality-aware machine learning tasks. Limitations of current evaluation procedures are also discussed. We then examine popular causal inference tools/packages and conclude with primary challenges and opportunities for benchmarking causal learning algorithms in the era of big data. The survey seeks to bring to the forefront the urgency of developing publicly available benchmarks and consensus-building standards for causal learning evaluation with observational data. In doing so, we hope to broaden the discussions and facilitate collaboration to advance the innovation and application of causal learning.


Everything You Need to Know to Build an Amazing Binary Classifier

#artificialintelligence

There are two general types of supervised machine learning approaches in their simplest form. First, you can have a regression problem, where you're trying to predict a continuous variable, such as the temperature or a stock price. The second is a classification problem where you want to predict a categorical variable such as pass/fail or spam/ham. Additionally, we can have binary classification problems that we'll cover here with only two outcomes and multi-class classification with more than two outcomes. We want to take several steps to prepare our data for Machine Learning.


Backpropagation Neural Tree

arXiv.org Artificial Intelligence

We propose a novel algorithm called Backpropagation Neural Tree (BNeuralT), which is a stochastic computational dendritic tree. BNeuralT takes random repeated inputs through its leaves and imposes dendritic nonlinearities through its internal connections like a biological dendritic tree would do. Considering the dendritic-tree like plausible biological properties, BNeuralT is a single neuron neural tree model with its internal sub-trees resembling dendritic nonlinearities. BNeuralT algorithm produces an ad hoc neural tree which is trained using a stochastic gradient descent optimizer like gradient descent (GD), momentum GD, Nesterov accelerated GD, Adagrad, RMSprop, or Adam. BNeuralT training has two phases, each computed in a depth-first search manner: the forward pass computes neural tree's output in a post-order traversal, while the error backpropagation during the backward pass is performed recursively in a pre-order traversal. A BNeuralT model can be considered a minimal subset of a neural network (NN), meaning it is a "thinned" NN whose complexity is lower than an ordinary NN. Our algorithm produces high-performing and parsimonious models balancing the complexity with descriptive ability on a wide variety of machine learning problems: classification, regression, and pattern recognition.


Stop Oversampling for Class Imbalance Learning: A Critical Review

arXiv.org Artificial Intelligence

For the last two decades, oversampling has been employed to overcome the challenge of learning from imbalanced datasets. Many approaches to solving this challenge have been offered in the literature. Oversampling, on the other hand, is a concern. That is, models trained on fictitious data may fail spectacularly when put to real-world problems. The fundamental difficulty with oversampling approaches is that, given a real-life population, the synthesized samples may not truly belong to the minority class. As a result, training a classifier on these samples while pretending they represent minority may result in incorrect predictions when the model is used in the real world. We analyzed a large number of oversampling methods in this paper and devised a new oversampling evaluation system based on hiding a number of majority examples and comparing them to those generated by the oversampling process. Based on our evaluation system, we ranked all these methods based on their incorrectly generated examples for comparison. Our experiments using more than 70 oversampling methods and three imbalanced real-world datasets reveal that all oversampling methods studied generate minority samples that are most likely to be majority. Given data and methods in hand, we argue that oversampling in its current forms and methodologies is unreliable for learning from class imbalanced data and should be avoided in real-world applications.


MarkovGNN: Graph Neural Networks on Markov Diffusion

arXiv.org Artificial Intelligence

Most real-world networks contain well-defined community structures where nodes are densely connected internally within communities. To learn from these networks, we develop MarkovGNN that captures the formation and evolution of communities directly in different convolutional layers. Unlike most Graph Neural Networks (GNNs) that consider a static graph at every layer, MarkovGNN generates different stochastic matrices using a Markov process and then uses these community-capturing matrices in different layers. MarkovGNN is a general approach that could be used with most existing GNNs. We experimentally show that MarkovGNN outperforms other GNNs for clustering, node classification, and visualization tasks. The source code of MarkovGNN is publicly available at \url{https://github.com/HipGraph/MarkovGNN}.


The impact of feature importance methods on the interpretation of defect classifiers

arXiv.org Artificial Intelligence

Abstract--Classifier specific (CS) and classifier agnostic (CA) feature importance methods are widely used (often interchangeably) by prior studies to derive feature importance ranks from a defect classifier. However, different feature importance methods are likely to compute different feature importance ranks even for the same dataset and classifier. Hence such interchangeable use of feature importance methods can lead to conclusion instabilities unless there is a strong agreement among different methods. Therefore, in this paper, we evaluate the agreement between the feature importance ranks associated with the studied classifiers through a case study of 18 software projects and six commonly used classifiers. We find that: 1) The computed feature importance ranks by CA and CS methods do not always strongly agree with each other. Such findings raise concerns about the stability of conclusions across replicated studies. We further observe that the commonly used defect datasets are rife with feature interactions and these feature interactions impact the computed feature importance ranks of the CS methods (not the CA methods). We demonstrate that removing these feature interactions, even with simple methods like CFS improves agreement between the computed feature importance ranks of CA and CS methods. In light of our findings, we provide guidelines for stakeholders and practitioners when performing model interpretation and directions for future research, e.g., future research is needed to investigate the impact of advanced feature interaction removal methods on computed feature importance ranks of different CS methods. We note, however, that a CS method is not always readily available for Defect classifiers are widely used by many large software corporations a given classifier. Defect classifiers are commonly and deep neural networks do not have a widely accepted CS interpreted to uncover insights to improve software quality. Therefore it is the feature importance ranks of different classifiers is pivotal that these generated insights are reliable. Such CA methods measure the contribution of each feature a feature importance method to compute a ranking of feature towards a classifier's predictions. These measure the contribution of each feature by effecting changes to feature importance ranks reflect the order in which the studied that particular feature in the dataset and observing its impact on features contribute to the predictive capability of the studied the outcome. The primary advantage of CA methods is that they classifier [14].