Goto

Collaborating Authors

 Accuracy


Explainable Black-Box Attacks Against Model-based Authentication

arXiv.org Artificial Intelligence

Establishing unique identities for both humans and end systems has been an active research problem in the security community, giving rise to innovative machine learning-based authentication techniques. Although such techniques offer an automated method to establish identity, they have not been vetted against sophisticated attacks that target their core machine learning technique. This paper demonstrates that mimicking the unique signatures generated by host fingerprinting and biometric authentication systems is possible. We expose the ineffectiveness of underlying machine learning classification models by constructing a blind attack based around the query synthesis framework and utilizing Explainable-AI (XAI) techniques. We launch an attack in under 130 queries on a state-of-the-art face authentication system, and under 100 queries on a host authentication system. We examine how these attacks can be defended against and explore their limitations. XAI provides an effective means for adversaries to infer decision boundaries and provides a new way forward in constructing attacks against systems using machine learning models for authentication.


Comment on "DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification"

Science

Even if the 8-oxo-G artifacts are twice the context-specific background level (i.e., GIVG_T 2), this corresponds to only a 5 to 10% increase in the overall base-level error rate (summed over all sequence contexts; Figure 1, A to C), which is less than the intersample variability of error rates at a fixed oxoQ. A 5% increase in the base-level error rate results in a minor, if any, increase in false-positive mutation calls (Figure 1, E and F), because calling algorithms are designed to handle typical levels of sequencing error. Only at GIVG_T 5 (equivalent to oxoQ 35) do the additional errors from 8-oxo-G become comparable to the sum of all other errors and have an adverse impact on variant calling. The vast majority of samples in TCGA exhibit only minor 8-oxo-G damage that has minimal impact on mutation calling. Consequently, the claim that 73% of TCGA sequencing runs have extensive damage is misleading.


Unfolding Naive Bayes From Scratch

#artificialintelligence

I have tried to keep things simple and in plain-English. The sole purpose is to deeply and clearly understand the working of a well know Text Classification ML Algorithm (Naรฏve Bayes) without being trapped in the gibberish mathematical jargon that is often used in the explanation of ML Algorithms which obviously lands you nowhere except for being relying on ML API's with almost zero understanding of how the things actually work! A complete clear picture of the Naรฏve Bayes ML Algorithm with all its mysterious mathematics demystified plus a concrete step taken forward in your ML voyage! The Grand Grand Grand Milestone # 3: The Testing Phase --Where Prediction Comes into the Play! Naive Bayes is one of the most common ML algorithms that is often used for the purpose of text classification.


Queue-based Resampling for Online Class Imbalance Learning

arXiv.org Machine Learning

Online class imbalance learning constitutes a new problem and an emerging research topic that focusses on the challenges of online learning under class imbalance and concept drift. Class imbalance deals with data streams that have very skewed distributions while concept drift deals with changes in the class imbalance status. Little work exists that addresses these challenges and in this paper we introduce queue-based resampling, a novel algorithm that successfully addresses the co-existence of class imbalance and concept drift. The central idea of the proposed resampling algorithm is to selectively include in the training set a subset of the examples that appeared in the past. Results on two popular benchmark datasets demonstrate the effectiveness of queue-based resampling over state-of-the-art methods in terms of learning speed and quality.


Counterfactual Fairness in Text Classification through Robustness

arXiv.org Machine Learning

In this paper, we study counterfactual fairness in text classification, which asks the question: How would the prediction change if the sensitive attribute discussed in the example were something else? We offer a heuristic for measuring this particular form of fairness in text classifiers by substituting individual tokens pertaining to attributes (e.g. sexual orientation, race, and religion), and describe the relationship with other notions, including individual and group fairness. Further, we offer methods, including hard ablation, blindness, and counterfactual logit pairing, for optimizing this counterfactual fairness metric during model training, bridging the robustness literature and the fairness literature. Empirically, counterfactual logit pairing performs as well as hard ablation and blindness to sensitive tokens, but generalizes better to unseen tokens. Interestingly, we find that in practice, the methods do not significantly harm classifier performance, and have varying tradeoffs with group fairness. These approaches, both for measurement and optimization, provide a new path forward for addressing counterfactual fairness issues.


Sample Efficient Adaptive Text-to-Speech

arXiv.org Machine Learning

We present a meta-learning approach for adaptive text-to-speech (TTS) with few data. During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. The aim of training is not to produce a neural network with fixed weights, which is then deployed as a TTS system. Instead, the aim is to produce a network that requires few data at deployment time to rapidly adapt to new speakers. We introduce and benchmark three strategies: (i) learning the speaker embedding while keeping the WaveNet core fixed, (ii) fine-tuning the entire architecture with stochastic gradient descent, and (iii) predicting the speaker embedding with a trained neural network encoder. The experiments show that these approaches are successful at adapting the multi-speaker neural network to new speakers, obtaining state-of-the-art results in both sample naturalness and voice similarity with merely a few minutes of audio data from new speakers.


Predicting computational reproducibility of data analysis pipelines in large population studies using collaborative filtering

arXiv.org Machine Learning

Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, due to their computational and storage requirements. We present a method to predict the computational reproducibility of data analysis pipelines in large population studies. We formulate the problem as a collaborative filtering process, with constraints on the construction of the training set. We propose 6 different strategies to build the training set, which we evaluate on 2 datasets, a synthetic one modeling a population with a growing number of subject types, and a real one obtained with neuroinformatics pipelines. Results show that one sampling method, "Random File Numbers (Uniform)" is able to predict computational reproducibility with a good accuracy. We also analyze the relevance of including file and subject biases in the collaborative filtering model. We conclude that the proposed method is able to speedup reproducibility evaluations substantially, with a reduced accuracy loss.


A Novel Online Stacked Ensemble for Multi-Label Stream Classification

arXiv.org Machine Learning

As data streams become more prevalent, the necessity for online algorithms that mine this transient and dynamic data becomes clearer. Multi-label data stream classification is a supervised learning problem where each instance in the data stream is classified into one or more pre-defined sets of labels. Many methods have been proposed to tackle this problem, including but not limited to ensemble-based methods. Some of these ensemble-based methods are specifically designed to work with certain multi-label base classifiers; some others employ online bagging schemes to build their ensembles. In this study, we introduce a novel online and dynamically-weighted stacked ensemble for multi-label classification, called GOOWE-ML, that utilizes spatial modeling to assign optimal weights to its component classifiers. Our model can be used with any existing incremental multi-label classification algorithm as its base classifier. We conduct experiments with 4 GOOWE-ML-based multi-label ensembles and 7 baseline models on 7 real-world datasets from diverse areas of interest. Our experiments show that GOOWE-ML ensembles yield consistently better results in terms of predictive performance in almost all of the datasets, with respect to the other prominent ensemble models.


Isolating effects of age with fair representation learning when assessing dementia

arXiv.org Machine Learning

One of the most prevalent symptoms among the elderly population, dementia, can be detected by classifiers trained on linguistic features extracted from narrative transcripts. However, these linguistic features are impacted in a similar but different fashion by the normal aging process. Aging is therefore a confounding factor, whose effects have been hard for machine learning classifiers to isolate. In this paper, we show that deep neural network (DNN) classifiers can infer ages from linguistic features, which is an entanglement that could lead to unfairness across age groups. We show this problem is caused by undesired activations of v-structures in causality diagrams, and it could be addressed with fair representation learning. We build neural network classifiers that learn low-dimensional representations reflecting the impacts of dementia yet discarding the effects of age. To evaluate these classifiers, we specify a model-agnostic score $\Delta_{eo}^{(N)}$ measuring how classifier results are disentangled from age. Our best models outperform baseline neural network classifiers in disentanglement, while compromising accuracy by as little as 2.56\% and 2.25\% on DementiaBank and the Famous People dataset respectively.


Artificial Intelligence Can Reinforce Bias, Cloud Giants Announce Tools For AI Fairness

#artificialintelligence

Unfairly trained Artificial Intelligence (AI) systems can reinforce bias, therefore AI systems must be trained fairly. Experts say AI fairness is a dataset issue for each specific machine learning model. AI fairness is a newly recognized challenge. The big cloud providers are in the process of developing and announcing tools to help address AI fairness. Facebook announced internal software tools development to search for bias in training datasets in May 2018.