Performance Analysis
Queue-based Resampling for Online Class Imbalance Learning
Malialis, Kleanthis, Panayiotou, Christos, Polycarpou, Marios M.
Online class imbalance learning constitutes a new problem and an emerging research topic that focusses on the challenges of online learning under class imbalance and concept drift. Class imbalance deals with data streams that have very skewed distributions, while concept drift deals with changes in the class imbalance status. Little work exists that addresses these challenges, and in this paper we introduce queue-based resampling, a novel algorithm that successfully addresses the co-existence of class imbalance and concept drift. The central idea of the proposed resampling algorithm is to selectively include in the training set a subset of the examples that appeared in the past. Results on two popular benchmark datasets demonstrate the effectiveness of queue-based resampling over state-of-the-art methods in terms of learning speed and quality.
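The idea of keeping a selected subset of past examples in the training set can be illustrated with a minimal Python sketch. It assumes one fixed-length first-in-first-out queue per class, so that rare-class examples are retained longer than they would be in a single sliding window; the class name `QueueResampler`, the `queue_len` parameter, and the toy stream are hypothetical and not taken from the paper.

```python
from collections import deque


class QueueResampler:
    """Keep the most recent examples of each class in bounded queues."""

    def __init__(self, classes, queue_len=10):
        # Assumption: one fixed-length FIFO queue per class.
        self.queues = {c: deque(maxlen=queue_len) for c in classes}

    def update(self, x, y):
        """Store the new example and return the current training set."""
        self.queues[y].append((x, y))
        # The training set is the union of all per-class queues.
        return [example for q in self.queues.values() for example in q]


# Toy stream: class 1 is rare, class 0 dominates.
resampler = QueueResampler(classes=[0, 1], queue_len=3)
stream = [(0.1, 0), (0.2, 0), (0.3, 0), (0.9, 1), (0.4, 0), (0.5, 0)]
for x, y in stream:
    training_set = resampler.update(x, y)
```

In this sketch the single minority example stays in the training set even after several newer majority examples arrive, while the oldest majority examples are dropped. An online learner would be updated on `training_set` at each step.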
Counterfactual Fairness in Text Classification through Robustness
Garg, Sahaj, Perot, Vincent, Limtiaco, Nicole, Taly, Ankur, Chi, Ed H., Beutel, Alex
In this paper, we study counterfactual fairness in text classification, which asks the question: How would the prediction change if the sensitive attribute discussed in the example were something else? We offer a heuristic for measuring this particular form of fairness in text classifiers by substituting individual tokens pertaining to attributes (e.g. sexual orientation, race, and religion), and describe the relationship with other notions, including individual and group fairness. Further, we offer methods, including hard ablation, blindness, and counterfactual logit pairing, for optimizing this counterfactual fairness metric during model training, bridging the robustness literature and the fairness literature. Empirically, counterfactual logit pairing performs as well as hard ablation and blindness to sensitive tokens, but generalizes better to unseen tokens. Interestingly, we find that in practice, the methods do not significantly harm classifier performance, and have varying tradeoffs with group fairness. These approaches, both for measurement and optimization, provide a new path forward for addressing counterfactual fairness issues.
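The token-substitution heuristic can be sketched as follows. This is only an illustration under stated assumptions: `counterfactual_gap`, `toy_predict`, and the placeholder terms `term_a` and `term_b` are hypothetical, and the mean absolute score difference used here is one plausible way to summarize the change in prediction, not necessarily the exact metric defined in the paper.

```python
def counterfactual_gap(predict, tokens, identity_terms):
    """Mean absolute change in score when an identity term is swapped."""
    base = predict(tokens)
    gaps = []
    for i, tok in enumerate(tokens):
        if tok not in identity_terms:
            continue
        for alt in identity_terms:
            if alt == tok:
                continue
            # Build the counterfactual input by replacing one token.
            swapped = tokens[:i] + [alt] + tokens[i + 1:]
            gaps.append(abs(predict(swapped) - base))
    return sum(gaps) / len(gaps) if gaps else 0.0


def toy_predict(tokens):
    # Hypothetical scorer standing in for a biased classifier.
    return 0.9 if "term_a" in tokens else 0.1


terms = ["term_a", "term_b"]
gap = counterfactual_gap(toy_predict, ["this", "mentions", "term_a"], terms)
no_gap = counterfactual_gap(toy_predict, ["no", "sensitive", "words"], terms)
```

A gap near zero means the score barely changes under substitution; a large gap, as produced by the toy scorer here, indicates that the prediction depends on the sensitive token itself.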
Sample Efficient Adaptive Text-to-Speech
Chen, Yutian, Assael, Yannis, Shillingford, Brendan, Budden, David, Reed, Scott, Zen, Heiga, Wang, Quan, Cobo, Luis C., Trask, Andrew, Laurie, Ben, Gulcehre, Caglar, Oord, Aäron van den, Vinyals, Oriol, de Freitas, Nando
We present a meta-learning approach for adaptive text-to-speech (TTS) with few data. During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. The aim of training is not to produce a neural network with fixed weights, which is then deployed as a TTS system. Instead, the aim is to produce a network that requires few data at deployment time to rapidly adapt to new speakers. We introduce and benchmark three strategies: (i) learning the speaker embedding while keeping the WaveNet core fixed, (ii) fine-tuning the entire architecture with stochastic gradient descent, and (iii) predicting the speaker embedding with a trained neural network encoder. The experiments show that these approaches are successful at adapting the multi-speaker neural network to new speakers, obtaining state-of-the-art results in both sample naturalness and voice similarity with merely a few minutes of audio data from new speakers.
An overview of feature selection strategies
Feature selection and engineering are among the most important factors affecting the success of predictive modeling. This remains true even today despite the success of deep learning, which comes with automatic feature engineering. Parsimonious and interpretable models provide simple insights into business problems and are therefore deemed very valuable. Furthermore, on many occasions the underlying size and structure of the data being analyzed may not allow the use of complex models that have many parameters to tune. For example, in clinical settings where the number of samples is usually much lower than the number of features one could extract (e.g.
Predicting computational reproducibility of data analysis pipelines in large population studies using collaborative filtering
Barghi, Soudabeh, Scaria, Lalet, Salari, Ali, Glatard, Tristan
Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, due to their computational and storage requirements. We present a method to predict the computational reproducibility of data analysis pipelines in large population studies. We formulate the problem as a collaborative filtering process, with constraints on the construction of the training set. We propose 6 different strategies to build the training set, which we evaluate on 2 datasets: a synthetic one modeling a population with a growing number of subject types, and a real one obtained with neuroinformatics pipelines. Results show that one sampling method, "Random File Numbers (Uniform)", is able to predict computational reproducibility with good accuracy. We also analyze the relevance of including file and subject biases in the collaborative filtering model. We conclude that the proposed method is able to speed up reproducibility evaluations substantially, with a reduced accuracy loss.
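To make the role of file and subject biases concrete, the sketch below fits a bias-only baseline predictor, a standard building block of collaborative filtering models. It is an assumption-laden illustration, not the authors' model: it omits latent factors, it assumes each observation is a 0/1 reproducibility label for a (subject, file) pair, and the names `fit_bias_model` and `predict_label` are hypothetical.

```python
def fit_bias_model(observed):
    """Fit global mean plus per-subject and per-file biases.

    observed maps (subject, file) -> 0/1 label (assumed encoding).
    """
    mu = sum(observed.values()) / len(observed)

    def mean_residual(index):
        # Average deviation from the global mean, grouped by key part.
        sums, counts = {}, {}
        for key, label in observed.items():
            k = key[index]
            sums[k] = sums.get(k, 0.0) + (label - mu)
            counts[k] = counts.get(k, 0) + 1
        return {k: sums[k] / counts[k] for k in sums}

    return mu, mean_residual(0), mean_residual(1)


def predict_label(model, subject, file):
    """Predict 1 if the bias-adjusted score reaches 0.5, else 0."""
    mu, subject_bias, file_bias = model
    # Unseen subjects or files fall back to a zero bias.
    score = mu + subject_bias.get(subject, 0.0) + file_bias.get(file, 0.0)
    return 1 if score >= 0.5 else 0


observed = {("s1", "f1"): 1, ("s1", "f2"): 0, ("s2", "f1"): 1, ("s3", "f2"): 0}
model = fit_bias_model(observed)
seen = predict_label(model, "s1", "f1")
unseen = predict_label(model, "s4", "f2")
```

Here the file bias alone lets the model predict a label for subject `s4`, who never appears in the training set, which is the kind of saving that makes prediction cheaper than re-running every pipeline.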
A Novel Online Stacked Ensemble for Multi-Label Stream Classification
Büyükçakır, Alican, Bonab, Hamed, Can, Fazli
As data streams become more prevalent, the necessity for online algorithms that mine this transient and dynamic data becomes clearer. Multi-label data stream classification is a supervised learning problem where each instance in the data stream is classified into one or more pre-defined sets of labels. Many methods have been proposed to tackle this problem, including but not limited to ensemble-based methods. Some of these ensemble-based methods are specifically designed to work with certain multi-label base classifiers; some others employ online bagging schemes to build their ensembles. In this study, we introduce a novel online and dynamically-weighted stacked ensemble for multi-label classification, called GOOWE-ML, that utilizes spatial modeling to assign optimal weights to its component classifiers. Our model can be used with any existing incremental multi-label classification algorithm as its base classifier. We conduct experiments with 4 GOOWE-ML-based multi-label ensembles and 7 baseline models on 7 real-world datasets from diverse areas of interest. Our experiments show that GOOWE-ML ensembles yield consistently better results in terms of predictive performance in almost all of the datasets, with respect to the other prominent ensemble models.
Isolating effects of age with fair representation learning when assessing dementia
Zhu, Zining, Novikova, Jekaterina, Rudzicz, Frank
One of the most prevalent symptoms among the elderly population, dementia, can be detected by classifiers trained on linguistic features extracted from narrative transcripts. However, these linguistic features are impacted in a similar but different fashion by the normal aging process. Aging is therefore a confounding factor, whose effects have been hard for machine learning classifiers to isolate. In this paper, we show that deep neural network (DNN) classifiers can infer ages from linguistic features, which is an entanglement that could lead to unfairness across age groups. We show this problem is caused by undesired activations of v-structures in causality diagrams, and it could be addressed with fair representation learning. We build neural network classifiers that learn low-dimensional representations reflecting the impacts of dementia yet discarding the effects of age. To evaluate these classifiers, we specify a model-agnostic score $\Delta_{eo}^{(N)}$ measuring how classifier results are disentangled from age. Our best models outperform baseline neural network classifiers in disentanglement, while compromising accuracy by as little as 2.56\% and 2.25\% on DementiaBank and the Famous People dataset respectively.
Artificial Intelligence Can Reinforce Bias, Cloud Giants Announce Tools For AI Fairness
Unfairly trained Artificial Intelligence (AI) systems can reinforce bias; therefore, AI systems must be trained fairly. Experts say AI fairness is a dataset issue for each specific machine learning model. AI fairness is a newly recognized challenge. The big cloud providers are in the process of developing and announcing tools to help address AI fairness. In May 2018, Facebook announced that it was developing internal software tools to search for bias in training datasets.
Efficient Seismic fragility curve estimation by Active Learning on Support Vector Machines
Sainct, Rémi, Feau, Cyril, Martinez, Jean-Marc, Garnier, Josselin
Fragility curves, which express the failure probability of a structure, or of critical components, as a function of a loading intensity measure, are nowadays widely used (i) in Seismic Probabilistic Risk Assessment studies and (ii) to evaluate the impact of construction details on the structural performance of installations under seismic excitations or under other loading sources such as wind. To avoid the use of parametric models such as the lognormal model to estimate fragility curves from a reduced number of numerical calculations, a methodology based on Support Vector Machines coupled with an active learning algorithm is proposed in this paper. In practice, input excitation is reduced to some relevant parameters and, given these parameters, SVMs are used for a binary classification of the structural responses relative to a limit threshold of exceedance. Since the output is not only binary but also a score, a probabilistic interpretation of the output is exploited to estimate fragility curves very efficiently, as score functions or as functions of classical seismic intensity measures.