Accuracy
Fooling the classifier: Ligand antagonism and adversarial examples
Rademaker, Thomas J., Bengio, Emmanuel, François, Paul
Machine learning algorithms are sensitive to so-called adversarial perturbations. This is reminiscent of cellular decision-making where antagonist ligands may prevent correct signaling, like during the early immune response. We draw a formal analogy between neural networks used in machine learning and the general class of adaptive proofreading networks. We then apply simple adversarial strategies from machine learning to models of ligand discrimination. We show how kinetic proofreading leads to "boundary tilting" and identify three types of perturbation (adversarial, non adversarial and ambiguous). We then use a gradient-descent approach to compare different adaptive proofreading models, and we reveal the existence of two qualitatively different regimes characterized by the presence or absence of a critical point. These regimes are reminiscent of the "feature-to-prototype" transition identified in machine learning, corresponding to two strategies in ligand antagonism (broad vs. specialized). Overall, our work connects evolved cellular decision-making to classification in machine learning, showing that behaviours close to the decision boundary can be understood through the same mechanisms.
Scalable Sparse Subspace Clustering via Ordered Weighted $\ell_1$ Regression
The main contribution of the paper is a new approach to subspace clustering that is significantly more computationally efficient and scalable than existing state-of-the-art methods. The central idea is to modify the regression technique in sparse subspace clustering (SSC) by replacing the $\ell_1$ minimization with a generalization called Ordered Weighted $\ell_1$ (OWL) minimization which performs simultaneous regression and clustering of correlated variables. Using random geometric graph theory, we prove that OWL regression selects more points within each subspace, resulting in better clustering results. This allows for accurate subspace clustering based on regression solutions for only a small subset of the total dataset, significantly reducing the computational complexity compared to SSC. In experiments, we find that our OWL approach can achieve a speedup of 20$\times$ to 30$\times$ for synthetic problems and 4$\times$ to 8$\times$ on real data problems.
Constraint-based Causal Discovery for Non-Linear Structural Causal Models with Cycles and Latent Confounders
Forré, Patrick, Mooij, Joris M.
We address the problem of causal discovery from data, making use of the recently proposed causal modeling framework of modular structural causal models (mSCM) to handle cycles, latent confounders and non-linearities. We introduce {\sigma}-connection graphs ({\sigma}-CG), a new class of mixed graphs (containing undirected, bidirected and directed edges) with additional structure, and extend the concept of {\sigma}-separation, the appropriate generalization of the well-known notion of d-separation in this setting, to apply to {\sigma}-CGs. We prove the closedness of {\sigma}-separation under marginalisation and conditioning and exploit this to implement a test of {\sigma}-separation on a {\sigma}-CG. This then leads us to the first causal discovery algorithm that can handle non-linear functional relations, latent confounders, cyclic causal relationships, and data from different (stochastic) perfect interventions. As a proof of concept, we show on synthetic data how well the algorithm recovers features of the causal graph of modular structural causal models.
London police chief 'completely comfortable' using facial recognition
The facial recognition software used by London's Metropolitan Police is producing a 98 per cent false positive rate. This means the system is regularly getting it wrong, with only two out of every 100 'matches' pinpointing the correct person. Despite this success rate, London's police commissioner Cressida Dick says she is'completely comfortable' to continue using the system in the capital. So far, the programme has resulted in only two accurate identifications – and neither pinpointed a wanted criminal. One was an individual from an out-of-date watch list, while the other was a person with a mental condition who regularly contacts public figures, but is not wanted for arrest.
Decision time: Sometimes accuracy is not your friend
Machine learning is about machines making decisions and, as we have already discussed, we can produce multiple models for any given problem and measure their accuracy. It is intuitively obvious that we would elect to use the most accurate model and most of the time, of course, we do. But there are times when we will actually elect to use one of the less accurate ones. The underlying reason is that the estimates we make of accuracy, whilst very useful, take no account of the cost of being right and being wrong. We might be trying to identify which of our customers on a clothing website are women and which men so that our recommendation engine makes the appropriate clothing suggestions.
Oracle-free Detection of Translation Issue for Neural Machine Translation
Zheng, Wujie, Wang, Wenyu, Liu, Dian, Zhang, Changrong, Zeng, Qinsong, Deng, Yuetang, Yang, Wei, Xie, Tao
Neural Machine Translation (NMT) has been widely adopted over recent years due to its advantages on various translation tasks. However, NMT systems can be error-prone due to the intractability of natural languages and the design of neural networks, bringing issues to their translations. These issues could potentially lead to information loss, wrong semantics, and low readability in translations, compromising the usefulness of NMT and leading to potential non-trivial consequences. Although there are existing approaches, such as using the BLEU score, on quality assessment and issue detection for NMT, such approaches face two serious limitations. First, such solutions require oracle translations, i.e., reference translations, which are often unavailable, e.g., in production environments. Second, such approaches cannot pinpoint the issue types and locations within translations. To address such limitations, we propose a new approach aiming to precisely detect issues in translations without requiring oracle translations. Our approach focuses on two most prominent issues in NMT translations by including two detection algorithms. Our experimental results show that our new approach could achieve high effectiveness on real-world datasets. Our successful experience on deploying the proposed algorithms in both the development and production environments of WeChat, a messenger app with over one billion of monthly active users, helps eliminate numerous defects of our NMT model, monitor the effectiveness on real-world translation tasks, and collect in-house test cases, producing high industry impact.
How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments
Colas, Cédric, Sigaud, Olivier, Oudeyer, Pierre-Yves
Consistently checking the statistical significance of experimental results is one of the mandatory methodological steps to address the so-called "reproducibility crisis" in deep reinforcement learning. In this tutorial paper, we explain how the number of random seeds relates to the probabilities of statistical errors. For both the t-test and the bootstrap confidence interval test, we recall theoretical guidelines to determine the number of random seeds one should use to provide a statistically significant comparison of the performance of two algorithms. Finally, we discuss the influence of deviations from the assumptions usually made by statistical tests. We show that they can lead to inaccurate evaluations of statistical errors and provide guidelines to counter these negative effects. We make our code available to perform the tests.
Ensemble learning with Conformal Predictors: Targeting credible predictions of conversion from Mild Cognitive Impairment to Alzheimer's Disease
Pereira, Telma, Cardoso, Sandra, Silva, Dina, Guerreiro, Manuela, de Mendonça, Alexandre, Madeira, Sara C.
Most machine learning classifiers give predictions for new examples accurately, yet without indicating how trustworthy predictions are. In the medical domain, this hampers their integration in decision support systems, which could be useful in the clinical practice. We use a supervised learning approach that combines Ensemble learning with Conformal Predictors to predict conversion from Mild Cognitive Impairment to Alzheimer's Disease. Our goal is to enhance the classification performance (Ensemble learning) and complement each prediction with a measure of credibility (Conformal Predictors). Our results showed the superiority of the proposed approach over a similar ensemble framework with standard classifiers.
Counterfactual Evaluation of Machine Learning Models
So I'm sure many of you know Stripe. It's a company that provides a platform for e-commerce. And one of the things that everyone encounters when conducting commerce online is, unsurprisingly, fraud. So before I get into the details of how we address fraud with machine learning, I want to talk a little bit about the fraud life cycle. So what typically happens in fraud is that you have an organized crime ring install malware on point-of-sale devices. For example, there was this famous breach at Target about five years ago. So you can actually go online, if you go to the deep web and buy credit card numbers that were taken from personal devices, ATM machines and so forth. What's kind of surprising and funny is that these criminals who are selling credit card numbers to smaller time criminals are quite customer service oriented. So you can say, "I want 12 credit card numbers from Wells Fargo or Citibank. I want credit card numbers that were issued in the zip codes in 94102 to 94105 and so forth." Some of them are in fact so customer serviced oriented that they guarantee you that if you are unable to commit fraud with the cards you buy, they'll give you your money back. Let's say, five years at Stripe was enough for me. I decided to leave and become a criminal, using all my knowledge.
Learning under selective labels in the presence of expert consistency
De-Arteaga, Maria, Dubrawski, Artur, Chouldechova, Alexandra
We explore the problem of learning under selective labels in the context of algorithm-assisted decision making. Selective labels is a pervasive selection bias problem that arises when historical decision making blinds us to the true outcome for certain instances. Examples of this are common in many applications, ranging from predicting recidivism using pre-trial release data to diagnosing patients. In this paper we discuss why selective labels often cannot be effectively tackled by standard methods for adjusting for sample selection bias, even if there are no unobservables. We propose a data augmentation approach that can be used to either leverage expert consistency to mitigate the partial blindness that results from selective labels, or to empirically validate whether learning under such framework may lead to unreliable models prone to systemic discrimination.