Performance Analysis
Text Classification of the Precursory Accelerating Seismicity Corpus: Inference on some Theoretical Trends in Earthquake Predictability Research from 1988 to 2018
Text analytics based on supervised machine learning classifiers has shown great promise in a multitude of domains, but has yet to be applied to Seismology. We test various standard models (Naive Bayes, k-Nearest Neighbors, Support Vector Machines, and Random Forests) on a seismological corpus of 100 articles related to the topic of precursory accelerating seismicity, spanning from 1988 to 2010. This corpus was labelled in Mignan (2011) with the precursor whether explained by critical processes (i.e., cascade triggering) or by other processes (such as signature of main fault loading). We investigate rather the classification process can be automatized to help analyze larger corpora in order to better understand trends in earthquake predictability research. We find that the Naive Bayes model performs best, in agreement with the machine learning literature for the case of small datasets, with cross-validation accuracies of 86% for binary classification. For a refined multiclass classification ('non-critical process' < 'agnostic' < 'critical process assumed' < 'critical process demonstrated'), we obtain up to 78% accuracy. Prediction on a dozen of articles published since 2011 shows however a weak generalization with a F1-score of 60%, only slightly better than a random classifier, which can be explained by a change of authorship and use of different terminologies. Yet, the model shows F1-scores greater than 80% for the two multiclass extremes ('non-critical process' versus 'critical process demonstrated') while it falls to random classifier results (around 25%) for papers labelled 'agnostic' or 'critical process assumed'. Those results are encouraging in view of the small size of the corpus and of the high degree of abstraction of the labelling. Domain knowledge engineering remains essential but can be made transparent by an investigation of Naive Bayes keyword posterior probabilities.
Reinforcement Learning with Perturbed Rewards
Wang, Jingkang, Liu, Yang, Li, Bo
Recent studies have shown the vulnerability of reinforcement learning (RL) models in noisy settings. The sources of noises differ across scenarios. For instance, in practice, the observed reward channel is often subject to noise (e.g., when observed rewards are collected through sensors), and thus observed rewards may not be credible as a result. Also, in applications such as robotics, a deep reinforcement learning (DRL) algorithm can be manipulated to produce arbitrary errors. In this paper, we consider noisy RL problems where observed rewards by RL agents are generated with a reward confusion matrix. We call such observed rewards as perturbed rewards. We develop an unbiased reward estimator aided robust RL framework that enables RL agents to learn in noisy environments while observing only perturbed rewards. Our framework draws upon approaches for supervised learning with noisy data. The core ideas of our solution include estimating a reward confusion matrix and defining a set of unbiased surrogate rewards. We prove the convergence and sample complexity of our approach. Extensive experiments on different DRL platforms show that policies based on our estimated surrogate reward can achieve higher expected rewards, and converge faster than existing baselines. For instance, the state-of-the-art PPO algorithm is able to obtain 67.5% and 46.7% improvements in average on five Atari games, when the error rates are 10% and 30% respectively.
Graphical Lasso and Thresholding: Equivalence and Closed-form Solutions
Fattahi, Salar, Sojoudi, Somayeh
Graphical Lasso (GL) is a popular method for learning the structure of an undirected graphical model, which is based on an $l_1$ regularization technique. The objective of this paper is to compare the computationally-heavy GL technique with a numerically-cheap heuristic method that is based on simply thresholding the sample covariance matrix. To this end, two notions of sign-consistent and inverse-consistent matrices are developed, and then it is shown that the thresholding and GL methods are equivalent if: (i) the thresholded sample covariance matrix is both sign-consistent and inverse-consistent, and (ii) the gap between the largest thresholded and the smallest un-thresholded entries of the sample covariance matrix is not too small. By building upon this result, it is proved that the GL method--as a conic optimization problem--has an explicit closed-form solution if the thresholded sample covariance matrix has an acyclic structure. This result is then generalized to arbitrary sparse support graphs, where a formula is found to obtain an approximate solution of GL. Furthermore, it is shown that the approximation error of the derived explicit formula decreases exponentially fast with respect to the length of the minimum-length cycle of the sparsity graph. The developed results are demonstrated on synthetic data, functional MRI data, traffic flows for transportation networks, and massive randomly generated datasets. We show that the proposed method can obtain an accurate approximation of the GL for instances with the sizes as large as $80,000\times 80,000$ (more than 3.2 billion variables) in less than 30 minutes on a standard laptop computer running MATLAB, while other state-of-the-art methods do not converge within 4 hours.
Model Cards for Model Reporting
Mitchell, Margaret, Wu, Simone, Zaldivar, Andrew, Barnes, Parker, Vasserman, Lucy, Hutchinson, Ben, Spitzer, Elena, Raji, Inioluwa Deborah, Gebru, Timnit
Trained machine learning models are increasingly used to perform high-impact tasks in areas such as law enforcement, medicine, education, and employment. In order to clarify the intended use cases of machine learning models and minimize their usage in contexts for which they are not well suited, we recommend that released models be accompanied by documentation detailing their performance characteristics. In this paper, we propose a framework that we call model cards, to encourage such transparent model reporting. Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information. While we focus primarily on human-centered machine learning models in the application fields of computer vision and natural language processing, this framework can be used to document any trained machine learning model. To solidify the concept, we provide cards for two supervised models: One trained to detect smiling faces in images, and one trained to detect toxic comments in text. We propose model cards as a step towards the responsible democratization of machine learning and related AI technology, increasing transparency into how well AI technology works. We hope this work encourages those releasing trained machine learning models to accompany model releases with similar detailed evaluation numbers and other relevant documentation.
Adapt DevOps to cognitive and artificial intelligence systems - IBM Developer
These new applications require a new way of thinking about the development process. Traditional application development has been enhanced by the idea of DevOps, which forces operational considerations into development time, execution, and process. In this tutorial, we outline a "cognitive DevOps" process that refines and adapts the best parts of DevOps for new cognitive applications. Specifically, we cover applying DevOps to the training process of cognitive systems including training data, modeling, and performance evaluation. A cognitive or artificial intelligence (AI) system fundamentally exhibits capabilities such as understanding, reasoning, and learning from data. At a deeper level, the system is built upon a combination of various types of cognitive tasks, which, when combined, make up a part of the overall cognitive application. The science upon which a cognitive system is built includes, but is not limited to, machine learning (ML) including deep learning and natural language processing.
Projective Inference in High-dimensional Problems: Prediction and Feature Selection
Piironen, Juho, Paasiniemi, Markus, Vehtari, Aki
This paper discusses predictive inference and feature selection for generalized linear models with scarce but high-dimensional data. We argue that in many cases one can benefit from a decision theoretically justified two-stage approach: first, construct a possibly non-sparse model that predicts well, and then find a minimal subset of features that characterize the predictions. The model built in the first step is referred to as the \emph{reference model} and the operation during the latter step as predictive \emph{projection}. The key characteristic of this approach is that it finds an excellent tradeoff between sparsity and predictive accuracy, and the gain comes from utilizing all available information including prior and that coming from the left out features. We review several methods that follow this principle and provide novel methodological contributions. We present a new projection technique that unifies two existing techniques and is both accurate and fast to compute. We also propose a way of evaluating the feature selection process using fast leave-one-out cross-validation that allows for easy and intuitive model size selection. Furthermore, we prove a theorem that helps to understand the conditions under which the projective approach could be beneficial. The benefits are illustrated via several simulated and real world examples.
Graph Embedding with Shifted Inner Product Similarity and Its Improved Approximation Capability
Okuno, Akifumi, Kim, Geewook, Shimodaira, Hidetoshi
We propose shifted inner-product similarity (SIPS), which is a novel yet very simple extension of the ordinary inner-product similarity (IPS) for neural-network based graph embedding (GE). In contrast to IPS, that is limited to approximating positive-definite (PD) similarities, SIPS goes beyond the limitation by introducing bias terms in IPS; we theoretically prove that SIPS is capable of approximating not only PD but also conditionally PD (CPD) similarities with many examples such as cosine similarity, negative Poincare distance and negative Wasserstein distance. Since SIPS with sufficiently large neural networks learns a variety of similarities, SIPS alleviates the need for configuring the similarity function of GE. Approximation error rate is also evaluated, and experiments on two real-world datasets demonstrate that graph embedding using SIPS indeed outperforms existing methods.
Conor McGregor vs Khabib: How accidentally streaming PPV fight on Periscope led to a huge fine
A man who broadcast part of a pay-per-view fight on his phone has received thousands of pounds worth of fines, despite not actually broadcasting a single minute of the fight itself. The case has prompted industry warnings about the free live streams set to spread across Facebook and Twitter ahead of Saturday's fight between Conor McGregor and Khabib Nurmagomedov at UFC 229. Josh Mellor said he had no idea that he was breaking the law when streaming several minutes of the pre-fight coverage of a boxing match using Periscope on his smartphone. "I went round to my friend's house to watch a pay-per-view boxing match and while we were waiting for the fight to start I started scrolling through Periscope," Mr Mellor told The Independent. "I'd heard in the pub, and from friends, that you could watch free live streams of the fight and wondered how as we'd paid to watch it. Whilst on Periscope, I saw a number of streams and while exploring I clicked the'Go Live' button. I streamed the pre-fight coverage from my mate's TV for a few minutes before quitting the app."
CRED: A Deep Residual Network of Convolutional and Recurrent Units for Earthquake Signal Detection
Mousavi, S. Mostafa, Zhu, Weiqiang, Sheng, Yixiao, Beroza, Gregory C.
Each year, more than 50 terabytes of seismic data are archived at the Incorporated Research Institutions for Seismology (IRIS) alone. The massive amount of data highlights the need for more efficient and powerful tools for data processing and analyses. The main challenge is the efficient extraction of as much useful information as possible from these large datasets. This is where rapidly evolving machine learning (ML) approaches have the potential to play a key role (Zhu and Beroza 2018; Li et al, 2018; Ross et al, 2018b; Chen 2018). 1 One of the first stages that observational seismologists need to meet this challenge is in the processing of continuous data to detect earthquake signals. Among a large variety of detection methods developed in past few decades, STA/LTA (Allen, 1978) and template matching (Gibbons and Ringdal 2006; Shelly et al. 2007; Ross et al, 2017; Li et al, 2018) are the most commonly used algorithms. While STA/LTA is generalized and efficient, its sensitivity to timevarying background noise and lack of sensitivity to small events, false positives, and events recorded shortly after each other make it less than optimal for robust and sensitive detection. Although the high sensitivity of cross-correlation improves the detection threshold of template matching, the requirement of prior knowledge of templates and multiple cross-correlation procedures make it less general and inefficient for real-time processing of large seismic data volumes. Although more advanced algorithms such as Fingerprint And Similarity Thresholding (FAST) (Yoon et al, 2015) can improve the efficiency of the similarity search, the outputs are in that case limited to repeated events. Shallow Neural Networks (NN) are among the first ML methods used for the earthquake signal detection (e.g.
From Soft Classifiers to Hard Decisions: How fair can we be?
Canetti, Ran, Cohen, Aloni, Dikkala, Nishanth, Ramnarayan, Govind, Scheffler, Sarah, Smith, Adam
A popular methodology for building binary decision-making classifiers in the presence of imperfect information is to first construct a non-binary "scoring" classifier that is calibrated over all protected groups, and then to post-process this score to obtain a binary decision. We study the feasibility of achieving various fairness properties by post-processing calibrated scores, and then show that deferring post-processors allow for more fairness conditions to hold on the final decision. Specifically, we show: 1. There does not exist a general way to post-process a calibrated classifier to equalize protected groups' positive or negative predictive value (PPV or NPV). For certain "nice" calibrated classifiers, either PPV or NPV can be equalized when the post-processor uses different thresholds across protected groups, though there exist distributions of calibrated scores for which the two measures cannot be both equalized. When the post-processing consists of a single global threshold across all groups, natural fairness properties, such as equalizing PPV in a nontrivial way, do not hold even for "nice" classifiers. 2. When the post-processing is allowed to `defer' on some decisions (that is, to avoid making a decision by handing off some examples to a separate process), then for the non-deferred decisions, the resulting classifier can be made to equalize PPV, NPV, false positive rate (FPR) and false negative rate (FNR) across the protected groups. This suggests a way to partially evade the impossibility results of Chouldechova and Kleinberg et al., which preclude equalizing all of these measures simultaneously. We also present different deferring strategies and show how they affect the fairness properties of the overall system. We evaluate our post-processing techniques using the COMPAS data set from 2016.