Performance Analysis
Man & Machine – A Mutually Beneficial Partnership in the Age of Artificial Intelligence
Analysis predicting major societal problems caused by artificial intelligence (AI) surfaces every other day: how AI could be used to manipulate elections and launch drone attacks. The major fear seems to be that AI is set to make humans a redundant force in the workplace. Yes, AI, like any evolving technology, is set to change our jobs, but could it also be the key to unlocking creativity and productivity in the business sector? It's clear that nothing is holding AI back. Replacing an existing business process requires a clear investment case.
Confidence Scoring Using Whitebox Meta-models with Linear Classifier Probes
Chen, Tongfei, Navrátil, Jiří, Iyengar, Vijay, Shanmugam, Karthikeyan
We propose a confidence scoring mechanism for multi-layer neural networks based on a paradigm of a base model and a meta-model. The confidence score is learned by the meta-model using features derived from the base model -- a deep multi-layer neural network -- considered a whitebox. As features, we investigate linear classifier probes inserted between the various layers of the base model and trained using each layer's intermediate activations. Experiments show that this approach outperforms various baselines in a filtering task, i.e., task of rejecting samples with low confidence. Experimental results are presented using CIFAR-10 and CIFAR-100 dataset with and without added noise exploring various aspects of the method.
Model selection with lasso-zero: adding straw to the haystack to better find needles
Descloux, Pascaline, Sardy, Sylvain
The high-dimensional linear model $y = X \beta^0 + \epsilon$ is considered and the focus is put on the problem of recovering the support $S^0$ of the sparse vector $\beta^0.$ We introduce lasso-zero, a new $\ell_1$-based estimator whose novelty resides in an "overfit, then threshold" paradigm and the use of noise dictionaries for overfitting the response. The methodology is supported by theoretical results obtained in the special case where no noise dictionary is used. In this case, lasso-zero boils down to thresholding the basis pursuit solution. We prove that this procedure requires weaker conditions on $X$ and $S^0$ than the lasso for exact support recovery, and controls the false discovery rate for orthonormal designs when tuned by the quantile universal threshold. However it requires a high signal-to-noise ratio, and the use of noise dictionaries addresses this issue. The threshold selection procedure is based on a pivotal statistic and does not require knowledge of the noise level. Numerical simulations show that lasso-zero performs well in terms of support recovery and provides a good trade-off between high true positive rate and low false discovery rate compared to competitors.
Large-Scale QA-SRL Parsing
FitzGerald, Nicholas, Michael, Julian, He, Luheng, Zettlemoyer, Luke
We present a new large-scale corpus of Question-Answer driven Semantic Role Labeling (QA-SRL) annotations, and the first high-quality QA-SRL parser. Our corpus, QA-SRL Bank 2.0, consists of over 250,000 question-answer pairs for over 64,000 sentences across 3 domains and was gathered with a new crowd-sourcing scheme that we show has high precision and good recall at modest cost. We also present neural models for two QA-SRL subtasks: detecting argument spans for a predicate and generating questions to label the semantic relationship. The best models achieve question accuracy of 82.6% and span-level accuracy of 77.6% (under human evaluation) on the full pipelined QA-SRL prediction task. They can also, as we show, be used to gather additional annotations at low cost.
Conversations Gone Awry: Detecting Early Signs of Conversational Failure
Zhang, Justine, Chang, Jonathan P., Danescu-Niculescu-Mizil, Cristian, Dixon, Lucas, Hua, Yiqing, Thain, Nithum, Taraborelli, Dario
One of the main challenges online social systems face is the prevalence of antisocial behavior, such as harassment and personal attacks. In this work, we introduce the task of predicting from the very start of a conversation whether it will get out of hand. As opposed to detecting undesirable behavior after the fact, this task aims to enable early, actionable prediction at a time when the conversation might still be salvaged. To this end, we develop a framework for capturing pragmatic devices---such as politeness strategies and rhetorical prompts---used to start a conversation, and analyze their relation to its future trajectory. Applying this framework in a controlled setting, we demonstrate the feasibility of detecting early warning signs of antisocial behavior in online discussions.
You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information
Perez, Beatrice, Musolesi, Mirco, Stringhini, Gianluca
Metadata are associated with most of the information we produce in our daily interactions and communication in the digital world. Y et, surprisingly, metadata are often still categorized as nonsensitive. Indeed, in the past, researchers and practitioners have mainly focused on the problem of the identification of a user from the content of a message. In this paper, we use Twitter as a case study to quantify the uniqueness of the association between metadata and user identity and to understand the effectiveness of potential obfuscation strategies. More specifically, we analyze atomic fields in the metadata and systematically combine them in an effort to classify new tweets as belonging to an account using different machine learning algorithms of increasing complexity. We demonstrate that, through the application of a supervised learning algorithm, we are able to identify any user in a group of 10,000 with approximately 96.7% accuracy. Moreover, if we broaden the scope of our search and consider the 10 most likely candidates we increase the accuracy of the model to 99.22%. We also found that data obfuscation is hard and ineffective for this type of data: even after perturbing 60% of the training data, it is still possible to classify users with an accuracy higher than 95%. These results have strong implications in terms of the design of metadata obfuscation strategies, for example for data set release, not only for Twitter, but, more generally, for most social media platforms.
Metropolitan Police's facial recognition technology 98% inaccurate, figures show
Facial recognition software used by the UK's biggest police force has returned false positives in more than 98 per cent of alerts generated, The Independent can reveal, with the country's biometrics regulator calling it "not yet fit for use". The Metropolitan Police's system has produced 104 alerts of which only two were later confirmed to be positive matches, a freedom of information request showed. In its response the force said it did not consider the inaccurate matches "false positives" because alerts were checked a second time after they occurred. Facial recognition technology scans people in a video feed and compares their images to pictures stored in a reference library or watch list. It has been used at large events like the Notting Hill Carnival and a Six Nations Rugby match. The system used by another force, South Wales Police, has returned more than 2,400 false positives in 15 deployments since June 2017.
A Simple and Effective Model-Based Variable Importance Measure
Greenwell, Brandon M., Boehmke, Bradley C., McCarthy, Andrew J.
In the era of "big data", it is becoming more of a challenge to not only build state-of-the-art predictive models, but also gain an understanding of what's really going on in the data. For example, it is often of interest to know which, if any, of the predictors in a fitted model are relatively influential on the predicted outcome. Some modern algorithms---like random forests and gradient boosted decision trees---have a natural way of quantifying the importance or relative influence of each feature. Other algorithms---like naive Bayes classifiers and support vector machines---are not capable of doing so and model-free approaches are generally used to measure each predictor's importance. In this paper, we propose a standardized, model-based approach to measuring predictor importance across the growing spectrum of supervised learning algorithms. Our proposed method is illustrated through both simulated and real data examples. The R code to reproduce all of the figures in this paper is available in the supplementary materials.
Agreement Rate Initialized Maximum Likelihood Estimator for Ensemble Classifier Aggregation and Its Application in Brain-Computer Interface
Wu, Dongrui, Lawhern, Vernon J., Gordon, Stephen, Lance, Brent J., Lin, Chin-Teng
Ensemble learning is a powerful approach to construct a strong learner from multiple base learners. The most popular way to aggregate an ensemble of classifiers is majority voting, which assigns a sample to the class that most base classifiers vote for. However, improved performance can be obtained by assigning weights to the base classifiers according to their accuracy. This paper proposes an agreement rate initialized maximum likelihood estimator (ARIMLE) to optimally fuse the base classifiers. ARIMLE first uses a simplified agreement rate method to estimate the classification accuracy of each base classifier from the unlabeled samples, then employs the accuracies to initialize a maximum likelihood estimator (MLE), and finally uses the expectation-maximization algorithm to refine the MLE. Extensive experiments on visually evoked potential classification in a brain-computer interface application show that ARIMLE outperforms majority voting, and also achieves better or comparable performance with several other state-of-the-art classifier combination approaches.
Improved Predictive Models for Acute Kidney Injury with IDEAs: Intraoperative Data Embedded Analytics
Adhikari, Lasith, Ozrazgat-Baslanti, Tezcan, Thottakkara, Paul, Ebadi, Ashkan, Motaei, Amir, Rashidi, Parisa, Li, Xiaolin, Bihorac, Azra
Acute kidney injury (AKI) is a common and serious complication after a surgery which is associated with morbidity and mortality. The majority of existing perioperative AKI risk score prediction models are limited in their generalizability and do not fully utilize the physiological intraoperative time-series data. Thus, there is a need for intelligent, accurate, and robust systems, able to leverage information from large-scale data to predict patient's risk of developing postoperative AKI. A retrospective single-center cohort of 2,911 adult patients who underwent surgery at the University of Florida Health has been used for this study. We used machine learning and statistical analysis techniques to develop perioperative models to predict the risk of AKI (risk during the first 3 days, 7 days, and until the discharge day) before and after the surgery. In particular, we examined the improvement in risk prediction by incorporating three intraoperative physiologic time series data, i.e., mean arterial blood pressure, minimum alveolar concentration, and heart rate. For an individual patient, the preoperative model produces a probabilistic AKI risk score, which will be enriched by integrating intraoperative statistical features through a machine learning stacking approach inside a random forest classifier. We compared the performance of our model based on the area under the receiver operating characteristics curve (AUROC), accuracy and net reclassification improvement (NRI). The predictive performance of the proposed model is better than the preoperative data only model. For AKI-7day outcome: The AUC was 0.86 (accuracy was 0.78) in the proposed model, while the preoperative AUC was 0.84 (accuracy 0.76). Furthermore, with the integration of intraoperative features, we were able to classify patients who were misclassified in the preoperative model.