Performance Analysis
How do We Quantify the Quality of Our Predictions? Part I
Different kinds of machine learning models need to be evaluated in different ways, depending on the data provided, the type of outcome, and how we as users want to use it. A classification model requires a different evaluation metric than a regression model or a neural net, and it is important to know which metric to use and when. In this series, we go through some of these metrics, starting with the basic and most commonly used ones and moving on to application-specific and more complex metrics. We begin with the basic metrics from sklearn and progress towards the more complicated ones. Accuracy Score: In classification, accuracy is the fraction of samples whose predicted labels match the corresponding labels in y_true.
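As a minimal sketch of the accuracy definition above, the fraction of matching labels can be computed in plain Python. The helper name `accuracy` and the sample labels are illustrative assumptions. For single-label problems, `sklearn.metrics.accuracy_score` should return the same value with its default settings.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative labels (not from the article): 2 of 4 predictions match.
y_true = [0, 1, 2, 3]
y_pred = [0, 2, 1, 3]
score = accuracy(y_true, y_pred)
print(score)  # 0.5
```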
Building a Lie Detector for Images
The Internet is full of fun fake images -- from flying sharks and cows on cars to a dizzying variety of celebrity mashups. Hyperrealistic image and video fakes generated by convolutional neural networks (CNNs), however, are no laughing matter -- in fact they can be downright dangerous. Deepfake porn reared its ugly head in 2018, fake political speeches by world leaders have cast doubt on news sources, and during the recent Australian bushfires manipulated images misled people regarding the location and size of fires. Fake images and videos are giving AI a black eye -- but how can the machine learning community fight back? A new paper from UC Berkeley and Adobe researchers declares war on fake images.
London Police to Deploy Facial Recognition Cameras Despite Privacy Concerns and Evidence of High Failure Rate
Police in London are moving ahead with deploying a facial recognition camera system despite privacy concerns and evidence that the technology is riddled with false positives. The Metropolitan Police, the U.K.'s biggest police department with jurisdiction over most of London, announced Friday it would begin rolling out new "live facial recognition" cameras in London, making the capital one of the largest cities in the West to adopt the controversial technology. The "Met," as the police department is known in London, said in a statement that the facial recognition technology, which is meant to identify people on a watch list and alert police to their real-time location, would be "intelligence-led" and deployed only to specific locations. It's expected to be rolled out as soon as next month. However, privacy activists immediately raised concerns, noting that independent reviews of trials of the technology showed a failure rate of 81%.
High noon for surveillance: resolving tension between the costs of false positives, challenges of calibration, and compliance – A Team
When it comes to trade surveillance, regulators want firms to do their own alert calibration, examine all alerts, and keep auditable records. Firms need to balance the real cost of false positives with the technical challenge and risk of self-calibrating and auto-calibrating, while compliance, IT and vendors have to grapple with the need for defensible and transparent audit, which challenges dynamic parameters. The webinar will review recent regulatory statements noting concerns about how trading organisations are setting parameters and managing surveillance. Moving on, it will discuss approaches and technologies that can mitigate these concerns, and question whether advanced approaches such as machine learning are a help or hindrance. Finally, it will set out practical plans for achieving successful surveillance for Market Abuse Regulation (MAR).
Lattice-based Improvements for Voice Triggering Using Graph Neural Networks
Dighe, Pranay, Adya, Saurabh, Li, Nuoyu, Vishnubhotla, Srikanth, Naik, Devang, Sagar, Adithya, Ma, Ying, Pulman, Stephen, Williams, Jason
Voice-triggered smart assistants often rely on detection of a trigger-phrase before they start listening for the user request. Mitigation of false triggers is an important aspect of building a privacy-centric non-intrusive smart assistant. In this paper, we address the task of false trigger mitigation (FTM) using a novel approach based on analyzing automatic speech recognition (ASR) lattices using graph neural networks (GNN). The proposed approach uses the fact that the decoding lattice of falsely triggered audio exhibits uncertainties in terms of many alternative paths and unexpected words on the lattice arcs as compared to the lattice of correctly triggered audio. A pure trigger-phrase detector model doesn't fully utilize the intent of the user speech, whereas by using the complete decoding lattice of user audio, we can effectively mitigate speech not intended for the smart assistant. We deploy two variants of GNNs in this paper based on 1) graph convolution layers and 2) self-attention mechanism respectively. Our experiments demonstrate that GNNs are highly accurate in the FTM task by mitigating ~87% of false triggers at 99% true positive rate (TPR). Furthermore, the proposed models are fast to train and efficient in parameter requirements.
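A figure such as "~87% of false triggers mitigated at 99% TPR" is read off at a fixed operating point: a score threshold is chosen so that the desired share of true triggers is still accepted, and the share of false triggers falling below it is then reported. The sketch below illustrates that calculation only. The function name, the scores, and the 0.75 target are hypothetical and are not taken from the paper.

```python
import math


def mitigation_at_tpr(true_scores, false_scores, target_tpr):
    """Pick the highest threshold that accepts at least target_tpr of the
    true triggers, then report the share of false triggers it rejects."""
    if not true_scores or not false_scores:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(true_scores, reverse=True)
    # Number of true triggers that must be accepted (small epsilon guards
    # against floating-point noise in the product).
    keep = max(1, math.ceil(target_tpr * len(ranked) - 1e-9))
    threshold = ranked[keep - 1]  # accept every score >= threshold
    tpr = sum(s >= threshold for s in true_scores) / len(true_scores)
    rejected = sum(s < threshold for s in false_scores) / len(false_scores)
    return threshold, tpr, rejected


# Hypothetical detector scores (higher means "more likely a real trigger").
true_scores = [0.9, 0.8, 0.7, 0.2]
false_scores = [0.1, 0.3, 0.6, 0.75, 0.05]
threshold, tpr, rejected = mitigation_at_tpr(true_scores, false_scores, 0.75)
print(threshold, tpr, rejected)  # 0.7 0.75 0.8
```

With tied scores the achieved TPR can exceed the target, which is why the function returns the measured TPR alongside the rejection rate.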
Detection of Thin Boundaries between Different Types of Anomalies in Outlier Detection using Enhanced Neural Networks
Kiani, Rasoul, Keshavarzi, Amin, Bohlouli, Mahdi
Outlier detection has received special attention in various fields, mainly those dealing with machine learning and artificial intelligence. As strong outliers, anomalies are divided into point, contextual and collective outliers. The most important challenges in outlier detection include the thin boundary between the remote points and natural area, the tendency of new data and noise to mimic the real data, unlabelled datasets and different definitions for outliers in different applications. Considering the stated challenges, we defined new types of anomalies called Collective Normal Anomaly and Collective Point Anomaly in order to better detect the thin boundary between different types of anomalies. Basic domain-independent methods are introduced to detect these defined anomalies in both unsupervised and supervised datasets. The Multi-Layer Perceptron Neural Network is enhanced using the Genetic Algorithm to detect the newly defined anomalies with higher precision, so as to ensure a test error lower than that calculated for the conventional Multi-Layer Perceptron Neural Network. Experimental results on benchmark datasets indicated a reduced error in the anomaly detection process in comparison to baselines.
Reasoning About Generalization via Conditional Mutual Information
Steinke, Thomas, Zakynthinou, Lydia
How can we ensure that a machine learning system produces an output that generalizes to the underlying distribution, rather than overfitting its training data? That is, how can we ensure that the hypotheses or models that are produced are reflective of the underlying population the training data was drawn from, rather than patterns that occur only by chance in the training data? This is perhaps the fundamental question for the science of statistical machine learning. A vast array of methods have been proposed to answer this question. Most notably, the theory of uniform convergence shows that, if the output is sufficiently "simple," then it cannot overfit too much. A more recent line of work has used distributional stability (in the form of differential privacy) to provide generalization guarantees that compose adaptively - that is, statistical validity is preserved even when a dataset is reused multiple times with each successive analysis being influenced by the outcomes of prior analyses. Other methods for proving generalization include compression schemes and uniform stability. Unfortunately, these different methods for providing generalization guarantees are largely disconnected from one another; it is, in general, not possible to compare or combine techniques. In this paper, we provide a framework to reason about many of these differing approaches using the unifying language of information theory.
Polarimetric Guided Nonlocal Means Covariance Matrix Estimation for Defoliation Mapping
Agersborg, Jørgen A., Anfinsen, Stian Normann, Jepsen, Jane Uhd
In this study we investigate the potential for using Synthetic Aperture Radar (SAR) data to provide high resolution defoliation and regrowth mapping of trees in the tundra-forest ecotone. Using in situ measurements collected in 2017 we calculated the proportion of both live and defoliated tree crown for 165 $10 m \times 10 m$ ground plots along six transects. Quad-polarimetric SAR data from RADARSAT-2 was collected from the same area, and the complex multilook polarimetric covariance matrix was calculated using a novel extension of guided nonlocal means speckle filtering. The nonlocal approach allows us to preserve the high spatial resolution of single-look complex data, which is essential for accurate mapping of the sparsely scattered trees in the study area. Using a standard random forest classification algorithm, our filtering results in a $73.8 \%$ classification accuracy, higher than traditional speckle filtering methods, and on par with the classification accuracy based on optical data.
Stratified cross-validation for unbiased and privacy-preserving federated learning
Bey, R., Goussault, R., Benchoufi, M., Porcher, R.
Large-scale collections of electronic records constitute both an opportunity for the development of more accurate prediction models and a threat for privacy. To limit privacy exposure new privacy-enhancing techniques are emerging such as federated learning which enables large-scale data analysis while avoiding the centralization of records in a unique database that would represent a critical point of failure. Although promising regarding privacy protection, federated learning prevents using some data-cleaning algorithms thus inducing new biases. In this work we focus on the recurrent problem of duplicated records that, if not handled properly, may cause over-optimistic estimations of a model's performances. We introduce and discuss stratified cross-validation, a validation methodology that leverages stratification techniques to prevent data leakage in federated learning settings without relying on demanding deduplication algorithms.
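One way to picture the leakage problem described above: if fold membership is derived deterministically from attributes that duplicated records share, duplicates land in the same fold even though no site ever compares its records with another's. The sketch below is an illustration under that assumption and is not the authors' algorithm. The record fields, the choice of key, and the helper `fold_of` are all hypothetical.

```python
import hashlib
from collections import defaultdict


def fold_of(key, n_folds):
    """Deterministically map a stratification key to a fold index."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


# Hypothetical records held by two sites; records 0 and 2 describe the same person.
records = [
    {"site": "A", "birth_date": "1980-01-01", "sex": "F"},
    {"site": "A", "birth_date": "1975-06-30", "sex": "M"},
    {"site": "B", "birth_date": "1980-01-01", "sex": "F"},
    {"site": "B", "birth_date": "1990-12-12", "sex": "F"},
]

n_folds = 3
folds = defaultdict(list)
for index, record in enumerate(records):
    # Assumed stable attributes shared by duplicates of the same record.
    key = f"{record['birth_date']}|{record['sex']}"
    folds[fold_of(key, n_folds)].append(index)

# The duplicated pair can never be split between training and validation folds.
same_fold = any(0 in members and 2 in members for members in folds.values())
print(dict(folds), same_fold)
```

Because the fold index depends only on the key, each site can compute it locally. The trade-off is that distinct records sharing the same key are also forced into one fold.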
Receiver Operating Characteristic Curves Demystified (in Python)
In Data Science, evaluating model performance is very important and the most commonly used performance metric is the classification score. However, when dealing with fraud datasets with heavy class imbalance, a classification score does not make much sense. Instead, Receiver Operating Characteristic or ROC curves offer a better alternative. ROC is a plot of signal (True Positive Rate) against noise (False Positive Rate). The model performance is determined by looking at the area under the ROC curve (or AUC).
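To make the construction concrete, here is a minimal pure-Python sketch that sweeps a decision threshold over the predicted scores, records the (False Positive Rate, True Positive Rate) pairs, and integrates them with the trapezoidal rule. The function names and the four-sample data are illustrative assumptions. In practice `sklearn.metrics.roc_curve` and `sklearn.metrics.roc_auc_score` serve the same purpose.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by lowering a threshold over scores."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Highest scores first; tied scores are processed as one threshold step.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Illustrative labels and scores (1 = positive class, e.g. fraud).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
area = auc(fpr, tpr)
print(fpr, tpr, area)  # area is 0.75
```

An AUC of 0.5 corresponds to ranking positives and negatives at random, and 1.0 to ranking every positive above every negative.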