Goto

Collaborating Authors

 Accuracy


Facebook Ads Monitor: An Independent Auditing System for Political Ads on Facebook

arXiv.org Artificial Intelligence

The 2016 United States presidential election was marked by the abuse of targeted advertising on Facebook. Concerned with the risk of the same kind of abuse to happen in the 2018 Brazilian elections, we designed and deployed an independent auditing system to monitor political ads on Facebook in Brazil. To do that we first adapted a browser plugin to gather ads from the timeline of volunteers using Facebook. We managed to convince more than 2000 volunteers to help our project and install our tool. Then, we use a Convolution Neural Network (CNN) to detect political Facebook ads using word embeddings. To evaluate our approach, we manually label a data collection of 10k ads as political or non-political and then we provide an in-depth evaluation of proposed approach for identifying political ads by comparing it with classic supervised machine learning methods. Finally, we deployed a real system that shows the ads identified as related to politics. We noticed that not all political ads we detected were present in the Facebook Ad Library for political ads. Our results emphasize the importance of enforcement mechanisms for declaring political ads and the need for independent auditing platforms.


Early detection of eye disease assisted by Deep Learning

#artificialintelligence

The eye disease diabetic retinopathy is the fastest growing cause of blindness in the world. There are over 100 million people in the world with diabetes and ideally, they would be screened each year for this degenerative eye condition. It is fully treatable if caught early, however if it's not detected you could suffer partial or full vision loss. They look for scattered hemorrhages and micro-aneurysms to grade the image. This is quite subjective in that a grade of two (2) means come back in a year and a grade of three (3) means come to the clinic right away.


Fase-AL -- Adaptation of Fast Adaptive Stacking of Ensembles for Supporting Active Learning

arXiv.org Artificial Intelligence

Classification algorithms to mine data stream have been extensively studied in recent years. However, a lot of these algorithms are designed for supervised learning which requires labeled instances. Nevertheless, the labeling of the data is costly and time-consuming. Because of this, alternative learning paradigms have been proposed to reduce the cost of the labeling process without significant loss of model performance. Active learning is one of these paradigms, whose main objective is to build classification models that request the lowest possible number of labeled examples achieving adequate levels of accuracy. Therefore, this work presents the FASE-AL algorithm which induces classification models with non-labeled instances using Active Learning. FASE-AL is based on the algorithm Fast Adaptive Stacking of Ensembles (FASE). FASE is an ensemble algorithm that detects and adapts the model when the input data stream has concept drift. FASE-AL was compared with four different strategies of active learning found in the literature. Real and synthetic databases were used in the experiments. The algorithm achieves promising results in terms of the percentage of correctly classified instances.


Extreme Algorithm Selection With Dyadic Feature Representation

arXiv.org Machine Learning

Algorithm selection (AS) deals with selecting an algorithm from a fixed set of candidate algorithms most suitable for a specific instance of an algorithmic problem, e.g., choosing solvers for SA T problems. Benchmark suites for AS usually comprise candidate sets consisting of at most tens of algorithms, whereas in combined algorithm selection and hyperparameter optimization problems the number of candidates becomes intractable, impeding to learn effective meta-models and thus requiring costly online performance evaluations. Therefore, here we propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms, facilitating meta learning. W e assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation in which both problem instances and algorithms are described. W e find the latter to improve significantly over the current state of the art in various metrics.


How to Evaluate Your Machine Learning Models with Python Code!

#artificialintelligence

You've finally built your machine learning model to predict future prices of Bitcoin so that you can finally become a multi-billionaire. But how do you know that the model you created is any good? In this article, I'm going to talk about several ways you can evaluate your machine learning model with code provided! If you don't know the difference between regression and classification models, check out here. More specifically, I'm going to cover the following metrics: R Squared is a measurement that tells you to what extent the proportion of variance in the dependent variable is explained by the variance in the independent variables.


Binary Classification from Positive Data with Skewed Confidence

arXiv.org Machine Learning

Positive-confidence (Pconf) classification [Ishida et al., 2018] is a promising weakly-supervised learning method which trains a binary classifier only from positive data equipped with confidence. However, in practice, the confidence may be skewed by bias arising in an annotation process. The Pconf classifier cannot be properly learned with skewed confidence, and consequently, the classification performance might be deteriorated. In this paper, we introduce the parameterized model of the skewed confidence, and propose the method for selecting the hyperparameter which cancels out the negative impact of skewed confidence under the assumption that we have the misclassification rate of positive samples as a prior knowledge. We demonstrate the effectiveness of the proposed method through a synthetic experiment with simple linear models and benchmark problems with neural network models. We also apply our method to drivers' drowsiness prediction to show that it works well with a real-world problem where confidence is obtained based on manual annotation.


Real-time Out-of-distribution Detection in Learning-Enabled Cyber-Physical Systems

arXiv.org Machine Learning

Xenofon Koutsoukos V anderbilt University Nashville, TN xenofon.koutsoukos@vanderbilt.edu Abstract --Cyber-physical systems (CPS) greatly benefit by using machine learning components that can handle the uncertainty and variability of the real-world. Typical components such as deep neural networks, however, introduce new types of hazards that may impact system safety. The system behavior depends on data that are available only during runtime and may be different than the data used for training. Out-of-distribution data may lead to a large error and compromise safety. The paper considers the problem of efficiently detecting out-of-distribution data in CPS control systems. Detection must be robust and limit the number of false alarms while being computational efficient for real-time monitoring. The proposed approach leverages inductive confor-mal prediction and anomaly detection for developing a method that has a well-calibrated false alarm rate. We use variational autoencoders and deep support vector data description to learn models that can be used efficiently compute the nonconformity of new inputs relative to the training set and enable real-time detection of out-of-distribution high-dimensional inputs. We demonstrate the method using an advanced emergency braking system and a self-driving end-to-end controller implemented in an open source simulator for self-driving cars. The simulation results show very small number of false positives and detection delay while the execution time is comparable to the execution time of the original machine learning components. I NTRODUCTION Learning-enabled components (LECs) such as neural networks are used in many classes of cyber-physical systems (CPS). Semi-autonomous and autonomous vehicles, in particular, are CPS examples where LECs can play a significant role for perception, planning, and control if they are complemented with methods for analyzing and ensuring safety [1], [2]. However, there are several characteristics of LECs that can complicate safety analysis. LECs encode knowledge in a form that is not transparent.


Survey of Network Intrusion Detection Methods from the Perspective of the Knowledge Discovery in Databases Process

arXiv.org Artificial Intelligence

The identification of cyberattacks which target information and communication systems has been a focus of the research community for years. Network intrusion detection is a complex problem which presents a diverse number of challenges. Many attacks currently remain undetected, while newer ones emerge due to the proliferation of connected devices and the evolution of communication technology. In this survey, we review the methods that have been applied to network data with the purpose of developing an intrusion detector, but contrary to previous reviews in the area, we analyze them from the perspective of the Knowledge Discovery in Databases (KDD) process. As such, we discuss the techniques used for the capture, preparation and transformation of the data, as well as, the data mining and evaluation methods. In addition, we also present the characteristics and motivations behind the use of each of these techniques and propose more adequate and up-to-date taxonomies and definitions for intrusion detectors based on the terminology used in the area of data mining and KDD. Special importance is given to the evaluation procedures followed to assess the different detectors, discussing their applicability in current real networks. Finally, as a result of this literature review, we investigate some open issues which will need to be considered for further research in the area of network security.


Interpreting Machine Learning Malware Detectors Which Leverage N-gram Analysis

arXiv.org Artificial Intelligence

In cyberattack detection and prevention systems, cybersecurity analysts always prefer solutions that are as interpretable and understandable as rule-based or signature-based detection. This is because of the need to tune and optimize these solutions to mitigate and control the effect of false positives and false negatives. Interpreting machine learning models is a new and open challenge. However, it is expected that an interpretable machine learning solution will be domain-specific. For instance, interpretable solutions for machine learning models in healthcare are different than solutions in malware detection. This is because the models are complex, and most of them work as a black-box. Recently, the increased ability for malware authors to bypass antimalware systems has forced security specialists to look to machine learning for creating robust detection systems. If these systems are to be relied on in the industry, then, among other challenges, they must also explain their predictions. The objective of this paper is to evaluate the current state-of-the-art ML models interpretability techniques when applied to ML-based malware detectors. We demonstrate interpretability techniques in practice and evaluate the effectiveness of existing interpretability techniques in the malware analysis domain.


How do We Quantify the Quality of Our Predictions? Part I

#artificialintelligence

We have all worked on different kinds of Machine learning models, and each model needs to be evaluated in different ways. From the initial data that is provided to the outcome and the way, we as the users want to use it. A classification model would require a different metric for model evaluation as compared to a regression model or a Neural Net, and it's important to know and understand which metric to use and when. Here in this series, we go through some of these metrics, starting from the basic and the most commonly used ones to the application-specific and complex metrics that we can use. We will be starting with the basic metrics from sklearn and progress towards the more complicated metrics after that. Accuracy Score: In classification, the number of labels predicted for a sample versus the corresponding set of labels in y_true can be coined as accuracy.