Performance Analysis


Statistical Formulas for F Measures

arXiv.org Machine Learning

The F measures are very commonly used to estimate the performance of machine learning methods (see, e.g., the Wikipedia entry on the F-score). This paper provides simple formulas for their standard errors and probability distributions, together with the related confidence intervals and sample size planning for large data sets. We first use a real data set (Stine, Foster, and Waterman 1998) to illustrate the concept of the F measures. A purchase of one of two brands of orange juice, Citrus Hill or Minute Maid, is coded as Z = 1 or Z = 0, respectively, and modeled as a random variable. A score S summarizing the preference for the Citrus Hill brand is assigned to this purchase. This score S is also modeled as a random variable, since it depends on factors such as customer loyalty and price difference, which can differ for each purchase.
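To make the setup concrete, here is a minimal sketch assuming a purchase is predicted as Z = 1 whenever its score S exceeds a cutoff t. It computes F_β from the resulting confusion counts and attaches a percentile-bootstrap confidence interval. The bootstrap is a swapped-in, generic technique, not the closed-form standard-error formulas the paper derives; the function names, the cutoff t, and the toy data are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F_beta = (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP); 0.0 if the denominator is 0."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom > 0 else 0.0


def confusion_at_threshold(z: Sequence[int], s: Sequence[float], t: float) -> Tuple[int, int, int]:
    """Count (TP, FP, FN) when a purchase is predicted as Z = 1 iff its score S exceeds t."""
    tp = sum(1 for zi, si in zip(z, s) if si > t and zi == 1)
    fp = sum(1 for zi, si in zip(z, s) if si > t and zi == 0)
    fn = sum(1 for zi, si in zip(z, s) if si <= t and zi == 1)
    return tp, fp, fn


def bootstrap_f1_ci(z, s, t, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) interval for F1, resampling (Z, S) pairs with replacement."""
    rng = random.Random(seed)
    n = len(z)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        counts = confusion_at_threshold([z[i] for i in idx], [s[i] for i in idx], t)
        stats.append(f_beta(*counts))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical toy data: Z = 1 marks a Citrus Hill purchase, S is the preference score.
z = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
s = [0.9, 0.8, 0.4, 0.3, 0.7, 0.6, 0.2, 0.95, 0.1, 0.5]
tp, fp, fn = confusion_at_threshold(z, s, t=0.55)  # (4, 1, 1)
print(f_beta(tp, fp, fn))                          # 0.8
print(bootstrap_f1_ci(z, s, t=0.55))
```

With only ten purchases the bootstrap interval is very wide; the paper's interest in large data is exactly the regime where analytic standard errors become practical.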


A method to integrate and classify normal distributions

arXiv.org Machine Learning

Univariate and multivariate normal probability distributions are widely used when modeling decisions under uncertainty. Computing the performance of such models requires integrating these distributions over specific domains, which can vary widely across models. Apart from some special cases where these integrals are easy to calculate, there is no general analytical expression, standard numerical method, or software for these integrals. Here we present mathematical results and software that provide (i) the probability in any domain of a normal in any dimension with any parameters, (ii) the probability density, distribution, and percentage points of any function of a normal vector, (iii) quantities, such as the error matrix and discriminability, which summarize classification performance amongst any number of normal distributions, (iv) dimension reduction and visualizations for all such problems, and (v) tests for how reliably these methods can be used on given data. We illustrate these tools with models for detecting occluding targets in natural scenes and for detecting camouflage.
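For orientation, the quantity in (i) can be approximated by brute-force Monte Carlo: sample from the normal and count the fraction of samples that land in the domain. The sketch below is a swapped-in baseline, not the authors' method or software; it becomes slow and imprecise for very small probabilities, which is where specialized methods matter. The function name and the example domains are hypothetical; the quadrant check uses the standard closed form P(X1 > 0, X2 > 0) = 1/4 + arcsin(ρ)/(2π) for a standard bivariate normal with correlation ρ.

```python
import numpy as np
from typing import Callable, Sequence, Tuple


def mc_normal_probability(
    mu: Sequence[float],
    cov: Sequence[Sequence[float]],
    in_domain: Callable[[np.ndarray], np.ndarray],
    n: int = 200_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Monte Carlo estimate of P(X in D) for X ~ N(mu, cov).

    `in_domain` maps an (n, d) array of samples to a length-n boolean array marking which
    samples fall in D. Returns the estimate p and its standard error sqrt(p (1 - p) / n).
    """
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(np.asarray(mu, dtype=float), np.asarray(cov, dtype=float), size=n)
    hits = np.asarray(in_domain(x), dtype=bool)
    p = float(hits.mean())
    return p, float(np.sqrt(p * (1.0 - p) / n))


mu = [0.0, 0.0]
cov = [[1.0, 0.5], [0.5, 1.0]]  # correlation rho = 0.5

# A non-rectangular domain: the unit disk.
p_disk, se_disk = mc_normal_probability(mu, cov, lambda x: (x ** 2).sum(axis=1) <= 1.0)

# Sanity check against a known closed form:
# P(X1 > 0, X2 > 0) = 1/4 + arcsin(rho) / (2 pi) = 1/3 for rho = 0.5.
p_quad, se_quad = mc_normal_probability(mu, cov, lambda x: (x > 0).all(axis=1))
print(p_disk, p_quad, 0.25 + np.arcsin(0.5) / (2 * np.pi))
```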


Effective Email Spam Detection System using Extreme Gradient Boosting

arXiv.org Artificial Intelligence

The popularity, cost-effectiveness, and ease of information exchange that email offers to electronic device users have been undermined by the rising number of unsolicited (spam) emails. Driven by the need to protect email users from this growing menace, research on spam email filtering and detection systems has been increasingly active in the last decade. However, the adaptive nature of spam emails has often rendered most of these systems ineffective. While several spam detection models have been reported in the literature, their reported performance on out-of-sample test data shows room for improvement. This research presents an improved spam detection model based on Extreme Gradient Boosting (XGBoost), which, to the best of our knowledge, has received little attention in spam email detection. Experimental results show that the proposed model outperforms earlier approaches across a wide range of evaluation metrics. A thorough analysis of the model's results in comparison with those of earlier works is also presented.
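As a rough illustration of what XGBoost-style boosting does, the sketch below fits second-order (Newton) boosted decision stumps to binary bag-of-words features under log loss. Leaf weights use the regularized Newton step −G/(H + λ), and splits are ranked by the corresponding regularized gain (the constant ½ factor and the split penalty γ are omitted, since neither changes which split wins here). It is a from-scratch teaching sketch, not the XGBoost library and not the paper's model; the library adds deeper trees, subsampling, and much more. The vocabulary, emails, and function names are hypothetical.

```python
import math
from typing import List, Tuple

VOCAB = ["free", "winner", "meeting", "project", "click", "lunch"]  # hypothetical vocabulary


def featurize(text: str) -> List[int]:
    """Binary bag-of-words: 1 if the vocabulary word occurs in the email, else 0."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in VOCAB]


def sigmoid(m: float) -> float:
    return 1.0 / (1.0 + math.exp(-m))


def fit_boosted_stumps(X, y, n_rounds=20, eta=0.3, lam=1.0):
    """Newton boosting of depth-1 trees on binary features with log loss.

    Each round uses gradients g = p - y and Hessians h = p (1 - p), scores every split with
    GL^2/(HL+lam) + GR^2/(HR+lam) - G^2/(H+lam), and sets each leaf weight to the Newton
    step -G_leaf/(H_leaf+lam), shrunk by the learning rate eta.
    """
    n, d = len(X), len(X[0])
    margin = [0.0] * n
    stumps: List[Tuple[int, float, float]] = []  # (feature, weight if absent, weight if present)
    for _ in range(n_rounds):
        p = [sigmoid(m) for m in margin]
        g = [pi - yi for pi, yi in zip(p, y)]
        h = [pi * (1 - pi) for pi in p]
        G, H = sum(g), sum(h)
        best = None
        for j in range(d):
            GR = sum(gi for gi, row in zip(g, X) if row[j])
            HR = sum(hi for hi, row in zip(h, X) if row[j])
            GL, HL = G - GR, H - HR
            gain = GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam) - G ** 2 / (H + lam)
            if best is None or gain > best[0]:
                best = (gain, j, -GL / (HL + lam), -GR / (HR + lam))
        _, j, w_absent, w_present = best
        stumps.append((j, eta * w_absent, eta * w_present))
        margin = [m + (eta * w_present if row[j] else eta * w_absent) for m, row in zip(margin, X)]
    return stumps


def predict_spam_proba(stumps, text: str) -> float:
    """Sum the stump outputs into a margin and map it to a spam probability."""
    row = featurize(text)
    return sigmoid(sum(wp if row[j] else wa for j, wa, wp in stumps))


train = [("free winner click now", 1), ("click here free prize", 1), ("winner winner free", 1),
         ("project meeting tomorrow", 0), ("lunch meeting today", 0), ("project update attached", 0)]
X = [featurize(t) for t, _ in train]
y = [label for _, label in train]
model = fit_boosted_stumps(X, y)
print(predict_spam_proba(model, "free prize click"), predict_spam_proba(model, "meeting about the project"))
```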


Aerial Imagery Pile burn detection using Deep Learning: the FLAME dataset

arXiv.org Artificial Intelligence

Wildfires are one of the costliest and deadliest natural disasters in the US, damaging millions of hectares of forest resources and threatening the lives of people and animals. Of particular importance are risks to firefighters and operational forces, which highlights the need to leverage technology to minimize danger to people and property. FLAME (Fire Luminosity Airborne-based Machine learning Evaluation) offers a dataset of aerial images of fires along with methods for fire detection and segmentation, which can help firefighters and researchers develop optimal fire management strategies. This paper provides a fire image dataset collected by drones during a prescribed burn of piled detritus in an Arizona pine forest. The dataset includes video recordings and thermal heatmaps captured by infrared cameras. The captured videos and images are annotated and labeled frame-wise to help researchers easily apply their fire detection and modeling algorithms. The paper also highlights solutions to two machine learning problems: (1) Binary classification of video frames based on the presence or absence of fire flames. An Artificial Neural Network (ANN) method is developed that achieved 76% classification accuracy. (2) Fire detection using segmentation methods to precisely determine fire borders. A deep learning method based on the U-Net down-sampling and up-sampling approach is designed to extract a fire mask from the video frames. Our FLAME method achieved a precision of 92% and a recall of 84%. Future research will extend the technique to free-burning broadcast fires using thermal images.
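Precision and recall for a segmentation mask are usually computed pixel by pixel. The sketch below assumes that convention and pools counts over all pixels of one frame; the paper may aggregate differently (for example per frame or over the whole test set), so this is an illustration of the metrics, not a reproduction of the reported 92% and 84%. The function name and the 2×3 masks are hypothetical.

```python
import numpy as np


def mask_precision_recall(pred_mask, true_mask):
    """Pixel-wise precision and recall of a predicted binary fire mask against ground truth."""
    pred = np.asarray(pred_mask, dtype=bool)
    truth = np.asarray(true_mask, dtype=bool)
    tp = int(np.sum(pred & truth))    # fire pixels correctly marked as fire
    fp = int(np.sum(pred & ~truth))   # background pixels marked as fire
    fn = int(np.sum(~pred & truth))   # fire pixels the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Tiny hypothetical masks (1 = fire pixel).
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(mask_precision_recall(pred, truth))
```

Here two fire pixels are found, one background pixel is wrongly flagged, and one fire pixel is missed, so both precision and recall equal 2/3.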


FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping

arXiv.org Artificial Intelligence

Byzantine-robust federated learning aims to enable a service provider to learn an accurate global model when a bounded number of clients are malicious. The key idea of existing Byzantine-robust federated learning methods is that the service provider performs statistical analysis among the clients' local model updates and removes suspicious ones before aggregating them to update the global model. However, malicious clients can still corrupt the global model in these methods by sending carefully crafted local model updates to the service provider. The fundamental reason is that there is no root of trust in existing federated learning methods. In this work, we bridge the gap by proposing FLTrust, a new federated learning method in which the service provider itself bootstraps trust. In particular, the service provider collects a small, clean training dataset (called the root dataset) for the learning task and maintains a model (called the server model) based on it to bootstrap trust. In each iteration, the service provider first assigns a trust score to each local model update from the clients, where a local model update has a lower trust score if its direction deviates more from the direction of the server model update. Then, the service provider normalizes the magnitudes of the local model updates such that they lie on the same hypersphere as the server model update in the vector space. This normalization limits the impact of malicious local model updates with large magnitudes. Finally, the service provider computes the average of the normalized local model updates, weighted by their trust scores, as the global model update, which is used to update the global model. Our extensive evaluations on six datasets from different domains show that FLTrust is secure against both existing attacks and strong adaptive attacks.
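The three aggregation steps described above (trust score, normalization, trust-weighted average) can be sketched in a few lines. The abstract only says the trust score decreases as a client's direction deviates from the server update, so this sketch assumes ReLU-clipped cosine similarity as that score; the function name, the `eps` guard, and the choice to return a zero update when no client is trusted are hypothetical. Applying the result to the global model (for example w ← w − α·g) is not shown.

```python
import numpy as np


def fltrust_aggregate(server_update, client_updates, eps=1e-12):
    """Trust-weighted aggregation in the spirit of FLTrust (assumed ReLU-cosine trust score).

    1. Trust score: ReLU of the cosine similarity between each client update and the server update.
    2. Normalization: rescale each client update to the server update's L2 norm.
    3. Aggregation: trust-weighted average of the rescaled updates.
    """
    g0 = np.asarray(server_update, dtype=float)
    norm0 = np.linalg.norm(g0)
    weighted_sum = np.zeros_like(g0)
    total_trust = 0.0
    for g in client_updates:
        g = np.asarray(g, dtype=float)
        norm_g = np.linalg.norm(g)
        if norm_g < eps:
            continue  # a zero update carries no direction, so it gets no trust
        cos = float(g @ g0) / (norm_g * norm0 + eps)
        trust = max(cos, 0.0)
        weighted_sum += trust * (norm0 / norm_g) * g
        total_trust += trust
    if total_trust == 0.0:
        return np.zeros_like(g0)  # no trusted client in this round: skip the update
    return weighted_sum / total_trust


server = [1.0, 0.0]
clients = [[2.0, 0.0], [0.0, 3.0], [-50.0, 0.0]]  # benign, orthogonal, large opposite update
print(fltrust_aggregate(server, clients))
```

The large update [-50, 0] points against the server update and receives zero trust, as does the orthogonal one. A large update in a benign direction would still be rescaled to the server update's norm, which is how normalization caps its influence.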


How to easily check if your Machine Learning model is fair? - KDnuggets

#artificialintelligence

We live in a world that is getting more divided each day. In some parts of the world, the differences and inequalities between races, ethnicities, and sometimes sexes are worsening. The data we use for modeling is, for the most part, a reflection of the world it derives from. And the world can be biased, so the data, and therefore the model, will likely reflect that. We propose a way in which ML engineers can easily check whether their model is biased.
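One simple, widely used check compares how often a model predicts the positive outcome for each protected group relative to a privileged group (demographic parity), flagging ratios outside a tolerance band. The sketch below assumes the common "four-fifths rule" band of [0.8, 1.25]; the article's exact procedure may differ, and demographic parity is only one of several fairness metrics. The function names and the toy predictions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def selection_rates(y_pred: Sequence[int], groups: Sequence[str]) -> Dict[str, float]:
    """Fraction of positive predictions within each protected group."""
    pos, tot = defaultdict(int), defaultdict(int)
    for yp, g in zip(y_pred, groups):
        tot[g] += 1
        pos[g] += int(yp == 1)
    return {g: pos[g] / tot[g] for g in tot}


def parity_check(y_pred, groups, privileged: str,
                 low: float = 0.8, high: float = 1.25) -> Dict[str, Tuple[float, bool]]:
    """Each group's selection rate divided by the privileged group's, with a pass/fail flag."""
    rates = selection_rates(y_pred, groups)
    base = rates[privileged]
    report = {}
    for g, r in rates.items():
        ratio = r / base if base > 0 else (1.0 if r == 0 else float("inf"))
        report[g] = (ratio, low <= ratio <= high)
    return report


y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]
print(parity_check(y_pred, groups, privileged="a"))
```

Group "a" receives positive predictions 60% of the time and group "b" 40%, a ratio of about 0.67, so group "b" fails the assumed band.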


Millimeter Wave Sensing: A Review of Application Pipelines and Building Blocks

arXiv.org Artificial Intelligence

The increasing bandwidth requirements of new wireless applications have led to the standardization of the millimeter wave spectrum for high-speed wireless communication. The millimeter wave spectrum covers frequencies between 30 and 300 GHz, corresponding to wavelengths from 10 to 1 mm, and parts of it are used in 5G. Although millimeter wave is often considered a communication medium, it has also proved to be an excellent 'sensor', thanks to its narrow beams, operation across a wide bandwidth, and interaction with atmospheric constituents. In this paper, which is, to the best of our knowledge, the first review to completely cover millimeter wave sensing application pipelines, we provide a comprehensive overview and analysis of the basic building blocks of these pipelines, including hardware, algorithms, analytical models, and model evaluation techniques. The review also provides a taxonomy that highlights different millimeter wave sensing application domains. By performing a thorough analysis that complies with the systematic literature review methodology and reviewing 165 papers, we not only extend previous investigations that focused only on the communication aspects of millimeter wave technology and on its use for active imaging, but also highlight scientific and technological challenges and trends, and provide a future perspective for applications of millimeter wave as a sensing technology.
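The frequency-to-wavelength correspondence above follows from λ = c / f, and the value of a wide bandwidth for sensing is captured by the standard radar range-resolution relation ΔR = c / (2B). The sketch below evaluates both; the chosen frequencies and the 4 GHz sweep bandwidth are illustrative assumptions, not figures from the review.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength_mm(freq_hz: float) -> float:
    """Free-space wavelength lambda = c / f, in millimetres."""
    return C / freq_hz * 1e3


def range_resolution_cm(bandwidth_hz: float) -> float:
    """Range resolution of a radar with sweep bandwidth B: delta_R = c / (2B), in centimetres."""
    return C / (2.0 * bandwidth_hz) * 1e2


for f_ghz in (30, 60, 77, 300):
    print(f"{f_ghz:>3} GHz -> {wavelength_mm(f_ghz * 1e9):.2f} mm")
print(f"4 GHz sweep -> {range_resolution_cm(4e9):.2f} cm")
```

Doubling the sweep bandwidth halves ΔR, which is one reason the wide bandwidths available at millimeter wave frequencies make them attractive for sensing.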


More Powerful and General Selective Inference for Stepwise Feature Selection using the Homotopy Continuation Approach

arXiv.org Machine Learning

As machine learning (ML) is applied to a greater variety of practical problems, ensuring the reliability of ML is increasingly recognized as important. Among several potential approaches to reliable ML, conditional selective inference (SI) is recognized as a promising approach for evaluating the statistical reliability of data-driven hypotheses selected by ML methods. The basic idea of conditional SI is to make inferences on a data-driven hypothesis conditional on the selection event, that is, the event that the hypothesis is selected when the data are analyzed with the ML algorithm. Conditional SI has been actively studied, especially in the context of feature selection. Notably, Lee et al. [1] and Tibshirani et al. [2] proposed conditional SI methods for exact conditional inference on features selected by Lasso and by stepwise feature selection (SFS), respectively.
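The core mechanics of conditional SI can be seen in the simplest possible selection event. Suppose a z-statistic is only tested because it exceeded a threshold; under a Gaussian null, conditioning on that event truncates the null distribution, and the p-value becomes a truncated-normal tail probability. The sketch below handles a single truncation interval only. It is not the paper's homotopy method, which deals with the more complex selection events of SFS; the function names and the threshold of 1.0 are hypothetical.

```python
import math


def normal_sf(x: float) -> float:
    """Upper-tail probability 1 - Phi(x) of a standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def selective_p_value(z: float, lower: float, upper: float = math.inf) -> float:
    """One-sided p-value for H0: mean 0, conditional on the selection event lower <= Z <= upper.

    Under H0, Z ~ N(0, 1); conditioning on selection truncates it to [lower, upper], so the
    p-value is P(Z >= z | lower <= Z <= upper).
    """
    if not lower <= z <= upper:
        raise ValueError("observed statistic must satisfy the selection event")
    return (normal_sf(z) - normal_sf(upper)) / (normal_sf(lower) - normal_sf(upper))


z_obs = 2.0
naive = normal_sf(z_obs)                         # ignores that z was selected for being large
selective = selective_p_value(z_obs, lower=1.0)  # hypothesis tested only because z > 1
print(f"naive p = {naive:.4f}, selective p = {selective:.4f}")
```

The naive p-value is below 0.05, but once the selection is accounted for the evidence is much weaker. With no truncation (lower = −∞) the two p-values coincide.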


Predicting Seminal Quality with the Dominance-Based Rough Sets Approach

arXiv.org Artificial Intelligence

The paper relies on the clinical data of a previously published study. We identify two highly questionable assumptions of that work, namely confusing evidence of absence with absence of evidence, and neglecting the ordinal nature of the attributes' domains. We then show that using an adequate ordinal methodology such as the dominance-based rough sets approach (DRSA) can significantly improve the predictive accuracy of the expert system, resulting in almost perfect accuracy on a dataset of 100 instances. Beyond the performance of DRSA in solving the diagnosis problem at hand, these results suggest the inadequacy and triviality of the underlying dataset. We provide links to open data from the UCI machine learning repository to allow for easy verification or refutation of the claims made in this paper. Keywords: Decision Support Systems, Expert Systems, Dominance Based Rough Set Approach, Diagnosis, Seminal Quality.
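To show what respecting the ordinal nature of attributes means, here is a minimal sketch of the dominance principle behind DRSA: one object dominates another if it is at least as good on every criterion, and the upward union of classes "at least t" is approximated from below (objects whose dominators all belong to it) and from above (objects that dominate at least one member). These are the standard DRSA approximation definitions applied to hypothetical toy objects; rule induction and the paper's clinical data are not shown.

```python
from typing import Dict, Set, Tuple

# Hypothetical objects: ordinal criteria (higher = better) and an ordinal decision class.
objects: Dict[str, Tuple[Tuple[int, ...], int]] = {
    "a": ((1, 1), 0),
    "b": ((2, 1), 1),
    "c": ((2, 2), 1),
    "d": ((1, 2), 0),
    "e": ((2, 1), 0),  # same evaluations as "b" but a worse class: an inconsistency
}


def dominates(x: Tuple[int, ...], y: Tuple[int, ...]) -> bool:
    """x dominates y if x is at least as good as y on every criterion."""
    return all(xi >= yi for xi, yi in zip(x, y))


def upward_approximations(objs: Dict[str, Tuple[Tuple[int, ...], int]], t: int) -> Tuple[Set[str], Set[str]]:
    """Lower and upper approximations of the upward union Cl_t^>= (objects with class >= t)."""
    union = {k for k, (_, cls) in objs.items() if cls >= t}
    lower, upper = set(), set()
    for k, (x, _) in objs.items():
        dominating = {j for j, (y, _) in objs.items() if dominates(y, x)}  # D+(x), includes x
        dominated = {j for j, (y, _) in objs.items() if dominates(x, y)}   # D-(x), includes x
        if dominating <= union:   # every object at least as good as x is in the union
            lower.add(k)
        if dominated & union:     # x is at least as good as some member of the union
            upper.add(k)
    return lower, upper


print(upward_approximations(objects, t=1))
```

Because "e" has the same evaluations as "b" but a worse class, "b" drops out of the lower approximation and "e" enters the upper one; only "c" is certainly in class 1 or better.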


A Multimodal Framework for the Detection of Hateful Memes

arXiv.org Artificial Intelligence

An increasingly common expression of online hate speech is multimodal in nature and comes in the form of memes. Designing systems to automatically detect hateful content is of paramount importance if we are to mitigate its undesirable effects on society at large. The detection of multimodal hate speech is an intrinsically difficult and open problem: memes convey a message using both images and text and, hence, require multimodal reasoning and joint visual and language understanding. In this work, we seek to advance this line of research and develop a multimodal framework for the detection of hateful memes. We improve the performance of existing multimodal approaches beyond simple fine-tuning and, among other things, show the effectiveness of upsampling contrastive examples to encourage multimodality and of ensemble learning based on cross-validation to improve robustness. We furthermore analyze model misclassifications and discuss a number of hypothesis-driven augmentations and their effects on performance, presenting important implications for future research in the field. Our best approach comprises an ensemble of UNITER-based models and achieves an AUROC score of 80.53, placing us 4th in phase 2 of the 2020 Hateful Memes Challenge organized by Facebook.
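The reported score is an AUROC expressed on a 0-100 scale. AUROC equals the probability that a randomly chosen hateful meme receives a higher score than a randomly chosen benign one (ties count as one half). The sketch below computes it that way and combines models by simple probability averaging, assumed here as one plausible way to ensemble models trained on different cross-validation folds; the abstract does not state the paper's exact combination rule. The labels and scores are hypothetical.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative), ties counting 1/2.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_ensemble(model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-model probabilities by simple averaging (e.g. one model per CV fold)."""
    return [sum(col) / len(col) for col in zip(*model_scores)]


labels = [1, 0, 1, 0, 1, 0]  # 1 = hateful meme
fold_a = [0.9, 0.3, 0.4, 0.5, 0.6, 0.2]
fold_b = [0.7, 0.6, 0.8, 0.1, 0.3, 0.4]
ens = average_ensemble([fold_a, fold_b])
print(auroc(labels, fold_a), auroc(labels, fold_b), auroc(labels, ens))
```

In this toy case each fold model makes different ranking mistakes, so the averaged scores rank the examples better than either model alone; real gains from ensembling are of course data-dependent.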