 Accuracy


Stochastic Hard Thresholding Algorithms for AUC Maximization

arXiv.org Machine Learning

In this paper, we aim to develop stochastic hard thresholding algorithms for the important problem of AUC maximization in imbalanced classification. The main challenge is the pairwise loss involved in AUC maximization. We overcome this obstacle by reformulating the U-statistics objective function as an empirical risk minimization (ERM) problem, from which a stochastic hard thresholding algorithm (\texttt{SHT-AUC}) is developed. To the best of our knowledge, this is the first attempt to provide stochastic hard thresholding algorithms for AUC maximization with a per-iteration cost of $\mathcal{O}(bd)$, where $d$ and $b$ are the dimension of the data and the minibatch size, respectively. We show that the proposed algorithm enjoys a linear convergence rate up to a tolerance error. In particular, we show that if the data is generated from a Gaussian distribution, convergence becomes slower as the data gets more imbalanced. We conduct extensive experiments to show the efficiency and effectiveness of the proposed algorithms.
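To make the term "hard thresholding" concrete, the following is a minimal sketch of the generic operator (keep the k largest-magnitude coordinates, zero the rest) combined with a plain gradient step. It is not the paper's \texttt{SHT-AUC}: the function names `hard_threshold` and `sht_step`, the sparsity level `k`, and the step size are illustrative assumptions, and the gradient is passed in rather than computed from an AUC objective.

```python
def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w and set the rest to zero."""
    if k <= 0:
        return [0.0] * len(w)
    # Indices of the k entries with the largest absolute value.
    # sorted() is stable, so ties are broken in favour of the earlier index.
    order = sorted(range(len(w)), key=lambda i: -abs(w[i]))
    keep = set(order[:k])
    return [w[i] if i in keep else 0.0 for i in range(len(w))]


def sht_step(w, grad, step_size, k):
    """One gradient step followed by hard thresholding (hypothetical helper).

    `grad` stands in for a minibatch gradient; how it is computed for the
    AUC objective is not described in the abstract and is not modelled here.
    """
    moved = [wi - step_size * gi for wi, gi in zip(w, grad)]
    return hard_threshold(moved, k)


if __name__ == "__main__":
    print(hard_threshold([0.5, -3.0, 1.0, 2.0], 2))
    print(sht_step([1.0, 1.0, 1.0], [0.5, -1.0, 2.0], 1.0, 1))
```

Sorting makes this sketch cost O(d log d) per call; a selection-based top-k would be needed to stay linear in the dimension.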


Ensuring Fairness Beyond the Training Data

arXiv.org Machine Learning

We initiate the study of fair classifiers that are robust to perturbations in the training distribution. Despite recent progress, the literature on fairness has largely ignored the design of fair and robust classifiers. In this work, we develop classifiers that are fair not only with respect to the training distribution, but also for a class of distributions that are weighted perturbations of the training samples. We formulate a min-max objective function whose goal is to minimize a distributionally robust training loss, and at the same time, find a classifier that is fair with respect to a class of distributions. We first reduce this problem to finding a fair classifier that is robust with respect to the class of distributions. Based on an online learning algorithm, we develop an iterative algorithm that provably converges to such a fair and robust solution. Experiments on standard machine learning fairness datasets suggest that, compared to the state-of-the-art fair classifiers, our classifier retains fairness guarantees and test accuracy for a large class of perturbations on the test set. Furthermore, our experiments show that there is an inherent trade-off between fairness robustness and accuracy of such classifiers.


NeuMiss networks: differentiable programming for supervised learning with missing values

arXiv.org Artificial Intelligence

The presence of missing values makes supervised learning much more challenging. Indeed, previous work has shown that even when the response is a linear function of the complete data, the optimal predictor is a complex function of the observed entries and the missingness indicator. As a result, the computational or sample complexities of consistent approaches depend on the number of missing patterns, which can be exponential in the number of dimensions. In this work, we derive the analytical form of the optimal predictor under a linearity assumption and various missing data mechanisms including Missing at Random (MAR) and self-masking (Missing Not At Random). Based on a Neumann-series approximation of the optimal predictor, we propose a new principled architecture, named NeuMiss networks. Their originality and strength come from the use of a new type of non-linearity: the multiplication by the missingness indicator. We provide an upper bound on the Bayes risk of NeuMiss networks, and show that they have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns. As a result they scale well to problems with many features, and remain statistically efficient for medium-sized samples. Moreover, we show that, contrary to procedures using EM or imputation, they are robust to the missing data mechanism, including difficult MNAR settings such as self-masking.
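For readers unfamiliar with the Neumann series mentioned above, the standard identity is given below as a reminder. Which matrix the authors expand, and how many terms they keep, is not stated in this abstract, so $A$ and the truncation order $\ell$ are placeholder symbols.

```latex
% Neumann series: valid when the spectral radius of A is below 1.
(I - A)^{-1} = \sum_{k=0}^{\infty} A^{k},
\qquad
(I - A)^{-1} \approx \sum_{k=0}^{\ell} A^{k}
\quad \text{(truncation after } \ell + 1 \text{ terms)}.
```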


Rapid coronavirus antigen tests may give false positives, FDA warns

FOX News

Our technology has advanced, our diagnostics have improved and our testing capability has advanced since the beginning of this pandemic, says Dr. Nicole Saphier, Fox News medical contributor. The Food and Drug Administration (FDA) warned about the possibility of false positives that can occur when using rapid antigen tests to detect coronavirus, particularly if the test is not used correctly. The regulatory agency said it has received reports of false-positive results occurring in nursing homes and other health care settings. The agency warned that reading the test results either before or after the specified time provided in the instructions can result in false-positive or false-negative results. It also referenced the antigen EUA conditions of authorization, which specify that authorized laboratories are to follow the manufacturer's instructions for use regarding administering the test and reading the results.


Standardized Variable Distances: A distance-based machine learning method

#artificialintelligence

Today, machine learning algorithms are an important research area, capable of analyzing and modeling data in any field. Information obtained through machine learning methods helps researchers and planners to understand and review systematic problems in their current strategies. It is therefore very important that these methods work reliably in every field that facilitates human life, such as early and correct diagnosis, correct choice, and fully functioning autonomous systems. In this paper, a novel machine learning algorithm for multiclass classification is presented. The proposed method is based on the Minimum Distance Classifier (MDC) algorithm. The MDC is variance-insensitive because it classifies input vectors by calculating their distances/similarities with respect to class centroids (the average value of the input vectors of a class).
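The centroid rule described above can be sketched in a few lines. This is the plain Minimum Distance Classifier with Euclidean distance, not the proposed Standardized Variable Distances method, whose details are not given in this excerpt. The function names `fit_centroids` and `predict` and the toy data are illustrative assumptions.

```python
import math


def fit_centroids(X, y):
    """Return {label: centroid}, where a centroid is the mean vector of a class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


if __name__ == "__main__":
    # Toy data (assumed for illustration only).
    X = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
    y = ["a", "a", "b", "b"]
    centroids = fit_centroids(X, y)
    print(centroids)
    print(predict(centroids, [1.0, 1.0]), predict(centroids, [9.0, 9.0]))
```

Because only the class means enter the decision, the spread of each class has no effect on the prediction, which is the variance insensitivity the abstract points out.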


Coronavirus: Liverpool to pilot city-wide Covid-19 testing

BBC News

False positives - when you don't have the virus, but the test says you do - are also a bigger problem when you test large numbers of people. One analysis suggested a twice-a-week test for six months using a test with a 1% false positive rate would lead to more than 40% of people being wrongly told they had the virus.
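The "more than 40%" figure can be checked with a short calculation. The sketch below rests on assumptions not stated in the article: six months is taken as 26 weeks (52 tests), each test is treated as independent, and the person is assumed never to be infected.

```python
def prob_at_least_one_false_positive(fp_rate, n_tests):
    """Chance of one or more false positives across n independent tests."""
    # Probability that every test is correctly negative is (1 - fp_rate) ** n_tests.
    return 1.0 - (1.0 - fp_rate) ** n_tests


# Assumptions: 26 weeks in six months, two tests per week, 1% false positive rate.
n_tests = 2 * 26
p_wrong = prob_at_least_one_false_positive(0.01, n_tests)

if __name__ == "__main__":
    print(f"{n_tests} tests -> {p_wrong:.1%} chance of at least one false positive")
```

Under these assumptions the result comes out a little above 40%, in line with the analysis quoted above.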


Secure communication between UAVs using a method based on smart agents in unmanned aerial vehicles

arXiv.org Artificial Intelligence

Unmanned aerial vehicles (UAVs) can be deployed to monitor very large areas without the need for network infrastructure. UAVs communicate and exchange information with one another during flight. However, such communication poses security challenges due to its dynamic topology. To address these challenges, the proposed method uses two phases to counter malicious UAV attacks. In the first phase, we apply a number of rules and principles to detect malicious UAVs: we try to identify and remove malicious UAVs according to their behavior in the network, in order to prevent fake information from being sent to the investigating UAVs. In the second phase, a mobile agent based on a three-step negotiation process is used to eliminate malicious UAVs. We use mobile agents to inform normal neighbor UAVs so that they do not listen to the data generated by the malicious UAVs. The mobile agent of each UAV therefore selects reliable neighbors through the three-step negotiation process, so that they do not listen to the traffic generated by the malicious UAVs. The NS-3 simulator was used to demonstrate the efficiency of the SAUAV method. The proposed method is more efficient than CST-UAS, CS-AVN, HVCR, and BSUM-based methods in detection rate, false positive rate, false negative rate, packet delivery rate, and residual energy.


(Un)fairness in Post-operative Complication Prediction Models

arXiv.org Artificial Intelligence

With the current ongoing debate about fairness, explainability and transparency of machine learning models, their application in high-impact clinical decision-making systems must be scrutinized. We consider a real-life example of risk estimation before surgery and investigate the potential for bias or unfairness of a variety of algorithms. Our approach creates transparent documentation of potential bias so that the users can apply the model carefully. We augment a model-card like analysis using propensity scores with a decision-tree based guide for clinicians that would identify predictable shortcomings of the model. In addition to functioning as a guide for users, we propose that it can guide the algorithm development and informatics team to focus on data sources and structures that can address these shortcomings.


On Cross-Dataset Generalization in Automatic Detection of Online Abuse

arXiv.org Artificial Intelligence

NLP research has attained high performance in abusive language detection as a supervised classification task. While in research settings, training and test datasets are usually obtained from similar data samples, in practice systems are often applied to data that differ from the training set in topic and class distributions. Also, the ambiguity in class definitions inherent in this task aggravates the discrepancies between source and target datasets. We explore the topic bias and the task formulation bias in cross-dataset generalization. We show that the benign examples in the Wikipedia Detox dataset are biased towards platform-specific topics. We identify these examples using unsupervised topic modeling and manual inspection of topics' keywords. Removing these topics increases cross-dataset generalization, without reducing in-domain classification performance. For a robust dataset design, we suggest applying inexpensive unsupervised methods to inspect the collected data and downsize the non-generalizable content before manually annotating for class labels.


Recyclable Waste Identification Using CNN Image Recognition and Gaussian Clustering

arXiv.org Artificial Intelligence

Abstract: Waste recycling is an important way of saving energy and materials in the production process. In general cases recyclable objects are mixed with unrecyclable objects, which raises a need for identification and classification. This paper proposes a convolutional neural network (CNN) model to complete both tasks. The model uses transfer learning from a pretrained Resnet-50 CNN to complete feature extraction. A subsequent fully connected layer for classification was trained on the augmented TrashNet dataset [1].

This study uses transfer learning from a pre-trained Resnet-50 model to generate a model which is capable of classifying images of individual waste objects into the following six [...]. To integrate the model into actual application, which often deals with bird's-eye view of piles of waste, a sliding-window process in the pre-classification stage split the image into smaller fragments for the CNN to process, and the labelled points are integrated with Gaussian Mixture Model in the postclassification [...]