Performance Analysis
Stochastic Hard Thresholding Algorithms for AUC Maximization
Yang, Zhenhuan, Zhou, Baojian, Lei, Yunwen, Ying, Yiming
In this paper, we aim to develop stochastic hard thresholding algorithms for the important problem of AUC maximization in imbalanced classification. The main challenge is the pairwise loss involved in AUC maximization. We overcome this obstacle by reformulating the U-statistics objective function as an empirical risk minimization (ERM), from which a stochastic hard thresholding algorithm (\texttt{SHT-AUC}) is developed. To our best knowledge, this is the first attempt to provide stochastic hard thresholding algorithms for AUC maximization with a per-iteration cost $\O(b d)$ where $d$ and $b$ are the dimension of the data and the minibatch size, respectively. We show that the proposed algorithm enjoys the linear convergence rate up to a tolerance error. In particular, we show, if the data is generated from the Gaussian distribution, then its convergence becomes slower as the data gets more imbalanced. We conduct extensive experiments to show the efficiency and effectiveness of the proposed algorithms.
Ensuring Fairness Beyond the Training Data
Mandal, Debmalya, Deng, Samuel, Jana, Suman, Wing, Jeannette M., Hsu, Daniel
We initiate the study of fair classifiers that are robust to perturbations in the training distribution. Despite recent progress, the literature on fairness has largely ignored the design of fair and robust classifiers. In this work, we develop classifiers that are fair not only with respect to the training distribution, but also for a class of distributions that are weighted perturbations of the training samples. We formulate a min-max objective function whose goal is to minimize a distributionally robust training loss, and at the same time, find a classifier that is fair with respect to a class of distributions. We first reduce this problem to finding a fair classifier that is robust with respect to the class of distributions. Based on online learning algorithm, we develop an iterative algorithm that provably converges to such a fair and robust solution. Experiments on standard machine learning fairness datasets suggest that, compared to the state-of-the-art fair classifiers, our classifier retains fairness guarantees and test accuracy for a large class of perturbations on the test set. Furthermore, our experiments show that there is an inherent trade-off between fairness robustness and accuracy of such classifiers.
NeuMiss networks: differentiable programming for supervised learning with missing values
Morvan, Marine Le, Josse, Julie, Moreau, Thomas, Scornet, Erwan, Varoquaux, Gaรซl
The presence of missing values makes supervised learning much more challenging. Indeed, previous work has shown that even when the response is a linear function of the complete data, the optimal predictor is a complex function of the observed entries and the missingness indicator. As a result, the computational or sample complexities of consistent approaches depend on the number of missing patterns, which can be exponential in the number of dimensions. In this work, we derive the analytical form of the optimal predictor under a linearity assumption and various missing data mechanisms including Missing at Random (MAR) and self-masking (Missing Not At Random). Based on a Neumann-series approximation of the optimal predictor, we propose a new principled architecture, named NeuMiss networks. Their originality and strength come from the use of a new type of non-linearity: the multiplication by the missingness indicator. We provide an upper bound on the Bayes risk of NeuMiss networks, and show that they have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns. As a result they scale well to problems with many features, and remain statistically efficient for medium-sized samples. Moreover, we show that, contrary to procedures using EM or imputation, they are robust to the missing data mechanism, including difficult MNAR settings such as self-masking.
Rapid coronavirus antigen tests may give false positives, FDA warns
Our technology has advanced, our diagnostics have improved and our testing capability has advanced since the beginning of this pandemic, says Dr. Nicole Saphier, Fox News medical contributor. The Food and Drug Administration (FDA) warned about the possibility of false positives that can occur when using rapid antigen tests to detect coronavirus, particularly if the test is not used correctly. The regulatory agency said it has received reports of false-positive results occurring in nursing homes and other health care settings. The agency warned that reading the test results either before or after the specified time provided in the instructions can result in false-positive or false-negative results. It also referenced the antigen EUA conditions of authorization, which specifies that authorized laboratories are to follow the manufacturer's instructions for use regarding administering the test and reading the results.
Standardized Variable Distances: A distance-based machine learning method
Today, machine learning algorithms are an important research area capable of analyzing and modeling data in any field. Information obtained through machine learning methods helps researchers and planners to understand and review systematic problems of their current strategies. Thus, it is very important to work fully in every field that facilitates human life, such as early and correct diagnosis, correct choice, fully functioning autonomous systems. In this paper, a novel machine learning algorithm for multiclass classification is presented. The proposed method is designed based on the Minimum Distance Classifier (MDC) algorithm. The MDC is variance-insensitive because it classifies input vectors by calculating their distances/similarities with respect to class-centroids (average value of input vectors of a class).
Coronavirus: Liverpool to pilot city-wide Covid-19 testing
False positives - when you don't have the virus, but the test says you do - are also a bigger problem when you test large numbers of people. One analysis suggested a twice-a-week test for six months using a test with a 1% false positive rate would lead to more than 40% of people being wrongly told they had the virus.
Secure communication between UAVs using a method based on smart agents in unmanned aerial vehicles
Faraji-Biregani, Maryam, Fotohi, Reza
Unmanned aerial vehicles (UAVs) can be deployed to monitor very large areas without the need for network infrastructure. UAVs communicate with each other during flight and exchange information with each other. However, such communication poses security challenges due to its dynamic topology. To solve these challenges, the proposed method uses two phases to counter malicious UAV attacks. In the first phase, we applied a number of rules and principles to detect malicious UAVs. In this phase, we try to identify and remove malicious UAVs according to the behavior of UAVs in the network in order to prevent sending fake information to the investigating UAVs. In the second phase, a mobile agent based on a three-step negotiation process is used to eliminate malicious UAVs. In this way, we use mobile agents to inform our normal neighbor UAVs so that they do not listen to the data generated by the malicious UAVs. Therefore, the mobile agent of each UAV uses reliable neighbors through a three-step negotiation process so that they do not listen to the traffic generated by the malicious UAVs. The NS-3 simulator was used to demonstrate the efficiency of the SAUAV method. The proposed method is more efficient than CST-UAS, CS-AVN, HVCR, and BSUM-based methods in detection rate, false positive rate, false negative rate, packet delivery rate, and residual energy.
Automated Hyperparameter Selection for the PC Algorithm
The PC algorithm infers causal relations using conditional independence tests that require a pre-specified Type I $\alpha$ level. PC is however unsupervised, so we cannot tune $\alpha$ using traditional cross-validation. We therefore propose AutoPC, a fast procedure that optimizes $\alpha$ directly for a user chosen metric. We in particular force PC to double check its output by executing a second run on the recovered graph. We choose the final output as the one which maximizes stability between the two runs. AutoPC consistently outperforms the state of the art across multiple metrics.
Fairness without Demographics through Adversarially Reweighted Learning
Lahoti, Preethi, Beutel, Alex, Chen, Jilin, Lee, Kang, Prost, Flavien, Thain, Nithum, Wang, Xuezhi, Chi, Ed H.
Much of the previous machine learning (ML) fairness literature assumes that protected features such as race and sex are present in the dataset, and relies upon them to mitigate fairness concerns. However, in practice factors like privacy and regulation often preclude the collection of protected features, or their use for training or inference, severely limiting the applicability of traditional fairness research. Therefore we ask: How can we train an ML model to improve fairness when we do not even know the protected group memberships? In this work we address this problem by proposing Adversarially Reweighted Learning (ARL). In particular, we hypothesize that non-protected features and task labels are valuable for identifying fairness issues, and can be used to co-train an adversarial reweighting approach for improving fairness. Our results show that {ARL} improves Rawlsian Max-Min fairness, with notable AUC improvements for worst-case protected groups in multiple datasets, outperforming state-of-the-art alternatives.
(Un)fairness in Post-operative Complication Prediction Models
Tripathi, Sandhya, Fritz, Bradley A., Abdelhack, Mohamed, Avidan, Michael S., Chen, Yixin, King, Christopher R.
With the current ongoing debate about fairness, explainability and transparency of machine learning models, their application in high-impact clinical decision-making systems must be scrutinized. We consider a real-life example of risk estimation before surgery and investigate the potential for bias or unfairness of a variety of algorithms. Our approach creates transparent documentation of potential bias so that the users can apply the model carefully. We augment a model-card like analysis using propensity scores with a decision-tree based guide for clinicians that would identify predictable shortcomings of the model. In addition to functioning as a guide for users, we propose that it can guide the algorithm development and informatics team to focus on data sources and structures that can address these shortcomings.