 Accuracy


Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging

arXiv.org Artificial Intelligence

Artificial intelligence (AI) models are increasingly used in the medical domain. However, as medical data is highly sensitive, special precautions to ensure its protection are required. The gold standard for privacy preservation is the introduction of differential privacy (DP) to model training. Prior work indicates that DP has negative implications for model accuracy and fairness, which are unacceptable in medicine and represent a main barrier to the widespread use of privacy-preserving techniques. In this work, we evaluated the effect of privacy-preserving training of AI models for chest radiograph diagnosis on accuracy and fairness, compared to non-private training. For this, we used a large dataset (N=193,311) of high-quality clinical chest radiographs, which were retrospectively collected and manually labeled by experienced radiologists. We then compared non-private deep convolutional neural networks (CNNs) and privacy-preserving (DP) models with respect to privacy-utility trade-offs, measured as area under the receiver operating characteristic curve (AUROC), and privacy-fairness trade-offs, measured as Pearson's r or Statistical Parity Difference. We found that the non-private CNNs achieved an average AUROC score of 0.90 ± 0.04 over all labels, whereas the DP CNNs with a privacy budget of epsilon=7.89 resulted in an AUROC of 0.87 ± 0.04, i.e., a mere 2.6% performance decrease compared to non-private training. Furthermore, we found that privacy-preserving training did not amplify discrimination against age, sex or co-morbidity. Our study shows that, under the challenging realistic circumstances of a real-life clinical dataset, the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.
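
As a rough illustration of the fairness metric named in the abstract, the sketch below computes a Statistical Parity Difference using its standard definition: the gap in positive-prediction rates between two groups. It is not the authors' implementation; the function names, group labels, and toy data are assumptions made for this example.

```python
def positive_rate(predictions):
    """Fraction of binary predictions that equal 1."""
    return sum(predictions) / len(predictions)


def statistical_parity_difference(predictions, groups, group_a, group_b):
    """P(pred = 1 | group_a) - P(pred = 1 | group_b) for binary predictions."""
    preds_a = [p for p, g in zip(predictions, groups) if g == group_a]
    preds_b = [p for p, g in zip(predictions, groups) if g == group_b]
    return positive_rate(preds_a) - positive_rate(preds_b)


# Toy data (hypothetical, not taken from the study).
preds = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["f", "f", "f", "f", "m", "m", "m", "m"]

spd = statistical_parity_difference(preds, groups, "f", "m")
print(f"Statistical Parity Difference: {spd:+.2f}")
```

A value near zero means both groups receive positive predictions at similar rates; comparing this value between a non-private and a DP model is one way to check whether private training widens the gap.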


EscherNet 101

arXiv.org Artificial Intelligence

A deep learning model, EscherNet 101, is constructed to categorize images of 2D periodic patterns into their respective 17 wallpaper groups. Beyond evaluating EscherNet 101 performance by classification rates, at a micro-level we investigate the filters learned at different layers in the network, capable of capturing second-order invariants beyond edge and curvature.


Training Machine Learning Models to Characterize Temporal Evolution of Disadvantaged Communities

arXiv.org Artificial Intelligence

The disadvantaged communities (DAC) designation, as defined by the Justice40 initiative of the Department of Energy (DOE), USA, identifies census tracts across the USA to determine where benefits of climate and energy investments are or are not currently accruing. The DAC status not only helps in determining the eligibility for future Justice40-related investments but is also critical for exploring ways to achieve equitable distribution of resources. However, designing inclusive and equitable strategies requires not just a good understanding of current demographics, but also a deeper analysis of how those demographics have changed over the years. In this paper, machine learning (ML) models are trained on publicly available census data from recent years to classify DAC status at the census-tract level, and the trained model is then used to classify DAC status for historical years. A detailed analysis of the feature and model selection, along with the evolution of disadvantaged communities between 2013 and 2018, is presented in this study.


Causal Representation Learning for Instantaneous and Temporal Effects in Interactive Systems

arXiv.org Artificial Intelligence

Causal representation learning is the task of identifying the underlying causal variables and their relations from high-dimensional observations, such as images. Recent work has shown that one can reconstruct the causal variables from temporal sequences of observations under the assumption that there are no instantaneous causal relations between them. In practical applications, however, our measurement or frame rate might be slower than many of the causal effects. This effectively creates "instantaneous" effects and invalidates previous identifiability results. To address this issue, we propose iCITRIS, a causal representation learning method that allows for instantaneous effects in intervened temporal sequences when intervention targets can be observed, e.g., as actions of an agent. iCITRIS identifies the potentially multidimensional causal variables from temporal observations, while simultaneously using a differentiable causal discovery method to learn their causal graph. In experiments on three datasets of interactive systems, iCITRIS accurately identifies the causal variables and their causal graph.


Fingerprint Presentation Attack Detection by Channel-wise Feature Denoising

arXiv.org Artificial Intelligence

Due to the diversity of attack materials, automated fingerprint recognition systems (AFRSs) are vulnerable to malicious attacks. It is thus important to propose effective fingerprint presentation attack detection (PAD) methods for the safety and reliability of AFRSs. However, current PAD methods often exhibit poor robustness when faced with new attack types. This paper thus proposes a novel channel-wise feature denoising fingerprint PAD (CFD-PAD) method that handles the redundant noise information ignored in previous studies. The proposed method learns important features of fingerprint images by weighing the importance of each channel and identifying discriminative channels and "noise" channels. Then, the propagation of "noise" channels is suppressed in the feature map to reduce interference. Specifically, a PA-Adaptation loss is designed to constrain the feature distribution so that the features of live fingerprints become more aggregated and those of spoof fingerprints more dispersed. Experimental results on the LivDet 2017 dataset showed that the proposed CFD-PAD can achieve a 2.53% average classification error (ACE) and a 93.83% true detection rate when the false detection rate equals 1.0% (TDR@FDR=1%). The proposed method also markedly outperforms the best single-model-based methods in terms of ACE (2.53% vs. 4.56%) and TDR@FDR=1% (93.83% vs. 73.32%), which demonstrates its effectiveness. Although the overall result is comparable with that of the state-of-the-art multiple-model-based methods, TDR@FDR=1% still increases from 91.19% to 93.83%. In addition, the proposed model is simpler, lighter and more efficient, and achieves a 74.76% reduction in computation time compared with the state-of-the-art multiple-model-based method. The source code is available at https://github.com/kongzhecn/cfd-pad.
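
The two metrics quoted in the abstract can be sketched as follows. This is a minimal illustration, not the code from the linked repository. It assumes each sample has a "spoofness" score in [0, 1], that ACE is the mean of the live and spoof error rates, and that the TDR threshold is the lowest one keeping the false detection rate on live samples within the stated limit. All names and toy scores are hypothetical.

```python
def average_classification_error(live_scores, spoof_scores, threshold=0.5):
    """Mean of the live and spoof misclassification rates.

    A score >= threshold is treated as a "spoof" prediction (assumed convention).
    """
    live_err = sum(s >= threshold for s in live_scores) / len(live_scores)
    spoof_err = sum(s < threshold for s in spoof_scores) / len(spoof_scores)
    return (live_err + spoof_err) / 2


def tdr_at_fdr(live_scores, spoof_scores, max_fdr=0.01):
    """True detection rate on spoofs when at most max_fdr of live samples are flagged."""
    # Number of live samples that may be (wrongly) flagged as spoof.
    allowed = int(max_fdr * len(live_scores))
    ranked = sorted(live_scores, reverse=True)
    # Flag only scores strictly above this value, so at most `allowed` live samples exceed it.
    threshold = ranked[allowed]
    return sum(s > threshold for s in spoof_scores) / len(spoof_scores)


# Toy scores (hypothetical, not from LivDet 2017).
live = [0.1, 0.2, 0.3, 0.6]
spoof = [0.4, 0.7, 0.8, 0.9]

ace = average_classification_error(live, spoof)
tdr_strict = tdr_at_fdr(live, spoof, max_fdr=0.0)
tdr_loose = tdr_at_fdr(live, spoof, max_fdr=0.25)
print(ace, tdr_strict, tdr_loose)
```

Tightening the allowed false detection rate raises the threshold and can only lower the true detection rate, which is why TDR at a fixed low FDR is a stricter measure than ACE alone.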


Region and Spatial Aware Anomaly Detection for Fundus Images

arXiv.org Artificial Intelligence

Recently, anomaly detection has drawn much attention in diagnosing ocular diseases. Most existing anomaly detection methods for fundus images produce relatively large anomaly scores in the salient retinal structures, such as blood vessels, optic cups and discs. In this paper, we propose a Region and Spatial Aware Anomaly Detection (ReSAD) method for fundus images, which obtains local region and long-range spatial information to reduce false positives in normal structures. ReSAD transfers a pre-trained model to extract the features of normal fundus images and applies the Region-and-Spatial-Aware feature Combination module (ReSC) to pixel-level features to build a memory bank. In the testing phase, ReSAD uses the memory bank to identify out-of-distribution samples as abnormalities. Our method significantly outperforms the existing anomaly detection methods for fundus images on two public benchmark datasets.
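
The abstract does not say how the memory bank is queried at test time. One common choice for memory-bank detectors, assumed here purely for illustration, is to score a test feature by its distance to the nearest stored normal feature. The sketch below shows that generic idea with toy 2-D features; it does not reproduce ReSAD or its ReSC module.

```python
import math


def anomaly_score(feature, memory_bank):
    """Euclidean distance from a feature vector to its nearest neighbour in the bank."""
    return min(math.dist(feature, stored) for stored in memory_bank)


# Toy memory bank of "normal" features (hypothetical values).
memory_bank = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

normal_score = anomaly_score((0.1, 0.0), memory_bank)    # close to a stored feature
abnormal_score = anomaly_score((3.0, 4.0), memory_bank)  # far from every stored feature
print(normal_score, abnormal_score)
```

Under this assumption, a pixel or region would be flagged as abnormal when its score exceeds a threshold chosen on held-out normal images.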


Can Membership Inferencing be Refuted?

arXiv.org Artificial Intelligence

The membership inference (MI) attack is currently the most popular test for measuring privacy leakage in machine learning models. Given a machine learning model, a data point and some auxiliary information, the goal of an MI attack is to determine whether the data point was used to train the model. In this work, we study the reliability of membership inference attacks in practice. Specifically, we show that a model owner can plausibly refute the result of a membership inference test on a data point $x$ by constructing a proof of repudiation that proves that the model was trained without $x$. We design efficient algorithms to construct proofs of repudiation for all data points of the training dataset. Our empirical evaluation demonstrates the practical feasibility of our algorithm by constructing proofs of repudiation for popular machine learning models on MNIST and CIFAR-10. Consequently, our results call for a re-evaluation of the implications of membership inference attacks in practice.


Video traffic identification with novel feature extraction and selection method

arXiv.org Artificial Intelligence

In recent years, the rapid rise of video applications has led to an explosion of Internet video traffic, posing severe challenges to network management. Effectively identifying and managing video traffic has therefore become an urgent problem. However, existing video traffic feature extraction methods mainly target traditional packet-level and flow-level features, and their video traffic identification accuracy is low. Additionally, video traffic identification often suffers from high data dimensionality, requiring an effective approach to select the features most relevant to the identification task. Although numerous studies have used feature selection to improve identification performance, no feature selection research has focused on measuring feature distributions that do not overlap or overlap only slightly. First, this study proposes to extract video-related features to construct a large-scale feature set for identifying video traffic. Second, to reduce the cost of video traffic identification and select an effective feature subset, this study proposes an adaptive distribution distance-based feature selection (ADDFS) method, which uses the Wasserstein distance to measure the distance between feature distributions. To test the effectiveness of the proposal, we collected video traffic from different platforms in a campus network environment and conducted a set of experiments using these data sets. Experimental results suggest that the proposed method can achieve high identification performance for video scene traffic and cloud game video traffic. Lastly, a comparison of ADDFS with other feature selection methods shows that ADDFS is a practical feature selection technique not only for video traffic identification, but also for general classification tasks.
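
To make the distance-based idea concrete, the sketch below ranks features by the one-dimensional Wasserstein-1 distance between their values in two traffic classes. It is a simplified stand-in, not ADDFS: the adaptive part of the method is not described in the abstract and is omitted. The sketch assumes equal sample counts per class, and the feature names and values are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    For equal-size samples this is the mean absolute difference of the sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def rank_features(class_a, class_b):
    """Sort feature names by how far apart their class-conditional distributions are."""
    distances = {
        name: wasserstein_1d(class_a[name], class_b[name]) for name in class_a
    }
    return sorted(distances.items(), key=lambda item: item[1], reverse=True)


# Toy per-class feature samples (hypothetical names and values).
video = {"mean_packet_size": [1200, 1300, 1400], "flow_duration": [10, 12, 11]}
other = {"mean_packet_size": [300, 400, 500], "flow_duration": [11, 10, 12]}

ranking = rank_features(video, other)
print(ranking)
```

Unlike overlap-based measures, the Wasserstein distance keeps growing as two non-overlapping distributions move further apart, which is the property the abstract highlights. In practice features would need to be brought to a common scale before their distances are compared.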


Rethinking Confidence Calibration for Failure Prediction

arXiv.org Artificial Intelligence

Reliable confidence estimation for predictions is important in many safety-critical applications. However, modern deep neural networks are often overconfident in their incorrect predictions. Recently, many calibration methods have been proposed to alleviate the overconfidence problem. With calibrated confidence, a primary and practical purpose is to detect misclassification errors by filtering out low-confidence predictions (known as failure prediction). In this paper, we identify a general, widespread but largely neglected phenomenon: most confidence calibration methods are useless or harmful for failure prediction. We investigate this problem and reveal that popular confidence calibration methods often lead to worse confidence separation between correct and incorrect samples, making it more difficult to decide whether to trust a prediction or not. Finally, inspired by the natural connection between flat minima and confidence separation, we propose a simple hypothesis: flat minima are beneficial for failure prediction. We verify this hypothesis via extensive experiments and further boost the performance by combining two different flat minima techniques.
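
Confidence separation between correct and incorrect predictions can be quantified in several ways. One common choice, assumed here for illustration, is the AUROC of the confidence score for telling correct predictions from incorrect ones. The sketch below computes it by direct pairwise comparison; the function name and toy values are hypothetical and not taken from the paper.

```python
def failure_prediction_auroc(confidences, correct):
    """Probability that a correct prediction gets higher confidence than an incorrect one.

    Ties count as half a win. 1.0 means perfect separation, 0.5 means none.
    """
    right = [c for c, ok in zip(confidences, correct) if ok]
    wrong = [c for c, ok in zip(confidences, correct) if not ok]
    wins = sum((r > w) + 0.5 * (r == w) for r in right for w in wrong)
    return wins / (len(right) * len(wrong))


# Toy model outputs (hypothetical).
confidences = [0.90, 0.80, 0.70, 0.60, 0.55]
correct = [True, True, False, True, False]

auroc = failure_prediction_auroc(confidences, correct)
print(f"Failure-prediction AUROC: {auroc:.3f}")
```

A calibration method can improve the match between confidence and accuracy on average while leaving this ranking unchanged or worse, which is the gap between calibration and failure prediction that the abstract describes.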


Multi-resolution Interpretation and Diagnostics Tool for Natural Language Classifiers

arXiv.org Artificial Intelligence

Developing explainability methods for Natural Language Processing (NLP) models is a challenging task, for two main reasons. First, the high dimensionality of the data (large number of tokens) results in low coverage and, in turn, small contributions from the top tokens relative to the overall model performance. Second, owing to their textual nature, the input variables, after appropriate transformations, are effectively binary (presence or absence of a token in an observation), making the input-output relationship difficult to understand. Common NLP interpretation techniques do not have flexibility in resolution, because they usually operate at word level and provide fully local (message-level) or fully global (over all messages) summaries. The goal of this paper is to create more flexible model explainability summaries by segments of observations or clusters of words that are semantically related to each other. In addition, we introduce a root cause analysis method for NLP models that analyzes representative False Positive and False Negative examples from different segments. Finally, we illustrate, using a Yelp review data set with three segments (Restaurant, Hotel, and Beauty), that exploiting group/cluster structures in words and/or messages can aid in the interpretation of decisions made by NLP models and can be utilized to assess the model's sensitivity or bias towards gender, syntax, and word meanings.