Accuracy
Dermatologist-like explainable AI enhances trust and confidence in diagnosing melanoma
Chanda, Tirtha, Hauser, Katja, Hobelsberger, Sarah, Bucher, Tabea-Clara, Garcia, Carina Nogueira, Wies, Christoph, Kittler, Harald, Tschandl, Philipp, Navarrete-Dechent, Cristian, Podlipnik, Sebastian, Chousakos, Emmanouil, Crnaric, Iva, Majstorovic, Jovana, Alhajwan, Linda, Foreman, Tanya, Peternel, Sandra, Sarap, Sergei, รzdemir, ฤฐrem, Barnhill, Raymond L., Velasco, Mar Llamas, Poch, Gabriela, Korsing, Sรถren, Sondermann, Wiebke, Gellrich, Frank Friedrich, Heppt, Markus V., Erdmann, Michael, Haferkamp, Sebastian, Drexler, Konstantin, Goebeler, Matthias, Schilling, Bastian, Utikal, Jochen S., Ghoreschi, Kamran, Frรถhling, Stefan, Krieghoff-Henning, Eva, Brinker, Titus J.
Although artificial intelligence (AI) systems have been shown to improve the accuracy of initial melanoma diagnosis, the lack of transparency in how these systems identify melanoma poses severe obstacles to user acceptance. Explainable artificial intelligence (XAI) methods can help to increase transparency, but most XAI methods are unable to produce precisely located domain-specific explanations, making the explanations difficult to interpret. Moreover, the impact of XAI methods on dermatologists has not yet been evaluated. Extending on two existing classifiers, we developed an XAI system that produces text and region based explanations that are easily interpretable by dermatologists alongside its differential diagnoses of melanomas and nevi. To evaluate this system, we conducted a three-part reader study to assess its impact on clinicians' diagnostic accuracy, confidence, and trust in the XAI-support. We showed that our XAI's explanations were highly aligned with clinicians' explanations and that both the clinicians' trust in the support system and their confidence in their diagnoses were significantly increased when using our XAI compared to using a conventional AI system. The clinicians' diagnostic accuracy was numerically, albeit not significantly, increased. This work demonstrates that clinicians are willing to adopt such an XAI system, motivating their future use in the clinic.
Tiny, always-on and fragile: Bias propagation through design choices in on-device machine learning workflows
Toussaint, Wiebke, Ding, Aaron Yi, Kawsar, Fahim, Mathur, Akhil
Billions of distributed, heterogeneous and resource constrained IoT devices deploy on-device machine learning (ML) for private, fast and offline inference on personal data. On-device ML is highly context dependent, and sensitive to user, usage, hardware and environment attributes. This sensitivity and the propensity towards bias in ML makes it important to study bias in on-device settings. Our study is one of the first investigations of bias in this emerging domain, and lays important foundations for building fairer on-device ML. We apply a software engineering lens, investigating the propagation of bias through design choices in on-device ML workflows. We first identify reliability bias as a source of unfairness and propose a measure to quantify it. We then conduct empirical experiments for a keyword spotting task to show how complex and interacting technical design choices amplify and propagate reliability bias. Our results validate that design choices made during model training, like the sample rate and input feature type, and choices made to optimize models, like light-weight architectures, the pruning learning rate and pruning sparsity, can result in disparate predictive performance across male and female groups. Based on our findings we suggest low effort strategies for engineers to mitigate bias in on-device ML.
CerviFormer: A Pap-smear based cervical cancer classification method using cross attention and latent transformer
Deo, Bhaswati Singha, Pal, Mayukha, Panigarhi, Prasanta K., Pradhan, Asima
Purpose: Cervical cancer is one of the primary causes of death in women. It should be diagnosed early and treated according to the best medical advice, as with other diseases, to ensure that its effects are as minimal as possible. Pap smear images are one of the most constructive ways for identifying this type of cancer. This study proposes a cross-attention-based Transfomer approach for the reliable classification of cervical cancer in Pap smear images. Methods: In this study, we propose the CerviFormer -- a model that depends on the Transformers and thereby requires minimal architectural assumptions about the size of the input data. The model uses a cross-attention technique to repeatedly consolidate the input data into a compact latent Transformer module, which enables it to manage very large-scale inputs. We evaluated our model on two publicly available Pap smear datasets. Results: For 3-state classification on the Sipakmed data, the model achieved an accuracy of 93.70%. For 2-state classification on the Herlev data, the model achieved an accuracy of 94.57%. Conclusion: Experimental results on two publicly accessible datasets demonstrate that the proposed method achieves competitive results when compared to contemporary approaches. The proposed method brings forth a comprehensive classification model to detect cervical cancer in Pap smear images. This may aid medical professionals in providing better cervical cancer treatment, consequently, enhancing the overall effectiveness of the entire testing process.
Towards the Scalable Evaluation of Cooperativeness in Language Models
Chan, Alan, Richรฉ, Maxime, Clifton, Jesse
It is likely that AI systems driven by pre-trained language models (PLMs) will increasingly be used to assist humans in high-stakes interactions with other agents, such as negotiation or conflict resolution. Consistent with the goals of Cooperative AI \citep{dafoe_open_2020}, we wish to understand and shape the multi-agent behaviors of PLMs in a pro-social manner. An important first step is the evaluation of model behaviour across diverse cooperation problems. Since desired behaviour in an interaction depends upon precise game-theoretic structure, we focus on generating scenarios with particular structures with both crowdworkers and a language model. Our work proceeds as follows. First, we discuss key methodological issues in the generation of scenarios corresponding to particular game-theoretic structures. Second, we employ both crowdworkers and a language model to generate such scenarios. We find that the quality of generations tends to be mediocre in both cases. We additionally get both crowdworkers and a language model to judge whether given scenarios align with their intended game-theoretic structure, finding mixed results depending on the game. Third, we provide a dataset of scenario based on our data generated. We provide both quantitative and qualitative evaluations of UnifiedQA and GPT-3 on this dataset. We find that instruct-tuned models tend to act in a way that could be perceived as cooperative when scaled up, while other models seemed to have flat scaling trends.
Deep Metric Learning for Unsupervised Remote Sensing Change Detection
Bandara, Wele Gedara Chaminda, Patel, Vishal M.
Remote Sensing Change Detection (RS-CD) aims to detect relevant changes from Multi-Temporal Remote Sensing Images (MT-RSIs), which aids in various RS applications such as land cover, land use, human development analysis, and disaster response. The performance of existing RS-CD methods is attributed to training on large annotated datasets. Furthermore, most of these models are less transferable in the sense that the trained model often performs very poorly when there is a domain gap between training and test datasets. This paper proposes an unsupervised CD method based on deep metric learning that can deal with both of these issues. Given an MT-RSI, the proposed method generates corresponding change probability map by iteratively optimizing an unsupervised CD loss without training it on a large dataset. Our unsupervised CD method consists of two interconnected deep networks, namely Deep-Change Probability Generator (D-CPG) and Deep-Feature Extractor (D-FE). The D-CPG is designed to predict change and no change probability maps for a given MT-RSI, while D-FE is used to extract deep features of MT-RSI that will be further used in the proposed unsupervised CD loss. We use transfer learning capability to initialize the parameters of D-FE. We iteratively optimize the parameters of D-CPG and D-FE for a given MT-RSI by minimizing the proposed unsupervised ``similarity-dissimilarity loss''. This loss is motivated by the principle of metric learning where we simultaneously maximize the distance between change pair-wise pixels while minimizing the distance between no-change pair-wise pixels in bi-temporal image domain and their deep feature domain. The experiments conducted on three CD datasets show that our unsupervised CD method achieves significant improvements over the state-of-the-art supervised and unsupervised CD methods. Code available at https://github.com/wgcban/Metric-CD
Evaluation of distance-based approaches for forensic comparison: Application to hand odor evidence
Rivals, Isabelle, Sautier, Cรฉdric, Cognon, Guillaume, Cuzuel, Vincent
The issue of distinguishing between the same-source and different-source hypotheses based on various types of traces is a generic problem in forensic science. This problem is often tackled with Bayesian approaches, which are able to provide a likelihood ratio that quantifies the relative strengths of evidence supporting each of the two competing hypotheses. Here, we focus on distance-based approaches, whose robustness and specifically whose capacity to deal with high-dimensional evidence are very different, and need to be evaluated and optimized. A unified framework for direct methods based on estimating the likelihoods of the distance between traces under each of the two competing hypotheses, and indirect methods using logistic regression to discriminate between same-source and different-source distance distributions, is presented. Whilst direct methods are more flexible, indirect methods are more robust and quite natural in machine learning. Moreover, indirect methods also enable the use of a vectorial distance, thus preventing the severe information loss suffered by scalar distance approaches.Direct and indirect methods are compared in terms of sensitivity, specificity and robustness, with and without dimensionality reduction, with and without feature selection, on the example of hand odor profiles, a novel and challenging type of evidence in the field of forensics. Empirical evaluations on a large panel of 534 subjects and their 1690 odor traces show the significant superiority of the indirect methods, especially without dimensionality reduction, be it with or without feature selection.
CELEST: Federated Learning for Globally Coordinated Threat Detection
Ongun, Talha, Boboila, Simona, Oprea, Alina, Eliassi-Rad, Tina, Hiser, Jason, Davidson, Jack
The cyber-threat landscape has evolved tremendously in recent years, with new threat variants emerging daily, and large-scale coordinated campaigns becoming more prevalent. In this study, we propose CELEST (CollaborativE LEarning for Scalable Threat detection, a federated machine learning framework for global threat detection over HTTP, which is one of the most commonly used protocols for malware dissemination and communication. CELEST leverages federated learning in order to collaboratively train a global model across multiple clients who keep their data locally, thus providing increased privacy and confidentiality assurances. Through a novel active learning component integrated with the federated learning technique, our system continuously discovers and learns the behavior of new, evolving, and globally-coordinated cyber threats. We show that CELEST is able to expose attacks that are largely invisible to individual organizations. For instance, in one challenging attack scenario with data exfiltration malware, the global model achieves a three-fold increase in Precision-Recall AUC compared to the local model. We also design a poisoning detection and mitigation method, DTrust, specifically designed for federated learning in the collaborative threat detection domain. DTrust successfully detects poisoning clients using the feedback from participating clients to investigate and remove them from the training process. We deploy CELEST on two university networks and show that it is able to detect the malicious HTTP communication with high precision and low false positive rates. Furthermore, during its deployment, CELEST detected a set of previously unknown 42 malicious URLs and 20 malicious domains in one day, which were confirmed to be malicious by VirusTotal.
Predicting discrete-time bifurcations with deep learning
Bury, Thomas M., Dylewsky, Daniel, Bauch, Chris T., Anand, Madhur, Glass, Leon, Shrier, Alvin, Bub, Gil
Many natural and man-made systems are prone to critical transitions -- abrupt and potentially devastating changes in dynamics. Deep learning classifiers can provide an early warning signal (EWS) for critical transitions by learning generic features of bifurcations (dynamical instabilities) from large simulated training data sets. So far, classifiers have only been trained to predict continuous-time bifurcations, ignoring rich dynamics unique to discrete-time bifurcations. Here, we train a deep learning classifier to provide an EWS for the five local discrete-time bifurcations of codimension-1. We test the classifier on simulation data from discrete-time models used in physiology, economics and ecology, as well as experimental data of spontaneously beating chick-heart aggregates that undergo a period-doubling bifurcation. The classifier outperforms commonly used EWS under a wide range of noise intensities and rates of approach to the bifurcation. It also predicts the correct bifurcation in most cases, with particularly high accuracy for the period-doubling, Neimark-Sacker and fold bifurcations. Deep learning as a tool for bifurcation prediction is still in its nascence and has the potential to transform the way we monitor systems for critical transitions.
Graph Neural Network Surrogates of Fair Graph Filtering
Krasanakis, Emmanouil, Papadopoulos, Symeon
Graph filters that transform prior node values to posterior scores via edge propagation often support graph mining tasks affecting humans, such as recommendation and ranking. Thus, it is important to make them fair in terms of satisfying statistical parity constraints between groups of nodes (e.g., distribute score mass between genders proportionally to their representation). To achieve this while minimally perturbing the original posteriors, we introduce a filter-aware universal approximation framework for posterior objectives. This defines appropriate graph neural networks trained at runtime to be similar to filters but also locally optimize a large class of objectives, including fairness-aware ones. Experiments on a collection of 8 filters and 5 graphs show that our approach performs equally well or better than alternatives in meeting parity constraints while preserving the AUC of score-based community member recommendation and creating minimal utility loss in prior diffusion.
Anomaly Search Over Many Sequences With Switching Costs
Ubl, Matthew, Robinson, Benjamin D., Hale, Matthew T.
This paper considers the quickest search problem to identify anomalies among large numbers of data streams. These streams can model, for example, disjoint regions monitored by a mobile robot. A particular challenge is a version of the problem in which the experimenter must suffer a cost each time the data stream being sampled changes, such as the time the robot must spend moving between regions. In this paper, we propose an algorithm which accounts for switching costs by varying a confidence threshold that governs when the algorithm switches to a new data stream. Our main contributions are easily computable approximations for both the optimal value of this threshold and the optimal value of the parameter that determines when a stream must be re-sampled. Further, we empirically show (i) a uniform improvement for switching costs of interest and (ii) roughly equivalent performance for small switching costs when comparing to the closest available algorithm.