Accuracy
Anomaly Detection via Gumbel Noise Score Matching
Mahmood, Ahsan, Oliva, Junier, Styner, Martin
We propose Gumbel Noise Score Matching (GNSM), a novel unsupervised method to detect anomalies in categorical data. GNSM accomplishes this by estimating the scores, i.e. the gradients of log likelihoods w.r.t. inputs, of continuously relaxed categorical distributions. We test our method on a suite of anomaly detection tabular datasets. GNSM achieves a consistently high performance across all experiments. We further demonstrate the flexibility of GNSM by applying it to image data where the model is tasked to detect poor segmentation predictions. Images ranked anomalous by GNSM show clear segmentation failures, with the outputs of GNSM strongly correlating with segmentation metrics computed on ground-truth. We outline the score matching training objective utilized by GNSM and provide an open-source implementation of our work.
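As a hedged illustration of what a "continuously relaxed categorical distribution" can look like, the sketch below draws Gumbel-softmax (Concrete) samples with NumPy. The abstract does not specify the relaxation or its temperature, so the function name `gumbel_softmax_sample`, the temperature value, and the example logits are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot sample per row of `logits`.

    Assumption: the standard Gumbel-softmax relaxation; the exact
    parameterization used by GNSM may differ.
    """
    logits = np.asarray(logits, dtype=float)
    # Gumbel(0, 1) noise via the inverse-CDF trick: g = -log(-log(u)).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel_noise = -np.log(-np.log(u))
    # Perturb the logits, then apply a temperature-scaled softmax.
    perturbed = (logits + gumbel_noise) / temperature
    perturbed -= perturbed.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(perturbed)
    return weights / weights.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
# Hypothetical class probabilities for two categorical features, three levels each.
logits = np.log(np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]))
relaxed = gumbel_softmax_sample(logits, temperature=0.5, rng=rng)
print(relaxed)
```

Each row of `relaxed` lies on the probability simplex, so gradients with respect to the inputs are well defined; lower temperatures push the samples closer to one-hot vectors.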
Protecting User Privacy in Online Settings via Supervised Learning
Rusescu, Alexandru, Lampe, Brooke, Meng, Weizhi
Companies that have an online presence, in particular companies that are exclusively digital, often subscribe to this business model: collect data from the user base, then expose the data to advertisement agencies in order to turn a profit. Such companies routinely market a service as "free", while obfuscating the fact that they tend to "charge" users in the currency of personal information rather than money. However, online companies also gather user data for more principled purposes, such as improving the user experience and aggregating statistics. The problem is the sale of user data to third parties. In this work, we design an intelligent approach to online privacy protection that leverages supervised learning. By detecting and blocking data collection that might infringe on a user's privacy, we can restore a degree of digital privacy to the user. In our evaluation, we collect a dataset of network requests and measure the performance of several classifiers that adhere to the supervised learning paradigm. The results of our evaluation demonstrate the feasibility and potential of our approach.
Adaptable and Interpretable Framework for Novelty Detection in Real-Time IoT Systems
Wadinger, Marek, Kvasnica, Michal
This paper presents the Real-time Adaptive and Interpretable Detection (RAID) algorithm. The novel approach addresses the limitations of state-of-the-art anomaly detection methods for multivariate dynamic processes, which are restricted to detecting anomalies within the scope of the model training conditions. The RAID algorithm adapts to non-stationary effects such as data drift and change points that may not be accounted for during model development, resulting in prolonged service life. A dynamic model based on a joint probability distribution handles the detection of anomalous behavior in a system and the isolation of root causes based on adaptive process limits. The RAID algorithm does not require changes to existing process automation infrastructures, making it highly deployable across different domains. Two case studies involving real dynamic system data demonstrate the benefits of the RAID algorithm, including change point adaptation, root cause isolation, and improved detection accuracy.
Quantifying and Defending against Privacy Threats on Federated Knowledge Graph Embedding
Hu, Yuke, Liang, Wei, Wu, Ruofan, Xiao, Kai, Wang, Weiqiang, Li, Xiaochen, Liu, Jinfei, Qin, Zhan
Knowledge Graph Embedding (KGE) is a fundamental technique that extracts expressive representations from a knowledge graph (KG) to facilitate diverse downstream tasks. The emerging federated KGE (FKGE) collaboratively trains from distributed KGs held among clients while avoiding exchanging clients' sensitive raw KGs, yet it can still suffer from privacy threats, as evidenced in the federated training of other models (e.g., neural networks). However, quantifying and defending against such privacy threats remain unexplored for FKGE, which possesses unique properties not shared by previously studied models. In this paper, we conduct the first holistic study of the privacy threat on FKGE from both attack and defense perspectives. For the attack, we quantify the privacy threat by proposing three new inference attacks, which reveal substantial privacy risk by successfully inferring the existence of KG triples from victim clients. For the defense, we propose DP-Flames, a novel differentially private FKGE with private selection, which offers a better privacy-utility tradeoff by exploiting the entity-binding sparse gradient property of FKGE and comes with a tight privacy accountant by incorporating the state-of-the-art private selection technique. We further propose an adaptive privacy budget allocation policy to dynamically adjust defense magnitude across the training procedure. Comprehensive evaluations demonstrate that the proposed defense can successfully mitigate the privacy threat by effectively reducing the success rate of inference attacks from 83.1% to 59.4% on average with only a modest utility decrease.
ECG Feature Importance Rankings: Cardiologists vs. Algorithms
Mehari, Temesgen, Sundar, Ashish, Bosnjakovic, Alen, Harris, Peter, Williams, Steven E., Loewe, Axel, Doessel, Olaf, Nagel, Claudia, Strodthoff, Nils, Aston, Philip J.
On the other hand, it is quite conceivable that a simple binary classification of healthy vs. a specific pathology could be successfully achieved by using only a reduced subset of the complete list of diagnostic conditions. However, we consider it appropriate to study the simplest case first. A study of multiclass feature importance algorithms with all four of the above classes has been undertaken as a separate study [4].
... diagnoses are made on the basis of a multitude of ECG features which consist mainly of time intervals between certain fiducial points on the ECG, amplitudes of prominent features or morphology of ECG segments. For each pathology, the relevant criteria for specific features are well documented [1], [2], although there may be minor differences between one ...
Exploring the Utility of Self-Supervised Pretraining Strategies for the Detection of Absent Lung Sliding in M-Mode Lung Ultrasound
VanBerlo, Blake, Li, Brian, Wong, Alexander, Hoey, Jesse, Arntfield, Robert
Self-supervised pretraining has been observed to improve performance in supervised learning tasks in medical imaging. This study investigates the utility of self-supervised pretraining prior to conducting supervised fine-tuning for the downstream task of lung sliding classification in M-mode lung ultrasound images. We propose a novel pairwise relationship that couples M-mode images constructed from the same B-mode image and investigate the utility of a data augmentation procedure specific to M-mode lung ultrasound. The results indicate that self-supervised pretraining yields better performance than full supervision, most notably for feature extractors not initialized with ImageNet-pretrained weights. Moreover, we observe that including a vast volume of unlabelled data results in improved performance on external validation datasets, underscoring the value of self-supervision for improving generalizability in automatic ultrasound interpretation. To the authors' best knowledge, this study is the first to characterize the influence of self-supervised pretraining for M-mode ultrasound.
Neural Bandits for Data Mining: Searching for Dangerous Polypharmacy
Larouche, Alexandre, Durand, Audrey, Khoury, Richard, Sirois, Caroline
Polypharmacy, most often defined as the simultaneous consumption of five or more drugs, is a prevalent phenomenon in the older population. Some of these polypharmacies, deemed inappropriate, may be associated with adverse health outcomes such as death or hospitalization. Considering the combinatorial nature of the problem as well as the size of claims databases and the cost of computing an exact association measure for a given drug combination, it is impossible to investigate every possible combination of drugs. Therefore, we propose to optimize the search for potentially inappropriate polypharmacies (PIPs). To this end, we propose the OptimNeuralTS strategy, based on Neural Thompson Sampling and differential evolution, to efficiently mine claims datasets and build a predictive model of the association between drug combinations and health outcomes. We benchmark our method using two datasets generated by an internally developed simulator of polypharmacy data containing 500 drugs and 100 000 distinct combinations. Empirically, our method can detect up to 72% of PIPs while maintaining an average precision score of 99% using 30 000 time steps.
A Neural Network Approach for Selecting Track-like Events in Fluorescence Telescope Data
Zotov, Mikhail, Sokolinskii, Denis
In recent years, neural networks of various configurations have been increasingly used to analyze data obtained with fluorescence and Cherenkov telescopes. In particular, a whole series of studies dedicated to the analysis of gamma-ray astronomy data with neural networks has been performed by the VERITAS [1], TAIGA [2, 3], and CTA [4, 5] collaborations. A typical task is the recognition of particular signal patterns in the data flow. In the simplest case, the problem can be reduced to classifying data into two groups: data samples that contain a signal of the desired type and all the rest. Since data obtained with telescopes can naturally be considered as images or animations, convolutional neural networks (CNNs), created primarily for image classification, are among the popular tools for classifying them. CNNs have demonstrated the highest efficiency in this class of problems, see, for example, [6, 7].
Variable-Based Calibration for Machine Learning Classifiers
Kelly, Markelle, Smyth, Padhraic
The deployment of machine learning classifiers in high-stakes domains requires well-calibrated confidence scores for model predictions. In this paper we introduce the notion of variable-based calibration to characterize calibration properties of a model with respect to a variable of interest, generalizing traditional score-based metrics such as expected calibration error (ECE). In particular, we find that models with near-perfect ECE can exhibit significant miscalibration as a function of features of the data. We demonstrate this phenomenon both theoretically and in practice on multiple well-known datasets, and show that it can persist after the application of existing calibration methods. To mitigate this issue, we propose strategies for detection, visualization, and quantification of variable-based calibration error. We then examine the limitations of current score-based calibration methods and explore potential modifications. Finally, we discuss the implications of these findings, emphasizing that an understanding of calibration beyond simple aggregate measures is crucial for endeavors such as fairness and model interpretability.
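The claim that a model with near-perfect ECE can still be miscalibrated as a function of a feature can be illustrated with a small synthetic sketch. The binned estimator below, the equal-count binning of the variable, the helper names, and the made-up `age` data are all assumptions for illustration; the paper's exact definition of variable-based calibration error may differ.

```python
import numpy as np


def calibration_error(bin_ids, confidences, correct):
    """Weighted mean |accuracy - mean confidence| over the groups in `bin_ids`."""
    error = 0.0
    for b in np.unique(bin_ids):
        mask = bin_ids == b
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        error += mask.mean() * gap  # mask.mean() is the fraction of samples in the bin
    return error


def score_bins(confidences, n_bins=10):
    """Equal-width bins over the confidence score, as in standard ECE."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    return np.digitize(confidences, edges[1:-1])


def variable_bins(variable, n_bins=10):
    """Equal-count bins over a variable of interest (an assumed binning choice)."""
    ranks = np.argsort(np.argsort(variable, kind="stable"), kind="stable")
    return ranks * n_bins // len(variable)


# Synthetic data: the model always reports 0.8 confidence, but is only 60%
# accurate for the lower half of `age` and 100% accurate for the upper half.
n = 1000
age = np.linspace(20, 80, n)
confidences = np.full(n, 0.8)
correct = np.ones(n)
correct[: n // 2] = np.arange(n // 2) % 5 < 3  # 3 of every 5 predictions correct

ece = calibration_error(score_bins(confidences), confidences, correct)
vce = calibration_error(variable_bins(age), confidences, correct)
print(f"ECE = {ece:.3f}, variable-based error = {vce:.3f}")
```

In this constructed example the overall accuracy matches the reported confidence, so the score-based error is essentially zero, while binning by `age` exposes a gap of about 0.2 in every bin.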
Toxicity in Multilingual Machine Translation at Scale
Costa-jussà, Marta R., Smith, Eric, Ropers, Christophe, Licht, Daniel, Maillard, Jean, Ferrando, Javier, Escolano, Carlos
Machine Translation systems can produce different types of errors, some of which are characterized as critical or catastrophic due to the specific negative impact that they can have on users. In this paper we focus on one type of critical error: added toxicity. We evaluate and analyze added toxicity when translating a large evaluation dataset (HOLISTICBIAS, over 472k sentences, covering 13 demographic axes) from English into 164 languages. An automatic toxicity evaluation shows that added toxicity across languages varies from 0% to 5%. The output languages with the most added toxicity tend to be low-resource ones, and the demographic axes with the most added toxicity include sexual orientation, gender and sex, and ability. We also perform human evaluation on a subset of 8 translation directions, confirming the prevalence of true added toxicity. We use a measurement of the amount of source contribution to the translation, where a low source contribution implies hallucination, to interpret what causes toxicity. Making use of the input attributions allows us to explain toxicity, because the source contributions significantly correlate with toxicity for 84% of languages studied. Given our findings, our recommendations to reduce added toxicity are to curate training data to avoid mistranslations, mitigate hallucination and check unstable translations.
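A minimal sketch of how an added-toxicity rate could be computed is given below, taking "added toxicity" to mean that a translation is flagged as toxic while its source sentence is not. The abstract only says the evaluation is automatic, so the word-list matching, the function names, and the placeholder word lists and sentences are assumptions for illustration rather than the authors' pipeline.

```python
import re


def contains_toxic_term(sentence, toxic_terms):
    """Return True if any whole-word token of `sentence` is in `toxic_terms`."""
    tokens = set(re.findall(r"\w+", sentence.lower()))
    return not tokens.isdisjoint(toxic_terms)


def added_toxicity_rate(pairs, source_terms, target_terms):
    """Fraction of (source, translation) pairs where only the translation is flagged."""
    if not pairs:
        return 0.0
    added = sum(
        1
        for source, translation in pairs
        if contains_toxic_term(translation, target_terms)
        and not contains_toxic_term(source, source_terms)
    )
    return added / len(pairs)


# Placeholder word lists and sentences (hypothetical, not real toxicity lists).
source_terms = {"toxicword"}
target_terms = {"mottoxique"}
pairs = [
    ("I am a teacher.", "Je suis enseignant."),             # clean -> clean
    ("I am a teacher.", "Je suis un mottoxique."),          # clean -> toxic (added)
    ("You are a toxicword.", "Tu es un mottoxique."),       # toxic -> toxic (not added)
    ("I like my neighbours.", "J'aime bien mes voisins."),  # clean -> clean
]
rate = added_toxicity_rate(pairs, source_terms, target_terms)
print(f"added toxicity rate: {rate:.2%}")
```

Only the second pair counts as added toxicity here, since the third pair was already flagged on the source side.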