Goto

Collaborating Authors

 Accuracy


Uncertainty-Estimation with Normalized Logits for Out-of-Distribution Detection

arXiv.org Artificial Intelligence

Out-of-distribution (OOD) detection is critical for preventing deep learning models from making incorrect predictions to ensure the safety of artificial intelligence systems. Especially in safety-critical applications such as medical diagnosis and autonomous driving, the cost of incorrect decisions is usually unbearable. However, neural networks often suffer from the overconfidence issue, making high confidence for OOD data which are never seen during training process and may be irrelevant to training data, namely in-distribution (ID) data. Determining the reliability of the prediction is still a difficult and challenging task. In this work, we propose Uncertainty-Estimation with Normalized Logits (UE-NL), a robust learning method for OOD detection, which has three main benefits. (1) Neural networks with UE-NL treat every ID sample equally by predicting the uncertainty score of input data and the uncertainty is added into softmax function to adjust the learning strength of easy and hard samples during training phase, making the model learn robustly and accurately. (2) UE-NL enforces a constant vector norm on the logits to decouple the effect of the increasing output norm from optimization process, which causes the overconfidence issue to some extent. (3) UE-NL provides a new metric, the magnitude of uncertainty score, to detect OOD data. Experiments demonstrate that UE-NL achieves top performance on common OOD benchmarks and is more robust to noisy ID data that may be misjudged as OOD data by other methods.


That Escalated Quickly: An ML Framework for Alert Prioritization

arXiv.org Artificial Intelligence

In place of in-house solutions, organizations are increasingly moving towards managed services for cyber defense. Security Operations Centers are specialized cybersecurity units responsible for the defense of an organization, but the large-scale centralization of threat detection is causing SOCs to endure an overwhelming amount of false positive alerts -- a phenomenon known as alert fatigue. Large collections of imprecise sensors, an inability to adapt to known false positives, evolution of the threat landscape, and inefficient use of analyst time all contribute to the alert fatigue problem. To combat these issues, we present That Escalated Quickly (TEQ), a machine learning framework that reduces alert fatigue with minimal changes to SOC workflows by predicting alert-level and incident-level actionability. On real-world data, the system is able to reduce the time it takes to respond to actionable incidents by $22.9\%$, suppress $54\%$ of false positives with a $95.1\%$ detection rate, and reduce the number of alerts an analyst needs to investigate within singular incidents by $14\%$.


Interpretable Boosted Decision Tree Analysis for the Majorana Demonstrator

arXiv.org Artificial Intelligence

The Majorana Demonstrator is a leading experiment searching for neutrinoless double-beta decay with high purity germanium detectors (HPGe). Machine learning provides a new way to maximize the amount of information provided by these detectors, but the data-driven nature makes it less interpretable compared to traditional analysis. An interpretability study reveals the machine's decision-making logic, allowing us to learn from the machine to feedback to the traditional analysis. In this work, we have presented the first machine learning analysis of the data from the Majorana Demonstrator; this is also the first interpretable machine learning analysis of any germanium detector experiment. Two gradient boosted decision tree models are trained to learn from the data, and a game-theory-based model interpretability study is conducted to understand the origin of the classification power. By learning from data, this analysis recognizes the correlations among reconstruction parameters to further enhance the background rejection performance. By learning from the machine, this analysis reveals the importance of new background categories to reciprocally benefit the standard Majorana analysis. This model is highly compatible with next-generation germanium detector experiments like LEGEND since it can be simultaneously trained on a large number of detectors.


Tight Auditing of Differentially Private Machine Learning

arXiv.org Artificial Intelligence

Auditing mechanisms for differential privacy use probabilistic means to empirically estimate the privacy level of an algorithm. For private machine learning, existing auditing mechanisms are tight: the empirical privacy estimate (nearly) matches the algorithm's provable privacy guarantee. But these auditing techniques suffer from two limitations. First, they only give tight estimates under implausible worst-case assumptions (e.g., a fully adversarial dataset). Second, they require thousands or millions of training runs to produce non-trivial statistical estimates of the privacy leakage. This work addresses both issues. We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets -- if the adversary can see all model updates during training. Prior auditing works rely on the same assumption, which is permitted under the standard differential privacy threat model. This threat model is also applicable, e.g., in federated learning settings. Moreover, our auditing scheme requires only two training runs (instead of thousands) to produce tight privacy estimates, by adapting recent advances in tight composition theorems for differential privacy. We demonstrate the utility of our improved auditing schemes by surfacing implementation bugs in private machine learning code that eluded prior auditing techniques.


A Provably Improved Algorithm for Crowdsourcing with Hard and Easy Tasks

arXiv.org Artificial Intelligence

Crowdsourcing is a popular method used to estimate ground-truth labels by collecting noisy labels from workers. In this work, we are motivated by crowdsourcing applications where each worker can exhibit two levels of accuracy depending on a task's type. Applying algorithms designed for the traditional Dawid-Skene model to such a scenario results in performance which is limited by the hard tasks. Therefore, we first extend the model to allow worker accuracy to vary depending on a task's unknown type. Then we propose a spectral method to partition tasks by type. After separating tasks by type, any Dawid-Skene algorithm (i.e., any algorithm designed for the Dawid-Skene model) can be applied independently to each type to infer the truth values. We theoretically prove that when crowdsourced data contain tasks with varying levels of difficulty, our algorithm infers the true labels with higher accuracy than any Dawid-Skene algorithm. Experiments show that our method is effective in practical applications.


When Mitigating Bias is Unfair: A Comprehensive Study on the Impact of Bias Mitigation Algorithms

arXiv.org Artificial Intelligence

Most works on the fairness of machine learning systems focus on the blind optimization of common fairness metrics, such as Demographic Parity and Equalized Odds. In this paper, we conduct a comparative study of several bias mitigation approaches to investigate their behaviors at a fine grain, the prediction level. Our objective is to characterize the differences between fair models obtained with different approaches. With comparable performances in fairness and accuracy, are the different bias mitigation approaches impacting a similar number of individuals? Do they mitigate bias in a similar way? Do they affect the same individuals when debiasing a model? Our findings show that bias mitigation approaches differ a lot in their strategies, both in the number of impacted individuals and the populations targeted. More surprisingly, we show these results even apply for several runs of the same mitigation approach. These findings raise questions about the limitations of the current group fairness metrics, as well as the arbitrariness, hence unfairness, of the whole debiasing process.


A Review of the Role of Causality in Developing Trustworthy AI Systems

arXiv.org Artificial Intelligence

As a result, they are often brittle and unable to adapt to new domains, can treat individuals or subgroups unfairly, and have limited ability to explain their actions or recommendations [197, 235] reducing the trust of human users [118]. Following this, a new area of research, trustworthy AI, has recently received much attention from several policymakers and other regulatory organizations. The resulting guidelines (e.g., [184, 186, 187]), introduced to increase trust in AI systems, make developing trustworthy AI not only a technical (research) and social endeavor but also an organizational and (legal) obligational requirement. In this paper, we set out to demonstrate, through an extensive survey, that causal modeling and reasoning is an emerging and very useful tool for enabling current AI systems to become trustworthy. Causality is the science of reasoning about causes and effects. Cause-and-effect relationships are central to how we make sense of the world around us, how we act upon it, and how we respond to changes in our environment. In AI, research in causality was pioneered by the Turing award winner Judea Pearl long back in his 1995 seminal paper [194]. Since then, many researchers have contributed to the development of a solid mathematical basis for causality; see, for example, the books [79, 196, 201], the survey [90] and seminal papers [197, 235].


A model-free feature selection technique of feature screening and random forest based recursive feature elimination

arXiv.org Artificial Intelligence

Due to the development of data technology, feature selection plays an important role in both statistics and machine learning. High dimensional and ultra-high dimensional datasets are widely used in many fields, such as finance, image recognition, text classification, etc. Although more detailed information is provided with the increase of dimensions, the existence of a large number of redundant features weakens the generalization ability of models and increases the difficulty of data analysis (Jain et al., 2000). Thus, the efficiency of feature selection is crucial, as it focuses on choosing a small subset of informative features that contain the information of data and determine the source of the specific concerns derived from the study. In many data analyses, feature selection is a significant and frequently used dimensionality reduction technique and is often considered a key preprocessing step in data analysis for model benefits, such as interpretability, accuracy, lower computational costs, and less prone to overfitting (Sheikhpour et al., 2017; Khaire and Dhanalakshmi, 2022).


Isotopic envelope identification by analysis of the spatial distribution of components in MALDI-MSI data

arXiv.org Artificial Intelligence

One of the significant steps in the process leading to the identification of proteins is mass spectrometry, which allows for obtaining information about the structure of proteins. Removing isotope peaks from the mass spectrum is vital and it is done in a process called deisotoping. There are different algorithms for deisotoping, but they have their limitations, they are dedicated to different methods of mass spectrometry. Data from experiments performed with the MALDI-ToF technique are characterized by high dimensionality. This paper presents a method for identifying isotope envelopes in MALDI-ToF molecular imaging data based on the Mamdani-Assilan fuzzy system and spatial maps of the molecular distribution of peaks included in the isotopic envelope. Several image texture measures were used to evaluate spatial molecular distribution maps. The algorithm was tested on eight datasets obtained from the MALDI-ToF experiment on samples from the National Institute of Oncology in Gliwice from patients with cancer of the head and neck region. The data were subjected to pre-processing and feature extraction. The results were collected and compared with three existing deisotoping algorithms. The analysis of the obtained results showed that the method for identifying isotopic envelopes proposed in this paper enables the detection of overlapping envelopes by using the approach oriented to study peak pairs. Moreover, the proposed algorithm enables the analysis of large data sets.


Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models

arXiv.org Artificial Intelligence

Pretrained large language models have become indispensable for solving various natural language processing (NLP) tasks. However, safely deploying them in real world applications is challenging because they generate toxic content. To address this challenge, we propose two novel pretraining data augmentation strategies that significantly reduce model toxicity without compromising its utility. Our two strategies are: (1) MEDA: adds raw toxicity score as meta-data to the pretraining samples, and (2) INST: adds instructions to those samples indicating their toxicity. Our results indicate that our best performing strategy (INST) substantially reduces the toxicity probability up to 61% while preserving the accuracy Figure 1: Overview of the proposed approaches and the on five benchmark NLP tasks as well as baseline (BASE). We propose two new data augmentation improving AUC scores on four bias detection strategies, MEDA and INST. The text in purple are tasks by 1.3%. We also demonstrate the generalizability control variables indicating the desired toxicity level of of our techniques by scaling the the text. The text in black is the input to the model number of training samples and the number of and the text in green is the generated output using each model parameters.