 Accuracy


Designing Equitable Algorithms

arXiv.org Artificial Intelligence

Predictive algorithms are now used to help distribute a large share of our society's resources and sanctions, such as healthcare, loans, criminal detentions, and tax audits. Under the right circumstances, these algorithms can improve the efficiency and equity of decision-making. At the same time, there is a danger that the algorithms themselves could entrench and exacerbate disparities, particularly along racial, ethnic, and gender lines. To help ensure their fairness, many researchers suggest that algorithms be subject to at least one of three constraints: (1) no use of legally protected features, such as race, ethnicity, and gender; (2) equal rates of "positive" decisions across groups; and (3) equal error rates across groups. Here we show that these constraints, while intuitively appealing, often worsen outcomes for individuals in marginalized groups, and can even leave all groups worse off. The inherent trade-off we identify between formal fairness constraints and welfare improvements -- particularly for the marginalized -- highlights the need for a more robust discussion on what it means for an algorithm to be "fair". We illustrate these ideas with examples from healthcare and the criminal-legal system, and make several proposals to help practitioners design more equitable algorithms.
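Constraints (2) and (3) above can be made concrete by measuring, per group, the rate of "positive" decisions and the two error rates. The following is a minimal sketch under stated assumptions, not the authors' method: the function name `group_rates`, the binary 0/1 encoding, and the toy data in the usage note are all hypothetical and chosen only for illustration. Constraint (1) concerns which inputs the algorithm may use, so it is not something this kind of audit measures.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return per-group positive-decision rate, false positive rate (FPR),
    and false negative rate (FNR).

    y_true and y_pred are sequences of 0/1 labels and decisions;
    groups holds one group identifier per individual (hypothetical encoding).
    """
    counts = defaultdict(lambda: {"n": 0, "decided_pos": 0,
                                  "actual_pos": 0, "actual_neg": 0,
                                  "fp": 0, "fn": 0})
    for y, d, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["decided_pos"] += d
        if y == 1:
            c["actual_pos"] += 1
            c["fn"] += 1 - d      # positive case that received a negative decision
        else:
            c["actual_neg"] += 1
            c["fp"] += d          # negative case that received a positive decision

    rates = {}
    for g, c in counts.items():
        rates[g] = {
            "positive_rate": c["decided_pos"] / c["n"],
            "fpr": c["fp"] / c["actual_neg"] if c["actual_neg"] else float("nan"),
            "fnr": c["fn"] / c["actual_pos"] if c["actual_pos"] else float("nan"),
        }
    return rates
```

For example, `group_rates([1, 0, 1, 0, 1, 0, 0, 1], [1, 0, 1, 1, 0, 0, 0, 1], ["a"] * 4 + ["b"] * 4)` reports a different positive-decision rate and different error rates for the two toy groups, so this made-up decision rule would satisfy neither constraint (2) nor constraint (3).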


MEDFAIR: Benchmarking Fairness for Medical Imaging

arXiv.org Artificial Intelligence

A multitude of work has shown that machine learning-based medical diagnosis systems can be biased against certain subgroups of people. This has motivated a growing number of bias mitigation algorithms that aim to address fairness issues in machine learning. However, it is difficult to compare their effectiveness in medical imaging for two reasons. First, there is little consensus on the criteria to assess fairness. Second, existing bias mitigation algorithms are developed under different settings, e.g., datasets, model selection strategies, backbones, and fairness metrics, making a direct comparison and evaluation based on existing results impossible. In this work, we introduce MEDFAIR, a framework to benchmark the fairness of machine learning models for medical imaging. MEDFAIR covers eleven algorithms from various categories, nine datasets from different imaging modalities, and three model selection criteria. Through extensive experiments, we find that the under-studied issue of model selection criterion can have a significant impact on fairness outcomes; while in contrast, state-of-the-art bias mitigation algorithms do not significantly improve fairness outcomes over empirical risk minimization (ERM) in both in-distribution and out-of-distribution settings. We evaluate fairness from various perspectives and make recommendations for different medical application scenarios that require different ethical principles. Our framework provides a reproducible and easy-to-use entry point for the development and evaluation of future bias mitigation algorithms in deep learning. Code is available at https://github.com/ys-zong/MEDFAIR.


Partial AUC Scores: A Better Metric for Binary Classification

#artificialintelligence

Partial AUC (Area Under the Curve) scores are a valuable tool for evaluating the performance of binary classification models, particularly when the class distribution is highly imbalanced. Unlike traditional AUC scores, partial AUC scores concentrate on a specific region of the ROC (Receiver Operating Characteristic) curve, offering a more detailed evaluation of the model's performance. This blog post will dive into what partial AUC scores are, how they are calculated, and why they are essential for evaluating imbalanced datasets. We will also include relevant examples and a code example using Python to help make these concepts clearer. This article was published as a part of the Data Science Blogathon.
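As a rough illustration of the idea, the sketch below computes the raw area under the ROC curve restricted to false positive rates in [0, max_fpr], using the trapezoidal rule. It is not the code from the blog post: the helper names `roc_points` and `partial_auc` are hypothetical, the implementation assumes binary 0/1 labels with both classes present, and it returns the unnormalized area. Libraries may rescale this quantity (for instance, scikit-learn's `roc_auc_score` with `max_fpr` returns a standardized partial AUC), so values are not directly comparable.

```python
def roc_points(y_true, scores):
    """Return the ROC curve as (fpr, tpr) points, sweeping the decision
    threshold from the highest score to the lowest."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores cross the threshold together, producing one ROC point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def partial_auc(y_true, scores, max_fpr=1.0):
    """Unnormalized area under the ROC curve for FPR in [0, max_fpr]."""
    pts = roc_points(y_true, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

With `max_fpr=1.0` the function reduces to the ordinary AUC; smaller values keep only the low-false-positive region, which is typically the region of interest when negatives vastly outnumber positives.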


Foundation Models for Natural Language Processing -- Pre-trained Language Models Integrating Media

arXiv.org Artificial Intelligence

This open access book provides a comprehensive overview of the state of the art in research and applications of Foundation Models and is intended for readers familiar with basic Natural Language Processing (NLP) concepts. In recent years, a revolutionary new paradigm has been developed for training models for NLP. These models are first pre-trained on large collections of text documents to acquire general syntactic knowledge and semantic information. Then, they are fine-tuned for specific tasks, which they can often solve with superhuman accuracy. When the models are large enough, they can be instructed by prompts to solve new tasks without any fine-tuning. Moreover, they can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning. Because they provide a blueprint for solving many tasks in artificial intelligence, they have been called Foundation Models. After a brief introduction to basic NLP models, the main pre-trained language models (BERT, GPT and the sequence-to-sequence transformer) are described, as well as the concepts of self-attention and context-sensitive embedding. Then, different approaches to improving these models are discussed, such as expanding the pre-training criteria, increasing the length of input texts, or including extra knowledge. An overview of the best-performing models for about twenty application areas is then presented, e.g., question answering, translation, story generation, dialog systems, and generating images from text. For each application area, the strengths and weaknesses of current models are discussed, and an outlook on further developments is given. In addition, links are provided to freely available program code. A concluding chapter summarizes the economic opportunities, mitigation of risks, and potential developments of AI.


A method for incremental discovery of financial event types based on anomaly detection

arXiv.org Artificial Intelligence

Event datasets in the financial domain are often constructed for specific application scenarios, so their event types are only weakly reusable; at the same time, the massive and diverse body of new financial data cannot be confined to the event types defined for those scenarios. Such a small set of event types does not meet our research needs for more complex tasks, such as predicting major financial events and analyzing their ripple effects. In this paper, a three-stage approach is proposed to accomplish incremental discovery of event types. Given an existing annotated financial event dataset and a set of financial event data containing a mixture of original and unknown event types, a semi-supervised deep clustering model with anomaly detection is first applied to classify the data into normal and abnormal events, where abnormal events are events that do not belong to known types. Next, normal events are tagged with appropriate event types and abnormal events are clustered. Finally, a cluster keyword extraction method is used to recommend type names for the new event clusters, thus incrementally discovering new event types. The proposed method is effective in the incremental discovery of new event types on real data sets.


Ultra-marginal Feature Importance: Learning from Data with Causal Guarantees

arXiv.org Artificial Intelligence

Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. Marginal contribution feature importance (MCI) was developed to break this trend by providing a useful framework for quantifying the relationships in data. In this work, we aim to improve upon the theoretical properties, performance, and runtime of MCI by introducing ultra-marginal feature importance (UMFI), which uses dependence removal techniques from the AI fairness literature as its foundation. We first propose axioms for feature importance methods that seek to explain the causal and associative relationships in data, and we prove that UMFI satisfies these axioms under basic assumptions. We then show on real and simulated data that UMFI performs better than MCI, especially in the presence of correlated interactions and unrelated features, while partially learning the structure of the causal graph and reducing the exponential runtime

Recently, feature importance methods such as Shapley values (Shapley, 1953; Cohen et al., 2007; Lundberg and Lee, 2017), Shapley additive global importance (SAGE) (Covert et al., 2020), accumulated local effects (ALE) (Apley and Zhu, 2020), permutation importance (PI) (Breiman, 2001), and conditional permutation importance (CPI) (Debeer and Strobl, 2020), have been used in high-impact journal papers by scientists who want to explain the mechanisms behind observational data (Addor et al., 2018; Bazaga et al., 2020; Stein et al., 2021; Johnsen et al., 2021; Schmidt et al., 2020; Gill et al., 2017; Janssen et al., 2022). However, these methods are predominantly for model explanation or feature selection, so they have many shortcomings when used for other purposes such as scientific inference (Freiesleben et al., 2022; Catav et al., 2021). ALE can nicely display how changes in inputs lead to altered model predictions but important higher order effects are omitted (Molnar, 2020), and although CPI improves upon some limitations of PI, CPI gives zero importance to perfectly correlated features even if they offer significant explanatory power towards the response (Covert et al., 2020). Similarly, Shapley values diminish the importance of duplicated or highly correlated features (Catav et al., 2021). Further, only one model is trained in ALE, CPI, and PI.


Omnipredictors for Constrained Optimization

arXiv.org Artificial Intelligence

The notion of omnipredictors (Gopalan, Kalai, Reingold, Sharan and Wieder, ITCS 2021) suggested a new paradigm for loss minimization. Rather than learning a predictor based on a known loss function, omnipredictors can easily be post-processed to minimize any one of a rich family of loss functions compared with the loss of hypotheses in a class $\mathcal C$. It has been shown that such omnipredictors exist and are implied (for all convex and Lipschitz loss functions) by the notion of multicalibration from the algorithmic fairness literature. In this paper, we introduce omnipredictors for constrained optimization and study their complexity and implications. The notion that we introduce allows the learner to be unaware of the loss function that will be later assigned as well as the constraints that will be later imposed, as long as the subpopulations that are used to define these constraints are known. We show how to obtain omnipredictors for constrained optimization problems, relying on appropriate variants of multicalibration. We also investigate the implications of this notion when the constraints used are so-called group fairness notions.
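The post-processing step behind this paradigm can be illustrated with a small sketch. It covers only the unconstrained case and is not the paper's construction: it assumes a prediction p that approximates the probability of a positive label, and it picks the action that minimizes the expected loss under p once the loss function is revealed. The function name `best_action`, the action grid, and the two example losses are hypothetical choices made for illustration.

```python
def best_action(p, loss, actions):
    """Pick the action t that minimizes expected loss when the label is 1
    with probability p and 0 otherwise.

    loss(y, t) is any loss function supplied after p was learned.
    """
    return min(actions, key=lambda t: p * loss(1, t) + (1 - p) * loss(0, t))


# A hypothetical grid of candidate actions in [0, 1].
ACTIONS = [i / 100 for i in range(101)]


def squared_loss(y, t):
    return (y - t) ** 2


def absolute_loss(y, t):
    return abs(y - t)
```

The same prediction serves both losses: with p = 0.3, `best_action(0.3, squared_loss, ACTIONS)` selects an action equal to p itself, while `best_action(0.3, absolute_loss, ACTIONS)` selects 0, the more likely label. Learning a predictor for which this post-processing is competitive with a hypothesis class, and handling constraints, is the subject of the paper and is not shown here.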


ARGUS: Context-Based Detection of Stealthy IoT Infiltration Attacks

arXiv.org Artificial Intelligence

IoT application domains, device diversity and connectivity are rapidly growing. IoT devices control various functions in smart homes and buildings, smart cities, and smart factories, making these devices an attractive target for attackers. On the other hand, the large variability of different application scenarios and inherent heterogeneity of devices make it very challenging to reliably detect abnormal IoT device behaviors and distinguish these from benign behaviors. Existing approaches for detecting attacks are mostly limited to attacks directly compromising individual IoT devices, or require predefined detection policies. They cannot detect attacks that utilize the control plane of the IoT system to trigger actions in an unintended/malicious context, e.g., opening a smart lock while the smart home residents are absent. In this paper, we tackle this problem and propose ARGUS, the first self-learning intrusion detection system for detecting contextual attacks on IoT environments, in which the attacker maliciously invokes IoT device actions to reach its goals. ARGUS monitors the contextual setting based on the state and actions of IoT devices in the environment. An unsupervised Deep Neural Network (DNN) is used for modeling the typical contextual device behavior and detecting actions taking place in abnormal contextual settings. This unsupervised approach ensures that ARGUS is not restricted to detecting previously known attacks but is also able to detect new attacks. We evaluated ARGUS on heterogeneous real-world smart-home settings and achieved an F1-score of at least 99.64% for each setup, with a false positive rate (FPR) of at most 0.03%.
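For readers unfamiliar with the two reported metrics, the sketch below shows how an F1-score and a false positive rate are derived from confusion-matrix counts. The function name `f1_and_fpr` is hypothetical, and the counts in the usage note are made up for illustration; they are not ARGUS results.

```python
def f1_and_fpr(tp, fp, tn, fn):
    """Compute (F1-score, false positive rate) from confusion-matrix counts.

    tp/fn count attacks that were detected/missed;
    fp/tn count benign actions that were flagged/passed.
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr
```

With the made-up counts `f1_and_fpr(tp=90, fp=10, tn=890, fn=10)`, precision and recall are both 0.9, so the F1-score is 0.9, and the FPR is 10 / 900, roughly 1.1%.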


Towards Fair Machine Learning Software: Understanding and Addressing Model Bias Through Counterfactual Thinking

arXiv.org Artificial Intelligence

The increasing use of Machine Learning (ML) software can lead to unfair and unethical decisions, thus fairness bugs in software are becoming a growing concern. Addressing these fairness bugs often involves sacrificing ML performance, such as accuracy. To address this issue, we present a novel counterfactual approach that tackles the root causes of bias in ML software. In addition, our approach combines models optimized for both performance and fairness, resulting in an optimal solution in both aspects. We conducted a thorough evaluation of our approach on 10 benchmark tasks using a combination of 5 performance metrics, 3 fairness metrics, and 15 measurement scenarios, all applied to 8 real-world datasets. These evaluations show that the proposed method significantly improves the fairness of ML software while maintaining competitive performance, outperforming state-of-the-art solutions in 84.6% of overall cases based on a recent benchmarking tool.


Choosing the Number of Topics in LDA Models -- A Monte Carlo Comparison of Selection Criteria

arXiv.org Artificial Intelligence

Selecting the number of topics in LDA models is considered to be a difficult task, for which alternative approaches have been proposed. The performance of the recently developed singular Bayesian information criterion (sBIC) is evaluated and compared to the performance of alternative model selection criteria. The sBIC is a generalization of the standard BIC that can be applied to singular statistical models. The comparison is based on Monte Carlo simulations and carried out for several alternative settings, varying with respect to the number of topics, the number of documents and the size of documents in the corpora. Performance is measured using different criteria that take into account not only whether the correct number of topics is selected, but also whether the relevant topics from the data-generating processes (DGPs) are identified. Practical recommendations for LDA model selection in applications are derived.