Goto

Collaborating Authors

 Accuracy


An Examination of Fairness of AI Models for Deepfake Detection

arXiv.org Artificial Intelligence

Recent studies have demonstrated that deep learning models can discriminate based on protected classes like race and gender. In this work, we evaluate bias present in deepfake datasets and detection models across protected subgroups. Using facial datasets balanced by race and gender, we examine three popular deepfake detectors and find large disparities in predictive performances across races, with up to 10.7% difference in error rate between subgroups. A closer look reveals that the widely used FaceForensics++ dataset is overwhelmingly composed of Caucasian subjects, with the majority being female Caucasians. Our investigation of the racial distribution of deepfakes reveals that the methods used to create deepfakes as positive training signals tend to produce "irregular" faces - when a person's face is swapped onto another person of a different race or gender. This causes detectors to learn spurious correlations between the foreground faces and fakeness. Moreover, when detectors are trained with the Blended Image (BI) dataset from Face X-Rays, we find that those detectors develop systematic discrimination towards certain racial subgroups, primarily female Asians.


Anatomy-Guided Parallel Bottleneck Transformer Network for Automated Evaluation of Root Canal Therapy

arXiv.org Artificial Intelligence

Objective: Accurate evaluation of the root canal filling result in X-ray image is a significant step for the root canal therapy, which is based on the relative position between the apical area boundary of tooth root and the top of filled gutta-percha in root canal as well as the shape of the tooth root and so on to classify the result as correct-filling, under-filling or over-filling. Methods: We propose a novel anatomy-guided Transformer diagnosis network. For obtaining accurate anatomy-guided features, a polynomial curve fitting segmentation is proposed to segment the fuzzy boundary. And a Parallel Bottleneck Transformer network (PBT-Net) is introduced as the classification network for the final evaluation. Results, and conclusion: Our numerical experiments show that our anatomy-guided PBT-Net improves the accuracy from 40\% to 85\% relative to the baseline classification network. Comparing with the SOTA segmentation network indicates that the ASD is significantly reduced by 30.3\% through our fitting segmentation. Significance: Polynomial curve fitting segmentation has a great segmentation effect for extremely fuzzy boundaries. The prior knowledge guided classification network is suitable for the evaluation of root canal therapy greatly. And the new proposed Parallel Bottleneck Transformer for realizing self-attention is general in design, facilitating a broad use in most backbone networks.


Essential Machine Learning Interview Questions and Concepts

#artificialintelligence

Q2: What is the difference between supervised and unsupervised machine learning? Q3: How is KNN different from k-means clustering? Q5: Define precision and recall. This course will help you to answer some of the questions asked in Interviews related to Machine Learning. Machine learning is the science of getting computers to act without being explicitly programmed. Machine learning is a method of data analysis that automates analytical model building.


Ontology-based Feature Selection: A Survey

arXiv.org Artificial Intelligence

The Semantic Web emerged as an extension to the traditional Web, towards adding meaning to a distributed Web of structured and linked data. At its core, the concept of ontology provides the means to semantically describe and structure information and data and expose it to software and human agents in a machine and human-readable form. For software agents to be realized, it is crucial to develop powerful artificial intelligence and machine learning techniques, able to extract knowledge from information and data sources and represent it in the underlying ontology. This survey aims to provide insight into key aspects of ontology-based knowledge extraction, from various sources such as text, images, databases and human expertise, with emphasis on the task of feature selection. First, some of the most common classification and feature selection algorithms are briefly presented. Then, selected methodologies, which utilize ontologies to represent features and perform feature selection and classification, are described. The presented examples span diverse application domains, e.g., medicine, tourism, mechanical and civil engineering, and demonstrate the feasibility and applicability of such methods.


Stealthy Backdoors as Compression Artifacts

arXiv.org Artificial Intelligence

In a backdoor attack on a machine learning model, an adversary produces a model that performs well on normal inputs but outputs targeted misclassifications on inputs containing a small trigger pattern. Model compression is a widely-used approach for reducing the size of deep learning models without much accuracy loss, enabling resource-hungry models to be compressed for use on resource-constrained devices. In this paper, we study the risk that model compression could provide an opportunity for adversaries to inject stealthy backdoors. We design stealthy backdoor attacks such that the full-sized model released by adversaries appears to be free from backdoors (even when tested using state-of-the-art techniques), but when the model is compressed it exhibits highly effective backdoors. We show this can be done for two common model compression techniques -- model pruning and model quantization. Our findings demonstrate how an adversary may be able to hide a backdoor as a compression artifact, and show the importance of performing security tests on the models that will actually be deployed not their precompressed version.


Biased Edge Dropout for Enhancing Fairness in Graph Representation Learning

arXiv.org Machine Learning

Graph representation learning has become a ubiquitous component in many scenarios, ranging from social network analysis to energy forecasting in smart grids. In several applications, ensuring the fairness of the node (or graph) representations with respect to some protected attributes is crucial for their correct deployment. Yet, fairness in graph deep learning remains under-explored, with few solutions available. In particular, the tendency of similar nodes to cluster on several real-world graphs (i.e., homophily) can dramatically worsen the fairness of these procedures. In this paper, we propose a biased edge dropout algorithm (FairDrop) to counter-act homophily and improve fairness in graph representation learning. FairDrop can be plugged in easily on many existing algorithms, is efficient, adaptable, and can be combined with other fairness-inducing solutions. After describing the general algorithm, we demonstrate its application on two benchmark tasks, specifically, as a random walk model for producing node embeddings, and to a graph convolutional network for link prediction. We prove that the proposed algorithm can successfully improve the fairness of all models up to a small or negligible drop in accuracy, and compares favourably with existing state-of-the-art solutions. In an ablation study, we demonstrate that our algorithm can flexibly interpolate between biasing towards fairness and an unbiased edge dropout. Furthermore, to better evaluate the gains, we propose a new dyadic group definition to measure the bias of a link prediction task when paired with group-based fairness metrics. In particular, we extend the metric used to measure the bias in the node embeddings to take into account the graph structure.


Algorithmic Factors Influencing Bias in Machine Learning

arXiv.org Machine Learning

It is fair to say that many of the prominent examples of bias in Machine Learning (ML) arise from bias that is there in the training data. In fact, some would argue that supervised ML algorithms cannot be biased, they reflect the data on which they are trained. In this paper we demonstrate how ML algorithms can misrepresent the training data through underestimation. We show how irreducible error, regularization and feature and class imbalance can contribute to this underestimation. The paper concludes with a demonstration of how the careful management of synthetic counterfactuals can ameliorate the impact of this underestimation bias.


Emergent Unfairness in Algorithmic Fairness-Accuracy Trade-Off Research

arXiv.org Artificial Intelligence

Across machine learning (ML) sub-disciplines, researchers make explicit mathematical assumptions in order to facilitate proof-writing. We note that, specifically in the area of fairness-accuracy trade-off optimization scholarship, similar attention is not paid to the normative assumptions that ground this approach. Such assumptions presume that 1) accuracy and fairness are in inherent opposition to one another, 2) strict notions of mathematical equality can adequately model fairness, 3) it is possible to measure the accuracy and fairness of decisions independent from historical context, and 4) collecting more data on marginalized individuals is a reasonable solution to mitigate the effects of the trade-off. We argue that such assumptions, which are often left implicit and unexamined, lead to inconsistent conclusions: While the intended goal of this work may be to improve the fairness of machine learning models, these unexamined, implicit assumptions can in fact result in emergent unfairness. We conclude by suggesting a concrete path forward toward a potential resolution.


Two-Stage Penalized Regression Screening to Detect Biomarker-Treatment Interactions in Randomized Clinical Trials

arXiv.org Machine Learning

High-dimensional biomarkers such as genomics are increasingly being measured in randomized clinical trials. Consequently, there is a growing interest in developing methods that improve the power to detect biomarker-treatment interactions. We adapt recently proposed two-stage interaction detecting procedures in the setting of randomized clinical trials. We also propose a new stage 1 multivariate screening strategy using ridge regression to account for correlations among biomarkers. For this multivariate screening, we prove the asymptotic between-stage independence, required for family-wise error rate control, under biomarker-treatment independence. Simulation results show that in various scenarios, the ridge regression screening procedure can provide substantially greater power than the traditional one-biomarker-at-a-time screening procedure in highly correlated data. We also exemplify our approach in two real clinical trial data applications.


Applications of Artificial Intelligence to aid detection of dementia: a narrative review on current capabilities and future directions

arXiv.org Artificial Intelligence

With populations ageing, the number of people with dementia worldwide is expected to triple to 152 million by 2050. Seventy percent of cases are due to Alzheimer's disease (AD) pathology and there is a 10-20 year 'pre-clinical' period before significant cognitive decline occurs. We urgently need, cost effective, objective methods to detect AD, and other dementias, at an early stage. Risk factor modification could prevent 40% of cases and drug trials would have greater chances of success if participants are recruited at an earlier stage. Currently, detection of dementia is largely by pen and paper cognitive tests but these are time consuming and insensitive to pre-clinical phases. Specialist brain scans and body fluid biomarkers can detect the earliest stages of dementia but are too invasive or expensive for widespread use. With the advancement of technology, Artificial Intelligence (AI) shows promising results in assisting with detection of early-stage dementia. Existing AI-aided methods and potential future research directions are reviewed and discussed.