Collaborating Authors

 Toussaint, Wiebke


Tiny, always-on and fragile: Bias propagation through design choices in on-device machine learning workflows

arXiv.org Artificial Intelligence

Billions of distributed, heterogeneous and resource-constrained IoT devices deploy on-device machine learning (ML) for private, fast and offline inference on personal data. On-device ML is highly context-dependent, and sensitive to user, usage, hardware and environment attributes. This sensitivity, and the propensity towards bias in ML, makes it important to study bias in on-device settings. Our study is one of the first investigations of bias in this emerging domain, and lays important foundations for building fairer on-device ML. We apply a software engineering lens, investigating the propagation of bias through design choices in on-device ML workflows. We first identify reliability bias as a source of unfairness and propose a measure to quantify it. We then conduct empirical experiments for a keyword spotting task to show how complex and interacting technical design choices amplify and propagate reliability bias. Our results validate that design choices made during model training, like the sample rate and input feature type, and choices made to optimize models, like lightweight architectures, the pruning learning rate and pruning sparsity, can result in disparate predictive performance across male and female groups. Based on our findings, we suggest low-effort strategies for engineers to mitigate bias in on-device ML.
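The abstract does not give the paper's exact definition of reliability bias. As a hedged sketch, one plausible way to surface disparate predictive performance across subgroups is to compare each group's accuracy against the overall accuracy; the metric, group names and toy data below are illustrative assumptions, not the paper's formulation:

```python
# Hypothetical sketch: quantify performance disparity across subgroups.
# The measure (relative deviation of subgroup accuracy from overall
# accuracy) is an illustrative assumption, not the paper's definition.

def accuracy(preds, labels):
    """Fraction of correct predictions."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def reliability_bias(preds, labels, groups):
    """Per-group relative deviation from overall accuracy.

    Positive values mean the group fares better than average,
    negative values mean it fares worse.
    """
    overall = accuracy(preds, labels)
    bias = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        g_acc = accuracy([preds[i] for i in idx], [labels[i] for i in idx])
        bias[g] = (g_acc - overall) / overall
    return bias

# Toy keyword-spotting-style example with two subgroups
preds  = [1, 0, 1, 1, 0, 1, 0, 0]
labels = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["male"] * 4 + ["female"] * 4
print(reliability_bias(preds, labels, groups))
```

A model-level comparison could then rank design choices (sample rate, feature type, pruning settings) by the spread of these per-group deviations.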


SVEva Fair: A Framework for Evaluating Fairness in Speaker Verification

arXiv.org Artificial Intelligence

Despite the success of deep neural networks (DNNs) in enabling on-device voice assistants, increasing evidence of bias and discrimination in machine learning is raising the urgency of investigating the fairness of these systems. Speaker verification is a form of biometric identification that gives access to voice assistants. Due to a lack of fairness metrics and evaluation frameworks appropriate for testing the fairness of speaker verification components, little is known about how model performance varies across subgroups, and what factors influence performance variation. To tackle this emerging challenge, we design and develop SVEva Fair, an accessible, actionable and model-agnostic framework for evaluating the fairness of speaker verification components. The framework provides evaluation measures and visualisations to interrogate model performance across speaker subgroups and to compare fairness between models. We demonstrate SVEva Fair in a case study with end-to-end DNNs trained on the VoxCeleb datasets to reveal potential bias in existing embedded speech recognition systems based on the demographic attributes of speakers. Our evaluation shows that publicly accessible benchmark models are not fair, and consistently produce worse predictions for some nationalities and for female speakers of most nationalities. To pave the way for fair and reliable embedded speaker verification, SVEva Fair has been implemented as an open-source Python library that can be integrated into the embedded ML development pipeline, helping developers and researchers troubleshoot unreliable speaker verification performance and select high-impact approaches for mitigating fairness challenges.
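The SVEva Fair API itself is not described here, so as a minimal sketch of the kind of model-agnostic subgroup comparison such a framework enables, the code below computes false accept and false reject rates per subgroup from raw verification trial scores (the threshold, trial format and subgroup labels are assumptions for illustration):

```python
# Illustrative sketch (not the SVEva Fair API): compare speaker verification
# error rates across subgroups at a fixed decision threshold.

def error_rates(scores, targets, threshold):
    """False accept rate (FAR) and false reject rate (FRR) at a threshold.

    scores: similarity scores for trials; targets: 1 = same-speaker trial.
    """
    accepts = [s >= threshold for s in scores]
    impostor = [a for a, t in zip(accepts, targets) if t == 0]
    genuine = [a for a, t in zip(accepts, targets) if t == 1]
    far = sum(impostor) / len(impostor) if impostor else 0.0
    frr = sum(not a for a in genuine) / len(genuine) if genuine else 0.0
    return far, frr

def subgroup_error_rates(trials, threshold):
    """trials: list of (score, target, subgroup). Returns {subgroup: (FAR, FRR)}."""
    out = {}
    for g in {t[2] for t in trials}:
        sub = [t for t in trials if t[2] == g]
        out[g] = error_rates([s for s, _, _ in sub],
                             [y for _, y, _ in sub], threshold)
    return out
```

Large gaps between subgroups' FAR/FRR pairs at the same operating point are exactly the kind of disparity the framework is designed to surface.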


Clustering Residential Electricity Consumption Data to Create Archetypes that Capture Variability in Customer Behaviour

arXiv.org Machine Learning

Clustering is frequently used in the energy domain to identify dominant electricity consumption patterns of households, which can be used to construct customer archetypes for long-term energy planning. Selecting a useful set of clusters, however, requires extensive experimentation and domain knowledge. While internal clustering validation measures are well established in the electricity domain, limited research is available for external measures. We present a method that distills expert knowledge into competency questions, which we operationalised as external evaluation measures to specify the clustering objective for our application. This approach supported a structured and formal cluster validation process that combined internal and external measures to select a cluster set that is useful for creating residential electricity customer archetypes from electricity meter data in South Africa. We validated the approach in a case study application where we successfully reconstructed customer archetypes previously developed by experts. Our approach enables transparent and repeatable cluster ranking and selection by data scientists, even if they have limited domain knowledge.
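To make the idea of operationalising a competency question concrete, here is a hedged sketch: the question, the 80% agreement threshold, and the load-profile format below are invented for illustration and are not taken from the paper. A question such as "does the cluster set separate households by their dominant daily peak hour?" can be turned into a scoring function over cluster assignments:

```python
# Hypothetical sketch: operationalise a competency question as an external
# cluster evaluation measure. The question, threshold and data format are
# invented examples, not the paper's actual measures.
from collections import Counter

def peak_hour(profile):
    """Hour of maximum consumption in a 24-value daily load profile."""
    return max(range(len(profile)), key=lambda h: profile[h])

def peak_separation_score(profiles, cluster_labels):
    """Fraction of clusters whose members share a dominant peak hour.

    Higher is better: the cluster set answers the competency question well.
    """
    clusters = {}
    for profile, label in zip(profiles, cluster_labels):
        clusters.setdefault(label, []).append(peak_hour(profile))
    agreeing = 0
    for peaks in clusters.values():
        top_count = Counter(peaks).most_common(1)[0][1]
        if top_count / len(peaks) >= 0.8:  # 80% agreement (an assumption)
            agreeing += 1
    return agreeing / len(clusters)
```

Scores like this can then be combined with internal measures to rank candidate cluster sets.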


Using competency questions to select optimal clustering structures for residential energy consumption patterns

arXiv.org Machine Learning

During cluster analysis, domain experts and visual inspection are frequently relied on to identify the optimal clustering structure. This process tends to be ad hoc, subjective and difficult to reproduce. This work shows how competency questions can be used to formalise expert knowledge and application requirements for context-specific evaluation of a clustering application in the residential energy consumption sector. While cluster analysis is an established unsupervised machine learning technique, identifying the optimal set of clusters for a specific application requires extensive experimentation and domain knowledge. Cluster compactness and distinctness are two important attributes that characterise a good cluster set (Sarle et al., 1990), and different metrics, such as the Mean Index Adequacy (MIA), the Davies-Bouldin Index (DBI) and the Silhouette Index, have been proposed to measure cluster compactness and distinctness.
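Of the internal measures named above, the Davies-Bouldin Index is simple enough to sketch in full: for each cluster it takes the worst-case ratio of combined within-cluster scatter to between-centroid distance, and averages these over all clusters, so lower values indicate more compact, better-separated clusters. The minimal implementation below is for illustration only; in practice one would use a library routine such as scikit-learn's `davies_bouldin_score`:

```python
# Minimal Davies-Bouldin Index for illustration. Lower DBI indicates more
# compact, better-separated clusters. In practice, prefer
# sklearn.metrics.davies_bouldin_score.
import math

def _dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def davies_bouldin(points, labels):
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    centroids = {label: tuple(sum(c) / len(ps) for c in zip(*ps))
                 for label, ps in clusters.items()}
    # S_i: mean distance of cluster members to their centroid (compactness)
    scatter = {label: sum(_dist(p, centroids[label]) for p in ps) / len(ps)
               for label, ps in clusters.items()}
    ids = list(clusters)
    total = 0.0
    for i in ids:
        # worst-case similarity to any other cluster (distinctness penalty)
        total += max((scatter[i] + scatter[j]) / _dist(centroids[i], centroids[j])
                     for j in ids if j != i)
    return total / len(ids)
```

For two tight clusters far apart, the index is small; as clusters spread out or their centroids move closer, it grows.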