Karlstad
Orthogonal Procrustes problem preserves correlations in synthetic data
Ounissi, Oussama, Jävergård, Nicklas, Muntean, Adrian
This work introduces the application of the Orthogonal Procrustes problem to the generation of synthetic data. The proposed methodology ensures that the resulting synthetic data preserves important statistical relationships among features, specifically the Pearson correlation. An empirical illustration using a large, real-world, tabular dataset of energy consumption demonstrates the effectiveness of the approach and highlights its potential for application in practical synthetic data generation. Our approach is not meant to replace existing generative models, but rather as a lightweight post-processing step that enforces exact Pearson correlation to an already generated synthetic dataset.
'We didn't vote for ChatGPT': Swedish PM under fire for using AI in role
The Swedish prime minister, Ulf Kristersson, has come under fire after admitting that he regularly consults AI tools for a second opinion in his role running the country. Kristersson, whose Moderate party leads Sweden's centre-right coalition government, said he used tools including ChatGPT and the French service LeChat. His colleagues also used AI in their daily work, he said. Kristersson told the Swedish business newspaper Dagens industri: "I use it myself quite often. And should we think the complete opposite? Tech experts, however, have raised concerns about politicians using AI tools in such a way, and the Aftonbladet newspaper accused Kristersson in a editorial of having "fallen for the oligarchs' AI psychosis". "You have to be very careful," Simone Fischer-Hübner, a computer science researcher at Karlstad University, told Aftonbladet, warning against using ChatGPT to work with sensitive information. Kristersson's spokesperson, Tom Samuelsson, later said the prime minister did not take risks in his use of AI. "Naturally it is not security sensitive information that ends up there.
Underrepresentation, Label Bias, and Proxies: Towards Data Bias Profiles for the EU AI Act and Beyond
Ceccon, Marina, Cornacchia, Giandomenico, Pezze, Davide Dalle, Fabris, Alessandro, Susto, Gian Antonio
Undesirable biases encoded in the data are key drivers of algorithmic discrimination. Their importance is widely recognized in the algorithmic fairness literature, as well as legislation and standards on anti-discrimination in AI. Despite this recognition, data biases remain understudied, hindering the development of computational best practices for their detection and mitigation. In this work, we present three common data biases and study their individual and joint effect on algorithmic discrimination across a variety of datasets, models, and fairness measures. We find that underrepresentation of vulnerable populations in training sets is less conducive to discrimination than conventionally affirmed, while combinations of proxies and label bias can be far more critical. Consequently, we develop dedicated mechanisms to detect specific types of bias, and combine them into a preliminary construct we refer to as the Data Bias Profile (DBP). This initial formulation serves as a proof of concept for how different bias signals can be systematically documented. Through a case study with popular fairness datasets, we demonstrate the effectiveness of the DBP in predicting the risk of discriminatory outcomes and the utility of fairness-enhancing interventions. Overall, this article bridges algorithmic fairness research and anti-discrimination policy through a data-centric lens.
Enhancing Machine Learning Performance through Intelligent Data Quality Assessment: An Unsupervised Data-centric Framework
Rahal, Manal, Ahmed, Bestoun S., Szabados, Gergely, Fornstedt, Torgny, Samuelsson, Jorgen
Poor data quality limits the advantageous power of Machine Learning (ML) and weakens high-performing ML software systems. Nowadays, data are more prone to the risk of poor quality due to their increasing volume and complexity. Therefore, tedious and time-consuming work goes into data preparation and improvement before moving further in the ML pipeline. To address this challenge, we propose an intelligent data-centric evaluation framework that can identify high-quality data and improve the performance of an ML system. The proposed framework combines the curation of quality measurements and unsupervised learning to distinguish high- and low-quality data. The framework is designed to integrate flexible and general-purpose methods so that it is deployed in various domains and applications. To validate the outcomes of the designed framework, we implemented it in a real-world use case from the field of analytical chemistry, where it is tested on three datasets of anti-sense oligonucleotides. A domain expert is consulted to identify the relevant quality measurements and evaluate the outcomes of the framework. The results show that the quality-centric data evaluation framework identifies the characteristics of high-quality data that guide the conduct of efficient laboratory experiments and consequently improve the performance of the ML system.
SoK: A Classification for AI-driven Personalized Privacy Assistants
Morel, Victor, Iwaya, Leonardo, Fischer-Hübner, Simone
To help users make privacy-related decisions, personalized privacy assistants based on AI technology have been developed in recent years. These AI-driven Personalized Privacy Assistants (AI-driven PPAs) can reap significant benefits for users, who may otherwise struggle to make decisions regarding their personal data in environments saturated with privacy-related decision requests. However, no study systematically inquired about the features of these AI-driven PPAs, their underlying technologies, or the accuracy of their decisions. To fill this gap, we present a Systematization of Knowledge (SoK) to map the existing solutions found in the scientific literature. We screened 1697 unique research papers over the last decade (2013-2023), constructing a classification from 39 included papers. As a result, this SoK reviews several aspects of existing research on AI-driven PPAs in terms of types of publications, contributions, methodological quality, and other quantitative insights. Furthermore, we provide a comprehensive classification for AI-driven PPAs, delving into their architectural choices, system contexts, types of AI used, data sources, types of decisions, and control over decisions, among other facets. Based on our SoK, we further underline the research gaps and challenges and formulate recommendations for the design and development of AI-driven PPAs as well as avenues for future research.
Approximate Probabilistic Inference for Time-Series Data A Robust Latent Gaussian Model With Temporal Awareness
Johansson, Anton, Ramaswamy, Arunselvan
The development of robust generative models for highly varied non-stationary time series data is a complex yet important problem. Traditional models for time series data prediction, such as Long Short-Term Memory (LSTM), are inefficient and generalize poorly as they cannot capture complex temporal relationships. In this paper, we present a probabilistic generative model that can be trained to capture temporal information, and that is robust to data errors. We call it Time Deep Latent Gaussian Model (tDLGM). Its novel architecture is inspired by Deep Latent Gaussian Model (DLGM). Our model is trained to minimize a loss function based on the negative log loss. One contributing factor to Time Deep Latent Gaussian Model (tDLGM) robustness is our regularizer, which accounts for data trends. Experiments conducted show that tDLGM is able to reconstruct and generate complex time series data, and that it is robust against to noise and faulty data.
Harnessing AI for a climate-resilient Africa: An interview with Amal Nammouchi, co-founder of AfriClimate AI
AfriClimate AI is a grassroots community focused on leveraging artificial intelligence to tackle climate challenges in Africa. We spoke to Amal Nammouchi, one of the co-founders of AfriClimate AI, about the inspiration behind the initiative, some of their activities and projects, and plans for the future. Everything started last year at the Deep Learning Indaba in Ghana. The Deep Learning Indaba is the largest African AI community gathering and it happens once a year. The spark for AfriClimate AI came from a workshop with the work of one of our co-founders Rendani Mbuvha.
Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach
Bayram, Firas, Ahmed, Bestoun S.
Artificial intelligence (AI), and especially its sub-field of Machine Learning (ML), are impacting the daily lives of everyone with their ubiquitous applications. In recent years, AI researchers and practitioners have introduced principles and guidelines to build systems that make reliable and trustworthy decisions. From a practical perspective, conventional ML systems process historical data to extract the features that are consequently used to train ML models that perform the desired task. However, in practice, a fundamental challenge arises when the system needs to be operationalized and deployed to evolve and operate in real-life environments continuously. To address this challenge, Machine Learning Operations (MLOps) have emerged as a potential recipe for standardizing ML solutions in deployment. Although MLOps demonstrated great success in streamlining ML processes, thoroughly defining the specifications of robust MLOps approaches remains of great interest to researchers and practitioners. In this paper, we provide a comprehensive overview of the trustworthiness property of MLOps systems. Specifically, we highlight technical practices to achieve robust MLOps systems. In addition, we survey the existing research approaches that address the robustness aspects of ML systems in production. We also review the tools and software available to build MLOps systems and summarize their support to handle the robustness aspects. Finally, we present the open challenges and propose possible future directions and opportunities within this emerging field. The aim of this paper is to provide researchers and practitioners working on practical AI applications with a comprehensive view to adopt robust ML solutions in production environments.