real drift
FedDAA: Dynamic Client Clustering for Concept Drift Adaptation in Federated Learning
In federated learning (FL), the data distribution of each client may change over time, introducing both temporal and spatial data heterogeneity, known as concept drift. Data heterogeneity arises from three drift sources: real drift (a shift in the conditional distribution P(y|x)), virtual drift (a shift in the input distribution P(x)), and label drift (a shift in the label distribution P(y)). However, most existing FL methods addressing concept drift primarily focus on real drift. When clients experience virtual or label drift, these methods often fail to selectively retain useful historical knowledge, leading to catastrophic forgetting. A key challenge lies in distinguishing different sources of drift, as they require distinct adaptation strategies: real drift calls for discarding outdated data, while virtual or label drift benefits from retaining historical data. Without explicitly identifying the drift sources, a general adaptation strategy is suboptimal and may harm generalization. To address this challenge, we propose FedDAA, a dynamic clustered FL framework designed to adapt to multi-source concept drift while preserving valuable historical knowledge. Specifically, FedDAA integrates three modules: a cluster number determination module to find the optimal number of clusters; a real drift detection module to distinguish real drift from virtual/label drift; and a concept drift adaptation module to adapt to new data while retaining useful historical information. We provide theoretical convergence guarantees, and experiments show that FedDAA achieves 7.84% to 8.52% accuracy improvements over state-of-the-art methods on Fashion-MNIST, CIFAR-10, and CIFAR-100.
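The three drift sources named in this abstract can be made concrete on a toy discrete distribution. The following is a minimal sketch, not FedDAA's detection module: it assumes the joint distribution over (x, y) is known exactly as a table and that both tables share the same support, and the helper names (`marginal`, `conditional_y_given_x`, `drift_sources`) are hypothetical.

```python
from collections import defaultdict


def marginal(joint, axis):
    """Sum a joint table {(x, y): prob} over the other variable."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[key[axis]] += p
    return dict(out)


def conditional_y_given_x(joint):
    """Return {(x, y): P(y|x)} for every x with non-zero mass."""
    px = marginal(joint, 0)
    return {(x, y): p / px[x] for (x, y), p in joint.items() if px[x] > 0}


def max_abs_diff(a, b):
    """Largest absolute difference between two probability tables."""
    keys = set(a) | set(b)
    return max(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


def drift_sources(old, new, tol=1e-9):
    """Report which of P(y|x), P(x), P(y) differ between two joint tables."""
    return {
        "real": max_abs_diff(conditional_y_given_x(old),
                             conditional_y_given_x(new)) > tol,
        "virtual": max_abs_diff(marginal(old, 0), marginal(new, 0)) > tol,
        "label": max_abs_diff(marginal(old, 1), marginal(new, 1)) > tol,
    }


# Toy joint distributions over binary x and y (illustrative numbers only).
old = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
# Same P(x), but P(y|x) is flipped: real drift only.
real_only = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.4, (1, 1): 0.1}
# Same P(y|x), but P(x=0) moves from 0.5 to 0.8: virtual drift,
# which here also shifts the label marginal P(y).
virtual_shift = {(0, 0): 0.64, (0, 1): 0.16, (1, 0): 0.04, (1, 1): 0.16}

real_report = drift_sources(old, real_only)
virtual_report = drift_sources(old, virtual_shift)
```

In the second case the decision boundary is unchanged, which is why historical data remains useful there, whereas in the first case old labels contradict the new concept.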
Drift Detection: Introducing Gaussian Split Detector
Fuccellaro, Maxime, Simon, Laurent, Zemmari, Akka
Recent research has yielded a wide array of drift detectors. However, to achieve strong performance, most of them require the true class labels to be available during the drift detection phase. This paper targets drift detection when the ground truth is unknown during the detection phase. To that end, we introduce Gaussian Split Detector (GSD), a novel drift detector that works in batch mode. GSD is designed to work when the data follow a normal distribution and makes use of Gaussian mixture models to monitor changes in the decision boundary. The algorithm is designed to handle multi-dimensional data streams and to work without the ground truth labels during the inference phase, making it pertinent for real-world use. In an extensive experimental study on real and synthetic datasets, we evaluate our detector against the state of the art. We show that our detector outperforms the state of the art in detecting real drift and in ignoring virtual drift, which is key to avoiding false alarms.
- North America > United States > California > Orange County > Irvine (0.04)
- Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
Explaining Drift using Shapley Values
Edakunni, Narayanan U., Tekriwal, Utkarsh, Jain, Anukriti
Machine learning models often deteriorate in performance when they are used to predict outcomes on data on which they were not trained. These scenarios often arise in the real world when the distribution of data changes gradually or abruptly due to major events like a pandemic. There have been many attempts in machine learning research to come up with techniques that are resilient to such concept drifts. However, there is no principled framework to identify the drivers behind the drift in model performance. In this paper, we propose a novel framework, DBShap, that uses Shapley values to identify the main contributors to the drift and quantify their respective contributions. The proposed framework not only quantifies the importance of individual features in driving the drift but also includes the change in the underlying relation between the input and output as a possible driver. The explanation provided by DBShap can be used to understand the root cause behind the drift and to make the model resilient to it.
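To illustrate the general idea of attributing a performance drop to several drivers with Shapley values, here is a minimal sketch. It is not the DBShap algorithm: the three players (`x1`, `x2`, `concept`), the `error_increase` table, and its numbers are hypothetical stand-ins for "the error increase observed when only this subset of components is switched to its post-drift version".

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical value function: error increase when the listed components
# (two feature distributions and the input-output relation) have drifted.
error_increase = {
    frozenset(): 0.00,
    frozenset({"x1"}): 0.02,
    frozenset({"x2"}): 0.00,
    frozenset({"concept"}): 0.10,
    frozenset({"x1", "x2"}): 0.03,
    frozenset({"x1", "concept"}): 0.14,
    frozenset({"x2", "concept"}): 0.10,
    frozenset({"x1", "x2", "concept"}): 0.16,
}

players = ["x1", "x2", "concept"]
contributions = shapley_values(players, lambda s: error_increase[frozenset(s)])
total = sum(contributions.values())
```

By the efficiency property of Shapley values, the contributions sum to the total error increase when everything has drifted, so each value can be read as that driver's share of the drop. Enumerating all orderings is only feasible for a handful of players; larger problems need sampling or other approximations.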
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
On the Change of Decision Boundaries and Loss in Learning with Concept Drift
Hinder, Fabian, Vaquet, Valerie, Brinkrolf, Johannes, Hammer, Barbara
The world that surrounds us is subject to constant change, which also affects the increasing amount of data collected over time in social media, sensor networks, IoT devices, etc. Those changes, referred to as concept drift, can be caused by seasonal changes, changing demands of individual customers, aging or failing sensors, and many more. As drift constitutes a major issue in many applications, considerable research is focusing on this setting [4]. Depending on the domain of data and application, different drift scenarios might occur. For example, covariate shift refers to the situation in which training and test sets have different marginal distributions [9]. In recent years, a large variety of methods for learning in the presence of drift has been proposed [4], whereby a majority of the approaches target supervised learning scenarios.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > Nebraska (0.04)
Tackling Virtual and Real Concept Drifts: An Adaptive Gaussian Mixture Model
Oliveira, Gustavo, Minku, Leandro, Oliveira, Adriano
Real-world applications have been dealing with large amounts of data that arrive over time and generally present changes in their underlying joint probability distribution, i.e., concept drift. Concept drift can be subdivided into two types: virtual drift, which affects the unconditional probability distribution p(x), and real drift, which affects the conditional probability distribution p(y|x). Existing work focuses on real drift. However, strategies to cope with real drift may not be the best suited for dealing with virtual drift, since the real class boundaries remain unchanged. We provide the first in-depth analysis of the differences between the impact of virtual and real drifts on classifiers' suitability. We propose an approach to handle both drifts called On-line Gaussian Mixture Model With Noise Filter For Handling Virtual and Real Concept Drifts (OGMMF-VRD). Experiments with 7 synthetic and 3 real-world datasets show that OGMMF-VRD obtained the best results in terms of average accuracy, G-mean and runtime compared to existing approaches. Moreover, its accuracy over time suffered less performance degradation in the presence of drifts. In recent years, real-world applications like credit card fraud detection, flight delay and weather forecasting have [...] Such sequences of data are known as data streams [2, 3]. They are challenging for data modeling [...] learned decision boundaries, which need to be adjusted for the classifier to remain suitable. [...] data stream learning approaches treat virtual drifts using the same strategies as for real drifts [6].
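The abstract above reports G-mean alongside average accuracy. As a reminder of why the two can disagree, here is a minimal sketch that assumes the common definition of G-mean as the geometric mean of per-class recalls; the paper's exact evaluation protocol is not given in this excerpt, and the function name `g_mean` and the toy labels are illustrative.

```python
from math import prod


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return prod(recalls) ** (1 / len(recalls))


# A classifier that always predicts the majority class: accuracy is 4/6,
# but the minority-class recall is 0, so the G-mean collapses to 0.
majority_only = g_mean([0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 0])

# Recalls of 0.5 (class 0) and 1.0 (class 1) give sqrt(0.5).
mixed = g_mean([0, 0, 1, 1], [0, 1, 1, 1])
```

Because a single neglected class drives the product to zero, G-mean penalizes classifiers that ignore minority classes, which matters when drift changes the class balance of a stream.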
- South America > Brazil > Pernambuco > Recife (0.04)
- Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Concept Drift Detection and Adaptation with Weak Supervision on Streaming Unlabeled Data
Concept drift in learning and classification occurs when the statistical properties of either the data features or the target change over time; evidence of drift has appeared in search data, medical research, malware, web data, and video. Drift adaptation has not yet been addressed in high-dimensional, noisy, low-context data such as streaming text, video, or images due to the unique challenges these domains present. We present a two-fold approach to deal with concept drift in these domains: a density-based clustering approach to deal with virtual concept drift (change in statistical properties of features) and a weak-supervision step to deal with real concept drift (change in statistical properties of target). Our density-based clustering avoids problems posed by the curse of dimensionality to create an evolving 'map' of the live data space, thereby addressing virtual drift in features. Our weak-supervision step leverages high-confidence labels (oracle or heuristic labels) to generate weighted training sets that are used to generalize and update existing deep learners to adapt to changing decision boundaries (real drift) and to create new deep learners for unseen regions of the data space. Our results show that our two-fold approach performs well, with >90% precision in 2018, four years after initial deployment in 2014, without any human intervention.
- Information Technology (0.48)
- Health & Medicine (0.48)