stratify
Stratify: Rethinking Federated Learning for Non-IID Data through Balanced Sampling
Wong, Hui Yeok, Lim, Chee Kau, Chan, Chee Seng
Federated Learning (FL) on non-independently and identically distributed (non-IID) data remains a critical challenge, as existing approaches struggle with severe data heterogeneity. Current methods primarily address symptoms of non-IID by applying incremental adjustments to Federated Averaging (FedAvg), rather than directly resolving its inherent design limitations. Consequently, performance significantly deteriorates under highly heterogeneous conditions, as the fundamental issue of imbalanced exposure to diverse class and feature distributions remains unresolved. This paper introduces Stratify, a novel FL framework designed to systematically manage class and feature distributions throughout training, effectively tackling the root cause of non-IID challenges. Inspired by classical stratified sampling, our approach employs a Stratified Label Schedule (SLS) to ensure balanced exposure across labels, significantly reducing bias and variance in aggregated gradients. Complementing SLS, we propose a label-aware client selection strategy, restricting participation exclusively to clients possessing data relevant to scheduled labels. Additionally, Stratify incorporates a fine-grained, high-frequency update scheme, accelerating convergence and further mitigating data heterogeneity. To uphold privacy, we implement a secure client selection protocol leveraging homomorphic encryption, enabling precise global label statistics without disclosing sensitive client information. Extensive evaluations on MNIST, CIFAR-10, CIFAR-100, Tiny-ImageNet, COVTYPE, PACS, and Digits-DG demonstrate that Stratify attains performance comparable to IID baselines, accelerates convergence, and reduces client-side computation compared to state-of-the-art methods, underscoring its practical effectiveness in realistic federated learning scenarios.
Stratify: Unifying Multi-Step Forecasting Strategies
Green, Riku, Stevens, Grant, Abdallah, Zahraa, Filho, Telmo M. Silva
A key aspect of temporal domains is the ability to make predictions multiple time steps into the future, a process known as multi-step forecasting (MSF). At the core of this process is selecting a forecasting strategy; however, with no existing framework to map out the space of strategies, practitioners are left with ad hoc methods for strategy selection. In this work, we propose Stratify, a parameterised framework that addresses multi-step forecasting, unifying existing strategies and introducing novel, improved strategies. We evaluate Stratify on 18 benchmark datasets, five function classes, and short to long forecast horizons (10, 20, 40, 80). In over 84% of 1080 experiments, novel strategies in Stratify improved performance compared to all existing ones. Importantly, we find that no single strategy consistently outperforms others in all task settings, highlighting the need for practitioners to explore the Stratify space to carefully search and select forecasting strategies based on task-specific requirements. Our results are the most comprehensive benchmarking of known and novel forecasting strategies. We make code available to reproduce our results.
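For readers unfamiliar with what a "forecasting strategy" is, the sketch below contrasts two well-known existing strategies, recursive and direct, using a plain least-squares linear model. It does not reproduce the Stratify framework or its parameterisation; the helper names (`make_windows`, `fit_linear`, `recursive_forecast`, `direct_forecast`) and the toy series are assumptions made for illustration only.

```python
import numpy as np

def make_windows(series, n_lags, horizon):
    """Build a lag matrix X and a target matrix Y (one column per step ahead)."""
    rows = len(series) - n_lags - horizon + 1
    X = np.array([series[i:i + n_lags] for i in range(rows)])
    Y = np.array([series[i + n_lags:i + n_lags + horizon] for i in range(rows)])
    return X, Y

def fit_linear(X, y):
    """Least-squares linear model with an intercept term."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, x):
    """Predict a single value from one lag window."""
    return float(np.append(x, 1.0) @ coef)

def recursive_forecast(series, n_lags, horizon):
    """Fit ONE one-step-ahead model and feed its predictions back as inputs."""
    X, Y = make_windows(series, n_lags, 1)
    coef = fit_linear(X, Y[:, 0])
    window = list(series[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        nxt = predict_linear(coef, np.array(window[-n_lags:]))
        forecasts.append(nxt)
        window.append(nxt)
    return np.array(forecasts)

def direct_forecast(series, n_lags, horizon):
    """Fit a SEPARATE model for each step ahead, all using the same last window."""
    X, Y = make_windows(series, n_lags, horizon)
    last = np.asarray(series[-n_lags:])
    return np.array(
        [predict_linear(fit_linear(X, Y[:, h]), last) for h in range(horizon)]
    )

# Toy series with a linear trend (made-up data, not one of the benchmarks).
series = np.arange(50, dtype=float)
rec = recursive_forecast(series, n_lags=3, horizon=4)
direct = direct_forecast(series, n_lags=3, horizon=4)
```

The recursive strategy trains a single model but can accumulate its own errors over the horizon, while the direct strategy avoids that feedback at the cost of training one model per step.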
What is Stratify in train_test_split? With example - Dragon Forest
To split data into a training set and a test set, you have probably used the train_test_split function from scikit-learn. It takes parameters such as random_state, stratify, shuffle, and test_size. Here we will talk about one of them, stratify, in a simple way. Basically, stratify keeps the class proportions of the full dataset the same in both the training set and the test set. If the classes in your data are imbalanced, a purely random split can over- or under-represent a class in one of the two sets; passing stratify to train_test_split avoids this problem.
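As a minimal sketch of what the parameter does, the example below splits a made-up imbalanced dataset (90 samples of class 0, 10 of class 1) with `stratify=y`. The data and variable names are assumptions for illustration; `train_test_split` and its `test_size`, `stratify`, and `random_state` parameters are part of scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced labels: 90 samples of class 0 and 10 of class 1 (made-up data).
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

# stratify=y asks for the same class proportions in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Share of class 1 in each split; both should match the 10% in the full data.
train_ratio = y_train.mean()
test_ratio = y_test.mean()
```

Without `stratify`, the share of class 1 in a 20-sample test set can vary from split to split, and a small class may end up with very few test samples or none at all.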
Stratified Random Sampling Using Python and Pandas
Sometimes the sample data that data scientists are given does not fit what we know about the wider population data. For example, let's assume that the data science team were given survey data and we noticed that the survey respondents were 60% male and 40% female. In the real world the UK general population is closer to 49.4% male and 50.6% female (source: https://tinyurl.com/43hpe5e4). There could be many explanations for our 60% male sample data. One possibility is that the data collection method might have been flawed.
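One way to correct for such a mismatch is to draw a stratified sample whose group shares follow the population figures. The sketch below is an illustration under stated assumptions, not the article's own code: the survey DataFrame, its column names, and the sample size of 500 are made up, while the 49.4% / 50.6% targets are the figures quoted above.

```python
import pandas as pd

# Made-up survey: 600 male and 400 female respondents (60% / 40%).
survey = pd.DataFrame({
    "respondent_id": range(1000),
    "sex": ["male"] * 600 + ["female"] * 400,
})

# Target proportions quoted in the text for the UK general population.
target = {"male": 0.494, "female": 0.506}

# Hypothetical sample size; each stratum must hold enough rows to supply its share.
sample_size = 500

parts = []
for group, share in target.items():
    n = round(sample_size * share)           # rows to draw from this stratum
    stratum = survey[survey["sex"] == group]
    parts.append(stratum.sample(n=n, random_state=0))

stratified = pd.concat(parts)

# Resulting shares per group, for comparison with the targets.
proportions = stratified["sex"].value_counts(normalize=True)
```

Sampling each stratum separately trades away some respondents from the over-represented group in exchange for a sample that mirrors the population on the chosen variable.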
Pandemic Ai - iOS + Apple Watch
Your digital vital sign dashboard will show your heart rate, heart rate variability, and oxygen saturation, and will use our AI platform to assume longitudinal exacerbations or recovery of your COVID-19 infection. Our 5-meter walk test (frailty test) and 6-minute walk test (cardiovascular function) will be bound to your accelerometer and be able to trend your frailty and your heart and lung reserve, for fitness as well as for infection. You will be able to enter your medication and laboratory tests in our tracker. Your GPS location will be coupled to a geolocation beacon to an emergency medical service and your physician. Insights will provide links to our clinical trials and to the CDC and FDA websites, as well as advice about anti-inflammatory diets and peer-reviewed scientific journals. The Yoga Mode will also allow a proprietary therapeutic anti-inflammatory yogic breathing which will certainly change your cardiovascular health.