non-i
Online Conformal Prediction with Adversarial Semi-bandit Feedback via Regret Minimization
Yang, Junyoung, Kim, Kyungmin, Park, Sangdon
Uncertainty quantification is crucial in safety-critical systems, where decisions must be made under uncertainty. In particular, we consider the problem of online uncertainty quantification, where data points arrive sequentially. Online conformal prediction is a principled online uncertainty quantification method that dynamically constructs a prediction set at each time step. While existing methods for online conformal prediction provide long-run coverage guarantees without any distributional assumptions, they typically assume a full feedback setting in which the true label is always observed. In this paper, we propose a novel learning method for online conformal prediction with partial feedback from an adaptive adversary-a more challenging setup where the true label is revealed only when it lies inside the constructed prediction set. Specifically, we formulate online conformal prediction as an adversarial bandit problem by treating each candidate prediction set as an arm. Building on an existing algorithm for adversarial bandits, our method achieves a long-run coverage guarantee by explicitly establishing its connection to the regret of the learner. Finally, we empirically demonstrate the effectiveness of our method in both independent and identically distributed (i.i.d.) and non-i.i.d. settings, showing that it successfully controls the miscoverage rate while maintaining a reasonable size of the prediction set.
Prism: Mining Task-aware Domains in Non-i.i.d. IMU Data for Flexible User Perception
Li, Yunzhe, Hu, Facheng, Zhu, Hongzi, Liu, Quan, Zhao, Xiaoke, Shen, Jiangang, Chang, Shan, Guo, Minyi
A wide range of user perception applications leverage inertial measurement unit (IMU) data for online prediction. However, restricted by the non-i.i.d. nature of IMU data collected from mobile devices, most systems work well only in a controlled setting (e.g., for a specific user in particular postures), limiting application scenarios. To achieve uncontrolled online prediction on mobile devices, referred to as the flexible user perception (FUP) problem, is attractive but hard. In this paper, we propose a novel scheme, called Prism, which can obtain high FUP accuracy on mobile devices. The core of Prism is to discover task-aware domains embedded in IMU dataset, and to train a domain-aware model on each identified domain. To this end, we design an expectation-maximization (EM) algorithm to estimate latent domains with respect to the specific downstream perception task. Finally, the best-fit model can be automatically selected for use by comparing the test sample and all identified domains in the feature space. We implement Prism on various mobile devices and conduct extensive experiments. Results demonstrate that Prism can achieve the best FUP performance with a low latency.
Lifelong Learning with Non-i.i.d. Tasks
In this work we aim at extending theoretical foundations of lifelong learning. Previous work analyzing this scenario is based on the assumption that the tasks are sampled i.i.d. Instead we study two scenarios when lifelong learning is possible, even though the observed tasks do not form an i.i.d. In the first case we prove a PAC-Bayesian theorem, which can be seen as a direct generalization of the analogous previous result for the i.i.d. For the second scenario we propose to learn an inductive bias in form of a transfer procedure.
FedSC: Provable Federated Self-supervised Learning with Spectral Contrastive Objective over Non-i.i.d. Data
Jing, Shusen, Yu, Anlan, Zhang, Shuai, Zhang, Songyang
Recent efforts have been made to integrate self-supervised learning (SSL) with the framework of federated learning (FL). One unique challenge of federated self-supervised learning (FedSSL) is that the global objective of FedSSL usually does not equal the weighted sum of local SSL objectives. Consequently, conventional approaches, such as federated averaging (FedAvg), fail to precisely minimize the FedSSL global objective, often resulting in suboptimal performance, especially when data is non-i.i.d.. To fill this gap, we propose a provable FedSSL algorithm, named FedSC, based on the spectral contrastive objective. In FedSC, clients share correlation matrices of data representations in addition to model weights periodically, which enables inter-client contrast of data samples in addition to intra-client contrast and contraction, resulting in improved quality of data representations. Differential privacy (DP) protection is deployed to control the additional privacy leakage on local datasets when correlation matrices are shared. We also provide theoretical analysis on the convergence and extra privacy leakage. The experimental results validate the effectiveness of our proposed algorithm.
Fast Learning from Non-i.i.d. Observations
We prove an oracle inequality for generic regularized empirical risk minimization algorithms learning from \a -mixing processes. To illustrate this oracle inequality, we use it to derive learning rates for some learning methods including least squares SVMs. Since the proof of the oracle inequality uses recent localization ideas developed for independent and identically distributed (i.i.d.) processes, it turns out that these learning rates are close to the optimal rates known in the i.i.d.
FedSiKD: Clients Similarity and Knowledge Distillation: Addressing Non-i.i.d. and Constraints in Federated Learning
Alsenani, Yousef, Mishra, Rahul, Ahmed, Khaled R., Rahman, Atta Ur
In recent years, federated learning (FL) has emerged as a promising technique for training machine learning models in a decentralized manner while also preserving data privacy. The non-independent and identically distributed (non-i.i.d.) nature of client data, coupled with constraints on client or edge devices, presents significant challenges in FL. Furthermore, learning across a high number of communication rounds can be risky and potentially unsafe for model exploitation. Traditional FL approaches may suffer from these challenges. Therefore, we introduce FedSiKD, which incorporates knowledge distillation (KD) within a similarity-based federated learning framework. As clients join the system, they securely share relevant statistics about their data distribution, promoting intra-cluster homogeneity. This enhances optimization efficiency and accelerates the learning process, effectively transferring knowledge between teacher and student models and addressing device constraints. FedSiKD outperforms state-of-the-art algorithms by achieving higher accuracy, exceeding by 25\% and 18\% for highly skewed data at $\alpha = {0.1,0.5}$ on the HAR and MNIST datasets, respectively. Its faster convergence is illustrated by a 17\% and 20\% increase in accuracy within the first five rounds on the HAR and MNIST datasets, respectively, highlighting its early-stage learning proficiency. Code is publicly available and hosted on GitHub (https://github.com/SimuEnv/FedSiKD)
MrTF: Model Refinery for Transductive Federated Learning
Li, Xin-Chun, Yang, Yang, Zhan, De-Chuan
We consider a real-world scenario in which a newly-established pilot project needs to make inferences for newly-collected data with the help of other parties under privacy protection policies. Current federated learning (FL) paradigms are devoted to solving the data heterogeneity problem without considering the to-be-inferred data. We propose a novel learning paradigm named transductive federated learning (TFL) to simultaneously consider the structural information of the to-be-inferred data. On the one hand, the server could use the pre-available test samples to refine the aggregated models for robust model fusion, which tackles the data heterogeneity problem in FL. On the other hand, the refinery process incorporates test samples into training and could generate better predictions in a transductive manner. We propose several techniques including stabilized teachers, rectified distillation, and clustered label refinery to facilitate the model refinery process. Abundant experimental studies verify the superiorities of the proposed \underline{M}odel \underline{r}efinery framework for \underline{T}ransductive \underline{F}ederated learning (MrTF). The source code is available at \url{https://github.com/lxcnju/MrTF}.
Fed-FSNet: Mitigating Non-I.I.D. Federated Learning via Fuzzy Synthesizing Network
Guo, Jingcai, Guo, Song, Zhang, Jie, Liu, Ziming
Federated learning (FL) has emerged as a promising privacy-preserving distributed machine learning framework recently. It aims at collaboratively learning a shared global model by performing distributed training locally on edge devices and aggregating local models into a global one without centralized raw data sharing in the cloud server. However, due to the large local data heterogeneities (Non-I.I.D. data) across edge devices, the FL may easily obtain a global model that can produce more shifted gradients on local datasets, thereby degrading the model performance or even suffering from the non-convergence during training. In this paper, we propose a novel FL training framework, dubbed Fed-FSNet, using a properly designed Fuzzy Synthesizing Network (FSNet) to mitigate the Non-I.I.D. FL at-the-source. Concretely, we maintain an edge-agnostic hidden model in the cloud server to estimate a less-accurate while direction-aware inversion of the global model. The hidden model can then fuzzily synthesize several mimic I.I.D. data samples (sample features) conditioned on only the global model, which can be shared by edge devices to facilitate the FL training towards faster and better convergence. Moreover, since the synthesizing process involves neither access to the parameters/updates of local models nor analyzing individual local model outputs, our framework can still ensure the privacy of FL. Experimental results on several FL benchmarks demonstrate that our method can significantly mitigate the Non-I.I.D. issue and obtain better performance against other representative methods.
Stability Bounds for Non-i.i.d. Processes
The notion of algorithmic stability has been used effectively in the past to derive tight generalization bounds. A key advantage of these bounds is that they are de- signed for specific learning algorithms, exploiting their particular properties. But, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed (i.i.d.). In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence, which is clear in system diagnosis or time series prediction problems.
Stability Bounds for Non-i.i.d. Processes
Mohri, Mehryar, Rostamizadeh, Afshin
The notion of algorithmic stability has been used effectively in the past to derive tight generalization bounds. A key advantage of these bounds is that they are de- signed for specific learning algorithms, exploiting their particular properties. But, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed (i.i.d.). In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence, which is clear in system diagnosis or time series prediction problems.