Cui, Wenquan
Functional sufficient dimension reduction through information maximization with application to classification
Li, Xinyu, Xu, Jianjun, Cui, Wenquan, Cheng, Haoyang
Considering the case where the response is a categorical variable and the predictor is a random function, two novel functional sufficient dimension reduction (FSDR) methods are proposed based on mutual information and squared-loss mutual information. Compared to classical FSDR methods, such as functional sliced inverse regression and functional sliced average variance estimation, the proposed methods are appealing because they can estimate multiple effective dimension reduction directions even when the number of categories is relatively small, especially for a binary response. Moreover, the proposed methods require neither the restrictive linear conditional mean assumption nor the constant covariance assumption, and they avoid the inverse problem of the covariance operator that is often encountered in functional sufficient dimension reduction. Functional principal component analysis with truncation is used as a regularization mechanism. Under some mild conditions, the statistical consistency of the proposed methods is established. Simulations and real data analyses demonstrate that the two methods are competitive with some existing FSDR methods.
Federated Covariate Shift Adaptation for Missing Target Output Values
Xu, Yaqian, Cui, Wenquan, Xu, Jianjun, Cheng, Haoyang
A recent multi-source covariate shift algorithm provides efficient hyperparameter optimization when the target output values are missing. In this paper, we extend this algorithm to the framework of federated learning. To handle data islands in federated learning together with covariate shift adaptation, we propose a federated domain adaptation estimate of the target risk that is asymptotically unbiased and has a desirable asymptotic variance property. We construct a weighted model for the target task and propose a federated covariate shift adaptation algorithm that performs well in our setting. The efficacy of our method is justified both theoretically and empirically.
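As a rough illustration of the importance-weighting idea behind covariate shift adaptation in a federated setting, the sketch below estimates the target risk by weighting each source-data loss by a density ratio and aggregating per-client estimates by sample size. The function names, the squared loss, and the assumption that a density-ratio function `weight_fn` is given are all illustrative; this is not the paper's actual estimator.

```python
import numpy as np

def client_risk(X, y, model, weight_fn):
    """Importance-weighted empirical risk on one client's source data.

    weight_fn(X) stands in for the density ratio p_target(x)/p_source(x);
    squared loss is used purely for illustration.
    """
    w = weight_fn(X)
    losses = (model(X) - y) ** 2
    return np.mean(w * losses), len(y)

def federated_risk(clients, model, weight_fn):
    """Combine the per-client estimates into one target-risk estimate,
    weighting each client by its local sample size (so the result matches
    the pooled-data estimate without pooling the raw data)."""
    pairs = [client_risk(X, y, model, weight_fn) for X, y in clients]
    n_total = sum(n for _, n in pairs)
    return sum(r * n for r, n in pairs) / n_total
```

With constant weights this reduces to the ordinary pooled empirical risk, which is a convenient sanity check.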
Online Kernel Sliced Inverse Regression
Cui, Wenquan, Zhao, Yue, Xu, Jianjun, Cheng, Haoyang
Online dimension reduction is a common technique for processing high-dimensional streaming data. Online principal component analysis, online sliced inverse regression, online kernel principal component analysis, and other methods have been studied in depth, but, to the best of our knowledge, online supervised nonlinear dimension reduction methods have not been fully studied. In this article, an online kernel sliced inverse regression method is proposed. By introducing the approximate linear dependence condition and a dictionary variable set, we address the problem of the variable dimension growing with the sample size in online kernel sliced inverse regression, and propose a reduced-order method for updating variables online. We then transform the problem into an online generalized eigen-decomposition problem and use stochastic optimization to update the central dimension reduction directions. Simulations and real data analysis show that our method achieves performance close to that of batch-processing kernel sliced inverse regression.
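The approximate linear dependence (ALD) test mentioned above is a standard device for keeping a kernel dictionary from growing with the sample size: a new point joins the dictionary only if its feature-space image cannot be well approximated by the span of the current dictionary. A minimal sketch with a Gaussian kernel follows; the function names, tolerance `nu`, and bandwidth `gamma` are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two vectors."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def ald_update(dictionary, x_new, nu=1e-2, gamma=1.0):
    """Add x_new to the dictionary only if the squared residual of
    projecting its feature vector onto the span of the dictionary's
    feature vectors exceeds the tolerance nu."""
    if not dictionary:
        return dictionary + [x_new]
    K = np.array([[rbf(a, b, gamma) for b in dictionary] for a in dictionary])
    k = np.array([rbf(a, x_new, gamma) for a in dictionary])
    # coefficients of the best approximation in the dictionary's span
    a = np.linalg.solve(K + 1e-8 * np.eye(len(K)), k)
    delta = rbf(x_new, x_new, gamma) - k @ a  # projection residual
    if delta > nu:
        dictionary = dictionary + [x_new]
    return dictionary
```

Points nearly dependent on the dictionary (e.g. near-duplicates) are rejected, so the dictionary size, and hence the order of the online eigen-problem, stays bounded.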
Federated Sufficient Dimension Reduction Through High-Dimensional Sparse Sliced Inverse Regression
Cui, Wenquan, Zhao, Yue, Xu, Jianjun, Cheng, Haoyang
Federated learning is a distributed machine learning paradigm that collaboratively trains a model with data on many clients. Unlike traditional distributed machine learning methods, which partition data across clients to improve the efficiency of the learning algorithm, the goal of federated learning is to solve the learning problem without requiring the clients to reveal too much local information. With the increasing demand for data security and privacy protection, federated learning has received significant attention in both industry and academia. For example, banks may want to collaboratively train a credit card scoring model without disclosing information about their customers, and hospitals may want to conduct joint research on a rare disease, for which each has only a small number of cases, without exposing their patients' identities. For more on the progress of federated learning, see [1, 2]. The term federated learning was introduced by McMahan et al. [3], who also proposed the Federated Averaging (FedAvg) algorithm. FedAvg combines multiple rounds of local stochastic gradient descent updates with server-side averaging aggregation to train a centralized model.
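The FedAvg pattern of local updates followed by server-side averaging can be sketched as follows. This is a minimal illustration on a least-squares model with full-batch local gradient descent; the model, loss, and hyperparameters are assumptions for the sake of the example, not the setting of [3] or of this paper.

```python
import numpy as np

def fedavg(client_data, rounds=5, local_epochs=3, lr=0.1):
    """Sketch of Federated Averaging for a linear least-squares model.

    Each round, every client takes a few local gradient steps on its
    own (X, y) data starting from the current global weights; the
    server then averages the returned client weights, weighted by
    local sample counts, to form the next global model.
    """
    d = client_data[0][0].shape[1]
    w = np.zeros(d)                              # global model
    n_total = sum(len(y) for _, y in client_data)
    for _ in range(rounds):
        updates = []
        for X, y in client_data:
            w_local = w.copy()
            for _ in range(local_epochs):        # local gradient steps
                grad = X.T @ (X @ w_local - y) / len(y)
                w_local -= lr * grad
            updates.append((len(y), w_local))
        # server-side aggregation: sample-size-weighted average
        w = sum(n * wl for n, wl in updates) / n_total
    return w
```

Only model weights travel between clients and server; the raw `(X, y)` data never leave the client, which is the privacy-motivated design choice described above.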