AITopics | fedsubavg

002262941c9edfd472a79298b2ac5e17-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 07:14:54 GMT

A.1 Proof Sketch We first introduce the following lemma: Lemma 1. Lemma 2. For matrices A,B 2Mn, if A B, then we have min(A) min(B)and max(A) max(B), where max() (resp., min()) denotes taking the maximum (resp., minimum) eigenvalue.. Proof of Lemma 2. For any matrix P 2Mn with P> = P, we have max(P) = max We first consider the condition number of ˆH when X is in a locally convex area. By equations 3 and 4, we have M1 H M2. Rearranging the terms yields H M1 0 and M2 H 0. Therefore, for any vector x 2RM, we have We next consider the minimum singular value of H and ˆH with min(H)= p min(H2) and min(ˆH)= q min(ˆH2) in any case. Under Assumption 1 and equation 4, we have H M2. Similarly, we can obtain H M2. By Lemma 2, we further have max(H) max(M2)= nmax 2 C.1 kr ˆf(ˆX) k2 vs. krf(X) k2 In this section, we explain why we use kr ˆf(ˆX) k2 rather than kr f(X) k2 to characterize the convergence rate. In general, it is hard to develop a convergence rate for objective values. However, when the global model is in a locally convex area of f, we can obtain the relationship between the gradient and the local optimum.

artificial intelligence, machine learning, rfi xti, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Add feedback

002262941c9edfd472a79298b2ac5e17-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 07:14:51 GMT

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Communications > Social Media (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Federated Submodel Optimization for Hot and Cold Data Features

Neural Information Processing SystemsDec-23-2025, 16:37:39 GMT

We focus on federated learning in practical recommender systems and natural language processing scenarios. The global model for federated optimization typically contains a large and sparse embedding layer, while each client's local data tend to interact with part of features, updating only a small submodel with the feature-related embedding vectors. We identify a new and important issue that distinct data features normally involve different numbers of clients, generating the differentiation of hot and cold features. We further reveal that the classical federated averaging algorithm (FedAvg) or its variants, which randomly selects clients to participate and uniformly averages their submodel updates, will be severely slowed down, because different parameters of the global model are optimized at different speeds. More specifically, the model parameters related to hot (resp., cold) features will be updated quickly (resp., slowly).

federated submodel optimization, hot and cold data feature, name change, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.96)

Add feedback

Federated Submodel Optimization for Hot and Cold Data Features

Neural Information Processing SystemsOct-9-2024, 08:36:28 GMT

We focus on federated learning in practical recommender systems and natural language processing scenarios. The global model for federated optimization typically contains a large and sparse embedding layer, while each client's local data tend to interact with part of features, updating only a small submodel with the feature-related embedding vectors. We identify a new and important issue that distinct data features normally involve different numbers of clients, generating the differentiation of hot and cold features. We further reveal that the classical federated averaging algorithm (FedAvg) or its variants, which randomly selects clients to participate and uniformly averages their submodel updates, will be severely slowed down, because different parameters of the global model are optimized at different speeds. More specifically, the model parameters related to hot (resp., cold) features will be updated quickly (resp., slowly).

federated submodel optimization, fedsubavg, hot and cold data feature, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.99)

Add feedback

A Proof of Theorem 1, A2, B1, B

Neural Information Processing SystemsFeb-18-2024, 04:14:14 GMT

A.1 Proof Sketch We first introduce the following lemma: Lemma 1. We first consider the condition number of Ĥ when X is in a locally convex area. In general, it is hard to develop a convergence rate for objective values. However, when the global model is in a locally convex area of f, we can obtain the relationship between the gradient and the local optimum. Theorem 4. When there is no parameter heat dispersion, and X is in a µ-strongly convex area of f We note that there is a difference between equation 18 and 21: for each client i, equation 18 involves all the parameters of the full model while equation 21 involves only partial parameters of the submodel, which causes a change in the lower bound of T (Y) and further leads to a change of conclusion.

apple, fedsubavg, heat dispersion, (16 more...)

Neural Information Processing Systems

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Add feedback

Federated Submodel Optimization for Hot and Cold Data Features Yucheng Ding 1 Fan Wu1 Shaojie Tang

Neural Information Processing SystemsFeb-18-2024, 04:14:11 GMT

We focus on federated learning in practical recommender systems and natural language processing scenarios. The global model for federated optimization typically contains a large and sparse embedding layer, while each client's local data tend to interact with part of features, updating only a small submodel with the feature-related embedding vectors. We identify a new and important issue that distinct data features normally involve different numbers of clients, generating the differentiation of hot and cold features. We further reveal that the classical federated averaging algorithm (FedAvg) or its variants, which randomly selects clients to participate and uniformly averages their submodel updates, will be severely slowed down, because different parameters of the global model are optimized at different speeds. More specifically, the model parameters related to hot (resp., cold) features will be updated quickly (resp., slowly).

dataset, fedsubavg, heat dispersion, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia (0.04)
North America > United States > Texas (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Federated Submodel Averaging

Ding, Yucheng, Wu, Chaoyue Niu. Fan, Tang, Shaojie, Lv, Chengfei, Feng, Yanghe, Chen, Guihai

arXiv.org Artificial IntelligenceSep-15-2021

We study practical data characteristics underlying federated learning, where non-i.i.d. data from clients have sparse features, and a certain client's local data normally involves only a small part of the full model, called a submodel. Due to data sparsity, the classical federated averaging (FedAvg) algorithm or its variants will be severely slowed down, because when updating the global model, each client's zero update of the full model excluding its submodel is inaccurately aggregated. Therefore, we propose federated submodel averaging (FedSubAvg), ensuring that the expectation of the global update of each model parameter is equal to the average of the local updates of the clients who involve it. We theoretically proved the convergence rate of FedSubAvg by deriving an upper bound under a new metric called the element-wise gradient norm. In particular, this new metric can characterize the convergence of federated optimization over sparse data, while the conventional metric of squared gradient norm used in FedAvg and its variants cannot. We extensively evaluated FedSubAvg over both public and industrial datasets. The evaluation results demonstrate that FedSubAvg significantly outperforms FedAvg and its variants.

fedavg, fedsubavg, gradient norm, (16 more...)

arXiv.org Artificial Intelligence

2109.07704

Country: