AITopics | communication traffic

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

$γ$-FedHT: Stepsize-Aware Hard-Threshold Gradient Compression in Federated Learning

Lu, Rongwei, Jiang, Yutong, Zhang, Jinrui, Li, Chunyang, Zhu, Yifei, Chen, Bin, Wang, Zhi

arXiv.org Artificial Intelligence, May-20-2025

Gradient compression can effectively alleviate communication bottlenecks in Federated Learning (FL). Contemporary state-of-the-art sparse compressors, such as Top-$k$, exhibit high computational complexity, up to $\mathcal{O}(d\log_2{k})$, where $d$ is the number of model parameters. The hard-threshold compressor, which simply transmits elements with absolute values higher than a fixed threshold, is thus proposed to reduce the complexity to $\mathcal{O}(d)$. However, the hard-threshold compression causes accuracy degradation in FL, where the datasets are non-IID and the stepsize $γ$ is decreasing for model convergence. The decaying stepsize reduces the updates and causes the compression ratio of the hard-threshold compression to drop rapidly to an aggressive ratio. At or below this ratio, the model accuracy has been observed to degrade severely. To address this, we propose $γ$-FedHT, a stepsize-aware low-cost compressor with Error-Feedback to guarantee convergence. Given that the traditional theoretical framework of FL does not consider Error-Feedback, we introduce the fundamental conversation of Error-Feedback. We prove that $γ$-FedHT has a convergence rate of $\mathcal{O}(\frac{1}{T})$ ($T$ representing total training iterations) under $μ$-strongly convex cases and $\mathcal{O}(\frac{1}{\sqrt{T}})$ under non-convex cases, the same as FedAVG. Extensive experiments demonstrate that $γ$-FedHT improves accuracy by up to $7.42\%$ over Top-$k$ under equal communication traffic on various non-IID image datasets.
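The interaction between a fixed threshold and a decaying stepsize can be seen in a few lines of Python. This is a minimal sketch of a plain hard-threshold compressor with a generic error-feedback residual, not the $γ$-FedHT algorithm itself (the abstract does not give its stepsize-aware rule). The function names, the gradient values, and the threshold are all hypothetical.

```python
def hard_threshold(values, threshold):
    """Keep entries whose magnitude exceeds the threshold (a single O(d) pass)."""
    return {i: v for i, v in enumerate(values) if abs(v) > threshold}


def compress_with_error_feedback(update, residual, threshold):
    """Compress (update + residual); whatever is not sent becomes the new residual."""
    corrected = [u + r for u, r in zip(update, residual)]
    sparse = hard_threshold(corrected, threshold)
    new_residual = [0.0 if i in sparse else c for i, c in enumerate(corrected)]
    return sparse, new_residual


# Hypothetical gradient and threshold, chosen only for illustration.
gradient = [0.9, -0.05, 0.4, -0.7, 0.02]
threshold = 0.3
zeros = [0.0] * len(gradient)

# Stepsize 1.0: three of the five entries clear the fixed threshold.
large_update = [1.0 * g for g in gradient]
sent_large, _ = compress_with_error_feedback(large_update, zeros, threshold)

# Stepsize 0.5: the same gradient now yields only two transmitted entries.
small_update = [0.5 * g for g in gradient]
sent_small, residual = compress_with_error_feedback(small_update, zeros, threshold)

# Next round: the residual pushes the previously dropped entry 2 over the threshold.
sent_next, _ = compress_with_error_feedback(small_update, residual, threshold)

print(sorted(sent_large), sorted(sent_small), sorted(sent_next))
```

With a smaller stepsize the same gradient produces a sparser message, which is the drift toward an aggressive compression ratio that the abstract describes. The residual delays the dropped entries rather than discarding them.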

artificial intelligence, compression, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2505.12479

Country:

Asia > China (0.28)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Fed-CVLC: Compressing Federated Learning Communications with Variable-Length Codes

Su, Xiaoxin, Zhou, Yipeng, Cui, Laizhong, Lui, John C. S., Liu, Jiangchuan

arXiv.org Artificial Intelligence, Feb-6-2024

In the Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds, without touching private data owned by individual clients. FL is appealing in preserving data privacy; yet the communication between the PS and scattered clients can be a severe bottleneck. Model compression algorithms, such as quantization and sparsification, have been suggested but they generally assume a fixed code length, which does not reflect the heterogeneity and variability of model updates. In this paper, through both analysis and experiments, we show strong evidence that variable-length codes are beneficial for compression in FL. We accordingly present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response to the dynamics of model updates. We develop an optimal tuning strategy that minimizes the loss function (equivalent to maximizing the model utility) subject to the communication budget. We further demonstrate that Fed-CVLC is indeed a general compression design that bridges quantization and sparsification, with greater flexibility. Extensive experiments have been conducted with public datasets to demonstrate that Fed-CVLC remarkably outperforms state-of-the-art baselines, improving model utility by 1.50%-5.44%, or shrinking communication traffic by 16.67%-41.61%.

algorithm, model update, packet, (14 more...)

arXiv.org Artificial Intelligence

2402.0377

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

FedDWA: Personalized Federated Learning with Dynamic Weight Adjustment

Liu, Jiahao, Wu, Jiang, Chen, Jinyu, Hu, Miao, Zhou, Yipeng, Wu, Di

arXiv.org Artificial Intelligence, Jul-16-2023

Different from conventional federated learning, personalized federated learning (PFL) is able to train a customized model for each individual client according to its unique requirement. The mainstream approach is to adopt a kind of weighted aggregation method to generate personalized models, in which weights are determined by the loss value or model parameters among different clients. However, such methods require clients to download others' models, which not only sharply increases communication traffic but also potentially infringes data privacy. In this paper, we propose a new PFL algorithm called FedDWA (Federated Learning with Dynamic Weight Adjustment) to address the above problem, which leverages the parameter server (PS) to compute personalized aggregation weights based on collected models from clients. In this way, FedDWA can capture similarities between clients with much less communication overhead. More specifically, we formulate the PFL problem as an optimization problem by minimizing the distance between personalized models and guidance models, so as to customize aggregation weights for each client. Guidance models are obtained by local one-step-ahead adaptation on individual clients. Finally, we conduct extensive experiments using five real datasets and the results demonstrate that FedDWA can significantly reduce the communication traffic and achieve much higher model accuracy than the state-of-the-art approaches.

artificial intelligence, machine learning, optimization problem, (15 more...)

arXiv.org Artificial Intelligence

2305.06124

Country:

North America > United States > Virginia (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

A Fast Blockchain-based Federated Learning Framework with Compressed Communications

Cui, Laizhong, Su, Xiaoxin, Zhou, Yipeng

arXiv.org Artificial Intelligence, Aug-11-2022

Recently, blockchain-based federated learning (BFL) has attracted intensive research attention because the training process is auditable and the architecture is serverless, avoiding the single point of failure of the parameter server in vanilla federated learning (VFL). Nevertheless, BFL tremendously escalates the communication traffic volume because all local model updates (i.e., changes of model parameters) obtained by BFL clients will be transmitted to all miners for verification and to all clients for aggregation. In contrast, the parameter server and clients in VFL only retain aggregated model updates. Consequently, the huge communication traffic in BFL will inevitably impair the training efficiency and hinder the deployment of BFL in reality. To improve the practicality of BFL, we are among the first to propose a fast blockchain-based communication-efficient federated learning framework by compressing communications in BFL, called BCFL. Meanwhile, we derive the convergence rate of BCFL with non-convex loss. To maximize the final model accuracy, we further formulate the problem to minimize the training loss of the convergence rate subject to a limited training time with respect to the compression rate and the block generation rate, which is a bi-convex optimization problem and can be efficiently solved. To this end, to demonstrate the efficiency of BCFL, we carry out extensive experiments with standard CIFAR-10 and FEMNIST datasets. Our experimental results not only verify the correctness of our analysis, but also show that BCFL can remarkably reduce the communication traffic by 95-98% or shorten the training time by 90-95% compared with BFL.

bcfl, iteration, model update, (12 more...)

arXiv.org Artificial Intelligence

2208.06095

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Oceania > Australia > South Australia (0.04)
Asia > China > Jilin Province > Changchun (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.93)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

How to derive ring all-reduce's mathematical property step by step

#artificialintelligence, Jun-16-2022, 13:47:00 GMT

In our previous blog post, Combating Software System Complexity: Appropriate Abstraction Layer, we mentioned that the communication in a distributed deep learning framework is highly dependent on regular collective communication operations like all-reduce, reduce-scatter, all-gather, and so on. Therefore, it's crucial to implement a highly optimized collective communication and select an ideal algorithm based on task requirements and communication topology. This article will unveil the mathematical property of collective communication operations by analyzing the case of all-reduce, which is common in data parallelism. As illustrated in Figure 1, there are four devices, each with one matrix (to keep things simple, each row in these matrices has only one element). All-reduce is an operation that sums up the same row's input value across devices and returns the resultant value to the corresponding row.
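The four-device case described here can be simulated in plain Python. Figure 1 is not reproduced on this page, so the input values below are hypothetical. The ring variant follows the commonly described two-phase scheme (reduce-scatter, then all-gather) and is a sketch, not necessarily the derivation used in the blog post. For brevity it assumes each device holds exactly as many values as there are devices, one per chunk.

```python
def all_reduce_sum(device_inputs):
    """Reference all-reduce: every device receives the row-wise sum."""
    reduced = [sum(row) for row in zip(*device_inputs)]
    return [list(reduced) for _ in device_inputs]


def ring_all_reduce(device_inputs):
    """Ring all-reduce: device r only ever sends to device (r + 1) % p."""
    p = len(device_inputs)
    data = [list(values) for values in device_inputs]

    # Phase 1, reduce-scatter: after p - 1 steps device r holds the full
    # sum for chunk (r + 1) % p.
    for step in range(p - 1):
        sends = [(r, (r - step) % p) for r in range(p)]
        payloads = [(r, c, data[r][c]) for r, c in sends]  # snapshot before writing
        for r, c, value in payloads:
            data[(r + 1) % p][c] += value

    # Phase 2, all-gather: the completed chunks travel once around the ring.
    for step in range(p - 1):
        sends = [(r, (r + 1 - step) % p) for r in range(p)]
        payloads = [(r, c, data[r][c]) for r, c in sends]
        for r, c, value in payloads:
            data[(r + 1) % p][c] = value

    return data


# Hypothetical inputs: four devices, each with a four-row, one-column matrix.
inputs = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
outputs = ring_all_reduce(inputs)
print(outputs[0])
```

In the ring version each device sends one chunk per step for 2(p - 1) steps, so the volume sent per device stays close to twice the data size regardless of the number of devices.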

algorithm, communication, operation, (14 more...)

#artificialintelligence

Genre: Workflow (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

O(1) Communication for Distributed SGD through Two-Level Gradient Averaging

Bhattacharya, Subhadeep, Yu, Weikuan, Chowdhury, Fahim Tahmid

arXiv.org Machine Learning, Jun-15-2020

Large neural network models present a hefty communication challenge to distributed Stochastic Gradient Descent (SGD), with a communication complexity of O(n) per worker for a model of n parameters. Many sparsification and quantization techniques have been proposed to compress the gradients, some reducing the communication complexity to O(k), where k << n. In this paper, we introduce a strategy called two-level gradient averaging (A2SGD) to consolidate all gradients down to merely two local averages per worker before the computation of two global averages for an updated model. A2SGD also retains local errors to maintain the variance for fast convergence. Our theoretical analysis shows that A2SGD converges similarly to the default distributed SGD algorithm. Our evaluation validates the theoretical conclusion and demonstrates that A2SGD significantly reduces the communication traffic per worker, and improves the overall training time of LSTM-PTB by 3.2x and 23.2x, respectively, compared to Top-K and QSGD. To the best of our knowledge, A2SGD is the first to achieve O(1) communication complexity per worker for distributed SGD.
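A toy sketch can make the O(1) claim concrete: each worker sends two scalars, whatever the size of n. The abstract does not say which two averages are used. The code below assumes they are the mean of the positive gradient entries and the mean of the negative entries, and that each worker rebuilds an approximate gradient from its own sign pattern. Both points are assumptions for illustration, as are the function names and the data.

```python
def local_averages(gradient):
    """Assumed split: mean of positive entries and mean of negative entries."""
    pos = [g for g in gradient if g > 0]
    neg = [g for g in gradient if g < 0]
    mean_pos = sum(pos) / len(pos) if pos else 0.0
    mean_neg = sum(neg) / len(neg) if neg else 0.0
    return mean_pos, mean_neg


def reconstruct(gradient, mean_pos, mean_neg):
    """Replace each entry by the average that matches its sign."""
    return [mean_pos if g > 0 else mean_neg if g < 0 else 0.0 for g in gradient]


# Hypothetical gradients for two workers.
workers = [[0.5, -0.25, 1.5, -0.75], [2.0, -1.0, -2.0, 4.0]]

# Each worker's message is two scalars, independent of the gradient length.
messages = [local_averages(g) for g in workers]

# Second level: average the two scalars across workers.
global_pos = sum(m[0] for m in messages) / len(messages)
global_neg = sum(m[1] for m in messages) / len(messages)

# Local error each worker keeps: what its own two averages failed to capture.
local_errors = [
    [g - a for g, a in zip(grad, reconstruct(grad, *msg))]
    for grad, msg in zip(workers, messages)
]

# Approximate gradient each worker would apply under the assumptions above.
approx = [reconstruct(grad, global_pos, global_neg) for grad in workers]
print(messages, (global_pos, global_neg))
```

How the retained errors are folded back into later iterations is not described in the abstract, so the sketch only computes them.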

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

2006.07405

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Hillsborough County > University (0.04)
North America > United States > Colorado > Broomfield County > Broomfield (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection

Tang, Zhenheng, Shi, Shaohuai, Chu, Xiaowen

arXiv.org Machine Learning, Feb-22-2020

Distributed learning techniques such as federated learning have enabled multiple workers to train machine learning models together to reduce the overall training time. However, current distributed training algorithms (centralized or decentralized) suffer from the communication bottleneck on multiple low-bandwidth workers (also on the server under the centralized architecture). Although decentralized algorithms generally have lower communication complexity than their centralized counterparts, they still suffer from the communication bottleneck for workers with low network bandwidth. To deal with the communication problem while being able to preserve the convergence performance, we introduce a novel decentralized training algorithm with the following key features: 1) It does not require a parameter server to maintain the model during training, which avoids the communication pressure on any single peer. 2) Each worker only needs to communicate with a single peer at each communication round with a highly compressed model, which can significantly reduce the communication traffic on the worker. We theoretically prove that our sparsification algorithm still preserves convergence properties. 3) Each worker dynamically selects its peer at different communication rounds to better utilize the bandwidth resources. We conduct experiments with convolutional neural networks on 32 workers to verify the effectiveness of our proposed algorithm compared to seven existing methods. Experimental results show that our algorithm significantly reduces the communication traffic and generally selects relatively high-bandwidth peers.

algorithm, bandwidth, communication traffic, (15 more...)

arXiv.org Machine Learning

2002.09692

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

In predicting a stroke's toll, location matters, but so do connections

Los Angeles Times, Jul-12-2016, 03:10:19 GMT

Each year, roughly 666,000 Americans survive a stroke, and for them, the aftermath can be hard to predict. Some stroke patients have difficulty speaking or grasp for words that do not come. Some suffer problems with vision, balance or mobility. Some are addled by attention, memory and other cognitive deficits that can range from subtle to severe. To glean what kinds of disabilities a patient will probably face, neurologists have long looked at the location of the lesion a stroke leaves behind -- on a brain scan, the darkened site where cells have died off.

artificial intelligence, social media, stroke patient, (11 more...)

Los Angeles Times

Country: North America > United States > California > Los Angeles County > Los Angeles (0.05)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Communications > Social Media (0.52)
Information Technology > Artificial Intelligence (0.51)
