AITopics | communication overhead

This search does not affect the computational complexity, which is O(νnDE + SE) for agent n that computes DE parallel consensus steps and goes over a list of SE action profiles. Intuitively, we would need E KN to find the optimal action profile even with no noise, which creates delays where agents have to wait for their average reward to go above their λn. In the multitasking robots game, if agent n has Ren = 0, then the optimal action profile a e has to satisfy a e,m = n for all m. If λ is a safe margin away from the boundary of C(G), then most agents will have Ren = 0 most of the time. Hence, their performance depends on the best action profile in SE.

action profile, artificial intelligence, ren

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.66)

Distributed Distillation for On-Device Learning

Neural Information Processing Systems | Dec-24-2025, 22:48:37 GMT

On-device learning promises collaborative training of machine learning models across edge devices without the sharing of user data. In state-of-the-art on-device learning algorithms, devices communicate their model weights over a decentralized communication network. Transmitting model weights requires huge communication overhead and means only devices with identical model architectures can be included. To overcome these limitations, we introduce a distributed distillation algorithm where devices communicate and learn from soft-decision (softmax) outputs, which are inherently architecture-agnostic and scale only with the number of classes. The communicated soft-decisions are each model's outputs on a public, unlabeled reference dataset, which serves as a common vocabulary between devices. We prove that our algorithm converges with probability 1 to a stationary point where all devices in the communication network distill the entire network's knowledge on the reference data, regardless of their local connections. Our analysis assumes smooth loss functions, which can be non-convex. Simulations support our theoretical findings and show that even a naive implementation of our algorithm significantly reduces the communication overhead while achieving an overall comparable performance to state-of-the-art, depending on the regime. By requiring little communication overhead and allowing for cross-architecture training, we remove two main obstacles to scaling on-device learning.
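The exchange described in the abstract can be illustrated with a minimal sketch. It assumes uniform averaging of the received soft-decisions and a cross-entropy distillation loss; both are illustrative choices, not the paper's stated update rule, and every function name here (`soft_decisions`, `neighborhood_target`, `distillation_loss`, `payload_floats`) is hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_decisions(logits_per_sample):
    """Soft-decision outputs of one device on the shared reference data."""
    return [softmax(row) for row in logits_per_sample]


def neighborhood_target(own, received):
    """Assumption: uniform average of own and neighbors' soft-decisions."""
    outputs = [own] + received
    k = len(outputs)
    num_classes = len(own[0])
    return [
        [sum(dev[i][c] for dev in outputs) / k for c in range(num_classes)]
        for i in range(len(own))
    ]


def distillation_loss(student, target, eps=1e-12):
    """Mean cross-entropy between target and student soft-decisions."""
    total = 0.0
    for s_row, t_row in zip(student, target):
        total -= sum(t * math.log(s + eps) for s, t in zip(s_row, t_row))
    return total / len(student)


def payload_floats(num_reference_samples, num_classes):
    """Floats sent per round: independent of the model's parameter count."""
    return num_reference_samples * num_classes


# Two devices with different architectures only need to agree on the
# reference samples and the number of classes (toy logits, 2 samples).
device_a = soft_decisions([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])
device_b = soft_decisions([[1.0, 1.0, 1.0], [-0.5, 2.0, 0.0]])
target_a = neighborhood_target(device_a, [device_b])
loss_a = distillation_loss(device_a, target_a)
```

The payload depends only on the reference-set size and the number of classes, which is why devices with different architectures can take part in the same exchange.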

communication overhead, distillation, name change

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Breaking the Communication-Privacy-Accuracy Tradeoff with f-Differential Privacy

A Communication Efficient Stochastic Multi-Block Alternating Direction Method of Multipliers
