We consider nonconvex stochastic optimization problems in the asynchronous centralized distributed setup where the communication times from workers to a server cannot be ignored, and the computation and communication times may differ across workers.
These methods can be broadly categorized into two types: function learning and operator learning approaches. In function learning, the goal is to directly learn the solution.
Effective communication between the server and workers plays a key role in distributed optimization. In this paper, we focus on optimizing communication, uncovering inefficiencies in prevalent downlink compression approaches.
(Theorem 4). We also consider Assumption 3. Note that this assumption does not restrict the class of considered
Iterative algorithms are traditionally evaluated based on their gradient complexity.