A Observations in Local Memory Similarity
Neural Information Processing Systems
We observed the similarity of local memories through Q-Q (quantile-quantile) plots, as shown in Figure [...]. In Figure A1(a), the linearity of the points in the Q-Q plot suggests that worker 1's local [...] This is consistent with our observations of pairwise cosine distance shown in Figure 2(a). This indicates that we can possibly use a local worker's top-k [...]

One variant of Young's inequality is $\|x + y\|$ [...]

A.1 [...] global minimum of $f(x)$ [...]

The quadrilateral identity is $\langle x, y \rangle = \frac{1}{2}\|x\|$ [...]

We provide the following table to explain the main results of Section 3 and to connect them to other parts of the paper. Our Theorem 1 shows this; [...] indicates its applicability in distributed training.

Lemma 1: contraction property
Intuition: Higher correlation between workers brings CLT-k closer to true top-k.
Fig. 2 and 3 show high correlation, so our contraction is close to true top-k.

Lemma 2: contraction in distributed setting
Intuition: Requires positive correlation between workers in distr. [...]
Fig. 2 and 3 show positive correlation between workers.

Theorem 1: ScaleCom's convergence rate is the same as SGD ($1/\sqrt{T}$)
Tables 1 and 2 (Fig. 4 and 5) verified that ScaleCom's convergence is the same as the baseline.

Each node is equipped with 2 IBM Power 9 processors clocked at 3.15 GHz.
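The Q-Q comparison of local memories and the pairwise cosine distance mentioned above can be illustrated with a minimal sketch. It is not the procedure used to produce Figure A1 or Figure 2(a). The vectors `mem_1` and `mem_2` are synthetic stand-ins for two workers' local memories, and the shared-component model, the noise level 0.3, and the helper names `qq_points` and `cosine_distance` are all assumptions made for illustration.

```python
import numpy as np


def qq_points(a, b, num_quantiles=99):
    """Return matched quantiles of two samples, i.e. the points of a Q-Q plot."""
    probs = np.linspace(0.01, 0.99, num_quantiles)
    return np.quantile(a, probs), np.quantile(b, probs)


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical data: both workers share a common component plus private noise.
rng = np.random.default_rng(0)
shared = rng.standard_normal(10_000)
mem_1 = shared + 0.3 * rng.standard_normal(10_000)  # stand-in for worker 1
mem_2 = shared + 0.3 * rng.standard_normal(10_000)  # stand-in for worker 2

# Q-Q points of the magnitude distributions; near-linear points mean the two
# workers' memory magnitudes follow similar distributions.
q1, q2 = qq_points(np.abs(mem_1), np.abs(mem_2))
linearity = float(np.corrcoef(q1, q2)[0, 1])

# A small cosine distance means the two memory vectors point in similar directions.
dist = cosine_distance(mem_1, mem_2)

print(f"Q-Q linearity (Pearson r of quantiles): {linearity:.4f}")
print(f"cosine distance between workers: {dist:.4f}")
```

A Pearson correlation of the matched quantiles close to 1 corresponds to points lying near a straight line in the Q-Q plot. The cosine distance is a separate, direction-based check on the same pair of vectors.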
Nov-14-2025, 18:03:08 GMT