vqSGD: Vector Quantized Stochastic Gradient Descent

Venkata Gandikota, Raj Kumar Maity, Arya Mazumdar

arXiv.org Machine Learning 

For any $c \in \mathbb{R}^d$ and $r > 0$, let $B_d(c, r)$ denote the $d$-dimensional $\ell_2$ ball of radius $r$ centered at $c$. Let $e_i \in \mathbb{R}^d$ denote the $i$-th standard basis vector, which has 1 in the $i$-th position and 0 everywhere else. Also, let $\mathbf{1}_d$ and $\mathbf{0}_d$ denote the all-ones vector and the all-zeros vector in $\mathbb{R}^d$, respectively. By $[n]$ we denote the set $\{1, 2, \dots, n\}$. For a discrete set of points $C \subset \mathbb{R}^d$, let $\mathrm{conv}(C)$ denote the convex hull of the points in $C$, i.e., $\mathrm{conv}(C) := \{ \sum_{c \in C} a_c c \mid a_c \ge 0, \ \sum_{c \in C} a_c = 1 \}$.

Let $w \in \mathbb{R}^d$ be the parameters of a function to be learned (such as the weights of a neural network). In each step of the SGD algorithm, the parameters are updated as $w \leftarrow w - \eta \hat{g}$, where $\eta$ is a possibly time-varying learning rate and $\hat{g}$ is a stochastic unbiased estimate of $g$, the true gradient of some loss function with respect to $w$.
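The SGD update described above can be illustrated with a minimal sketch. The loss, the data, the function names, and the learning-rate schedule below are all assumptions chosen for illustration and do not come from the paper: the loss is f(w) = (1/2n) Σ_i ||w − x_i||², whose true gradient is g = w − mean(x), and the stochastic estimate ĝ = w − x_i for a uniformly sampled index i is unbiased because its expectation over i equals g.

```python
import random


def true_gradient(w, data):
    """True gradient g of the assumed loss f(w) = (1/2n) * sum_i ||w - x_i||^2."""
    n, d = len(data), len(w)
    mean = [sum(x[j] for x in data) / n for j in range(d)]
    return [w[j] - mean[j] for j in range(d)]


def stochastic_gradient(w, data, rng):
    """Unbiased estimate g_hat: gradient of one uniformly sampled term."""
    x = rng.choice(data)
    return [wj - xj for wj, xj in zip(w, x)]


def sgd(w0, data, steps, rng, eta0=1.0):
    """Run the update w <- w - eta * g_hat for a fixed number of steps."""
    w = list(w0)
    for t in range(steps):
        eta = eta0 / (t + 1)  # time-varying learning rate (assumed schedule)
        g_hat = stochastic_gradient(w, data, rng)
        w = [wj - eta * gj for wj, gj in zip(w, g_hat)]
    return w


rng = random.Random(0)
data = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]  # hypothetical data points
w_final = sgd([0.0, 0.0], data, steps=5000, rng=rng)
print(w_final)  # should land near the minimizer, the data mean [3.0, 2.0]
```

With the schedule η_t = 1/(t+1), the iterate is exactly the running average of the sampled points, which makes it easy to see why the noise in ĝ averages out as the number of steps grows.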
