Notes for "Large Scale Distributed Deep Networks" paper

#artificialintelligence 

Downpour SGD (with Adagrad adaptive learning rate) outperforms Downpour SGD (with fixed learning rate) and Sandblaster L-BFGS. Moreover, Adagrad can be easily implemented locally within each parameter shard. It is surprising that Adagrad works so well with Downpour SGD as it was not originally designed to be used with asynchronous SGD. The paper conjectures that "Adagrad automatically stabilizes volatile parameters in the face of the flurry of asynchronous updates, and naturally adjusts learning rates to the demands of different layers in the deep network."

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found