Mesh-TensorFlow: Deep Learning for Supercomputers

Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman

Neural Information Processing Systems

However, batch-splitting suffers from several problems, including the inability to train very large models (due to memory constraints), high latency, and inefficiency at small batch sizes. All of these can be solved by more general distribution strategies (model-parallelism). Unfortunately, efficient model-parallel algorithms tend to be complicated to discover, to describe, and to implement, particularly on large clusters.
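The contrast between the two strategies can be sketched in a few lines. The following illustration is plain NumPy, not Mesh-TensorFlow's actual API, and all shapes and the processor count are arbitrary assumptions; it simulates splitting a single dense layer's matmul either along the batch dimension (batch-splitting, i.e. data-parallelism) or along the hidden dimension of the weight matrix (one simple form of model-parallelism):

```python
import numpy as np

# Hypothetical sizes; not taken from the paper.
BATCH, D_IN, D_HIDDEN, N_PROC = 8, 4, 6, 2

x = np.random.randn(BATCH, D_IN)      # activations
w = np.random.randn(D_IN, D_HIDDEN)   # dense-layer weights

# Batch-splitting (data-parallelism): every processor holds a full
# copy of w and processes a slice of the batch. The weights are
# replicated, so w can never exceed one processor's memory.
x_shards = np.split(x, N_PROC, axis=0)
y_data_parallel = np.concatenate([xs @ w for xs in x_shards], axis=0)

# Model-parallelism: every processor sees the full batch but holds
# only a slice of w's hidden dimension; outputs are concatenated
# along that dimension. The weights are divided across processors,
# which is what makes very large models trainable.
w_shards = np.split(w, N_PROC, axis=1)
y_model_parallel = np.concatenate([x @ ws for ws in w_shards], axis=1)

# Both strategies compute the same result.
assert np.allclose(y_data_parallel, y_model_parallel)
assert np.allclose(y_data_parallel, x @ w)
```

The paper's contribution is a language for naming tensor dimensions and mapping them onto a multi-dimensional mesh of processors, so that either splitting above (or a mixture of both) is expressed declaratively rather than hand-coded as here.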





7716d0fc31636914783865d34f6cdfd5-AuthorFeedback.pdf

Neural Information Processing Systems

This is because a_t^⊤ a* takes a large number of iterations to increase from negative to 0. Consequently, with a large step size, w can move far away from w* before a_t^⊤ a* becomes nonnegative. For problems with multiple global optima, our analysis can still be applied if the following condition holds: there exists one global optimum such that the PD condition holds globally with respect to this optimum.


Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes

Neural Information Processing Systems

Reinforcement learning (RL) [1] studies the problem of learning in sequential decision-making problems where the dynamics of the environment are unknown, but can be learnt by performing actions and observing their outcomes in an online fashion. A sample-efficient RL agent must trade off the exploration needed to collect information about the environment against the exploitation of the experience gathered so far to gain as much reward as possible.
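The trade-off can be made concrete with a minimal example. The sketch below is an illustrative epsilon-greedy agent on a toy multi-armed bandit, not the algorithm proposed in the paper; all names and constants are assumptions. It shows the two behaviours side by side: exploring by acting at random, and exploiting by acting greedily on value estimates built from experience.

```python
import random

# Toy bandit: each arm pays out 1 with a fixed unknown probability.
TRUE_PROBS = [0.2, 0.5, 0.8]   # hidden from the agent (assumed values)
EPSILON = 0.1                  # fraction of steps spent exploring
STEPS = 10_000

counts = [0] * len(TRUE_PROBS)    # pulls per arm
values = [0.0] * len(TRUE_PROBS)  # running mean reward per arm

total_reward = 0.0
for _ in range(STEPS):
    if random.random() < EPSILON:
        # Exploration: gather information about under-sampled arms.
        arm = random.randrange(len(TRUE_PROBS))
    else:
        # Exploitation: act greedily on the experience gathered so far.
        arm = max(range(len(TRUE_PROBS)), key=lambda a: values[a])
    reward = 1.0 if random.random() < TRUE_PROBS[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    total_reward += reward

print(f"estimated arm values: {[round(v, 2) for v in values]}")
print(f"average reward: {total_reward / STEPS:.3f}")
```

A fixed epsilon is the crudest way to resolve the trade-off; provably sample-efficient methods of the kind analyzed in papers like this one instead direct exploration using confidence bounds on the estimated values.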