layernorm
A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning
We demonstrate that plasticity loss is pervasive under domain shift in this regime, and that a number of methods developed to resolve it in other settings fail, sometimes even performing worse than applying no intervention at all. In contrast, we find that a class of "regenerative" methods are able to consistently mitigate plasticity loss in a variety of contexts, including in gridworld tasks and
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
- Health & Medicine (0.67)
- Education (0.67)
- North America > United States > Massachusetts (0.04)
- Asia > China (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- North America > United States > Texas > Travis County > Austin (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Arizona (0.04)
- (2 more...)
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > Michigan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (4 more...)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Spain (0.04)
Mixer
Groupingthechannelstogether Token-mixingMLPstake S-dimensionalvectorsasinputs.Every such vector contains values of asingle feature acrossS different spatial locations. In other words, token-mixing MLPs operate by looking at onlyone channel at once. Forstochasticdepth,followingtheoriginal paper [3], we linearly increase the probability of dropping a layer from0.0 Models are fine-tuned at resolution 224 unless mentioned otherwise. We follow the setup of [2].
- Research Report (0.86)
- Overview (0.54)