A Task Setups

Neural Information Processing Systems 

We provide the hyperparameter setups shared across our models for each task in Table 4.

Table 4: Shared hyperparameters for all models, given for each task.

Hyperparameter    Random Walk    Algorithm    Reddit/BASE    Enwik8
Layers            4              4            8              8
Hidden size       256            256          512            512
Head count        4              4            8              8
Dropout rate      0.2            0.2          0.3            0.3
Embed.

Random Walk. We train 4-layer models with a hidden size of 256 and 4 attention heads.

Algorithm. We train 4-layer models with a hidden size of 256 and 4 attention heads.

Staircase model which was run 5 times.
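For readers who want the shared settings in machine-readable form, the four complete rows of Table 4 can be written as a small configuration sketch. The class name `SharedHParams`, the dictionary `TABLE_4`, and the task keys are hypothetical names chosen for illustration and are not taken from the paper's code. The `head_dim` property assumes the hidden size is split evenly across attention heads, as in standard multi-head attention; the paper excerpt does not state this explicitly. The truncated "Embed." row is omitted because its values are not available here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedHParams:
    """Shared hyperparameters for one task (hypothetical container)."""
    layers: int
    hidden_size: int
    head_count: int
    dropout: float

    @property
    def head_dim(self) -> int:
        # Assumption: hidden size is divided evenly among the heads.
        return self.hidden_size // self.head_count


# Values copied from the complete rows of Table 4; keys are illustrative.
TABLE_4 = {
    "random_walk": SharedHParams(layers=4, hidden_size=256, head_count=4, dropout=0.2),
    "algorithm": SharedHParams(layers=4, hidden_size=256, head_count=4, dropout=0.2),
    "reddit_base": SharedHParams(layers=8, hidden_size=512, head_count=8, dropout=0.3),
    "enwik8": SharedHParams(layers=8, hidden_size=512, head_count=8, dropout=0.3),
}

if __name__ == "__main__":
    for task, hp in TABLE_4.items():
        print(task, hp, "head_dim =", hp.head_dim)
```

Under the even-split assumption, both the small (256 hidden, 4 heads) and the large (512 hidden, 8 heads) setups yield the same per-head dimension.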
