[P] Cannot replicate results of DeepMind paper • r/MachineLearning

@machinelearnbot 

I've been trying for some time to replicate the results of the Bayesian Recurrent Neural Networks paper from DeepMind (https://arxiv.org/abs/1704.02798, the simple model without posterior sharpening), but my perplexity never reaches the numbers reported in the paper: 78.8 on the validation set and 75.5 on the test set. Could anyone spare a few minutes to look at my implementation and point out anything that looks wrong? I've done an extensive hyperparameter search over initialization schemes, learning rates, etc., but haven't gotten close to their results. For the prior I'm using \pi = 0.25, log \sigma_1 = -1.0, and log \sigma_2 = -8.0; everything else should match what the paper states. Full implementation is here, and the exact file with the model is here.
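For anyone checking the numbers, here is a minimal sketch of the prior with the settings above. It assumes the prior is a scale mixture of two zero-mean Gaussians, that the logs are natural logs, and that perplexity is the exponential of the mean per-token negative log-likelihood in nats. The function names are illustrative and are not taken from the linked implementation.

```python
import math

def log_gaussian(w, log_sigma):
    """Log density of N(w; 0, sigma^2), parameterized by log(sigma)."""
    z = w * math.exp(-log_sigma)
    return -0.5 * math.log(2.0 * math.pi) - log_sigma - 0.5 * z * z

def log_mixture_prior(weights, pi=0.25, log_sigma1=-1.0, log_sigma2=-8.0):
    """Sum over weights of log[pi*N(w;0,s1^2) + (1-pi)*N(w;0,s2^2)].

    Assumption: natural logs for log_sigma1 / log_sigma2.
    """
    total = 0.0
    for w in weights:
        a = math.log(pi) + log_gaussian(w, log_sigma1)
        b = math.log(1.0 - pi) + log_gaussian(w, log_sigma2)
        # Log-sum-exp so the narrow component cannot underflow to log(0).
        m = max(a, b)
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total

def perplexity(total_nll_nats, num_tokens):
    """Perplexity from a summed negative log-likelihood measured in nats."""
    return math.exp(total_nll_nats / num_tokens)
```

Two things this makes easy to check: whether the mixture is combined in log space (a naive sum of densities underflows for weights far from zero when log \sigma_2 = -8), and whether the reported perplexity is computed from nats per token rather than bits or per-batch averages.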
