Transformer-Based Language Model Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens