Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times?