A Distributional Perspective on Word Learning in Neural Language Models