Shrinking massive neural networks used to model language