Characterizing how 'distributional' NLP corpora distance metrics are