CONTESTS: a Framework for Consistency Testing of Span Probabilities in Language Models