Evaluating Gender Bias of Pre-trained Language Models in Natural Language Inference by Considering All Labels