Can Large Language Models Robustly Perform Natural Language Inference for Japanese Comparatives?