Evaluating the Ability of Large Language Models to Reason about Cardinal Directions, Revisited