Evaluating Language Models' Evaluations of Games