Re-evaluating Theory of Mind evaluation in large language models