Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks
