Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay
