LTD-Bench: Evaluating Large Language Models by Letting Them Draw
