LTD-Bench: Evaluating Large Language Models by Letting Them Draw
Liuhao Lin
