Do better language models have crisper vision?
