FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering
