Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering
