Spatially Grounded Explanations in Vision Language Models for Document Visual Question Answering
