See then Tell: Enhancing Key Information Extraction with Vision Grounding
