Image2Struct: Benchmarking Structure Extraction for Vision-Language Models

Neural Information Processing Systems 

We introduce Image2Struct, a benchmark to evaluate vision-language models (VLMs) on extracting structure from images.
