UNIT: Unifying Image and Text Recognition in One Vision Encoder

Neural Information Processing Systems 

A straightforward solution is to finetune pre-trained ViTs using high-resolution documents.
