LocCa: Visual Pretraining with Location-aware Captioners

Neural Information Processing Systems 

LocCa employs two location-aware pretraining tasks, bounding box prediction and location-dependent captioning, both conditioned on the image pixel input.
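
As a rough illustration of how the two tasks relate, the sketch below turns a single region annotation into one text example per task: caption to box for bounding box prediction, and box to caption for location-dependent captioning. The summary above does not give the actual sequence format, so everything here is an assumption made for illustration: the `Region` type, the `locate:` and `describe:` prompt prefixes, the 1000-bin coordinate quantization, and all function names are hypothetical. The image itself, which both tasks are conditioned on, would be fed to the vision encoder and is left out of the sketch.

```python
from dataclasses import dataclass

NUM_BINS = 1000  # hypothetical number of coordinate bins, not taken from the source


@dataclass(frozen=True)
class Region:
    """One annotated image region (hypothetical container for this sketch)."""
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized to [0, 1]
    caption: str


def quantize_box(box, num_bins=NUM_BINS):
    """Map normalized coordinates to integer bins in [0, num_bins - 1]."""
    return [min(num_bins - 1, max(0, int(c * num_bins))) for c in box]


def box_to_text(box, num_bins=NUM_BINS):
    """Render a box as space-separated integer tokens."""
    return " ".join(str(b) for b in quantize_box(box, num_bins))


def box_prediction_example(region):
    """Bounding box prediction: given the caption, the target is the box."""
    return {"prompt": f"locate: {region.caption}", "target": box_to_text(region.box)}


def location_captioning_example(region):
    """Location-dependent captioning: given the box, the target is the caption."""
    return {"prompt": f"describe: {box_to_text(region.box)}", "target": region.caption}


if __name__ == "__main__":
    region = Region(box=(0.25, 0.125, 0.75, 0.5), caption="a dog on a sofa")
    print(box_prediction_example(region))
    print(location_captioning_example(region))
```

The same annotation yields both examples, which mirror each other: the output of one task is the conditioning text of the other.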
