LocCa: Visual Pretraining with Location-aware Captioners

Feb-18-2026, 06:40:27 GMT–Neural Information Processing Systems

LocCa employs two location-aware pretraining tasks, bounding box prediction and location-dependent captioning, both conditioned on the image pixel input.
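To make the two tasks concrete, here is a minimal sketch of how one annotated region could be turned into a pair of text-to-text training examples, one per task. Everything specific in it is an assumption made for illustration and is not taken from the paper: the `RegionAnnotation` schema, the function names, the 1000-bin coordinate quantization, the `x_min y_min x_max y_max` ordering, and the task labels.

```python
from dataclasses import dataclass


@dataclass
class RegionAnnotation:
    """One annotated region: a pixel-space box plus its caption (hypothetical schema)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    caption: str


def box_to_text(region, width, height, num_bins=1000):
    """Render a box as space-separated integer bins.

    The bin count and the text format are assumptions, not the paper's spec.
    """
    def quantize(value, size):
        # Clamp to the image extent, then map [0, size] onto bins 0..num_bins-1.
        frac = min(max(value / size, 0.0), 1.0)
        return min(int(frac * num_bins), num_bins - 1)

    coords = (
        quantize(region.x_min, width),
        quantize(region.y_min, height),
        quantize(region.x_max, width),
        quantize(region.y_max, height),
    )
    return " ".join(str(c) for c in coords)


def make_examples(region, width, height):
    """Build one example per task; the image itself is supplied separately as pixels."""
    box_text = box_to_text(region, width, height)
    return [
        # Bounding box prediction: given the caption, decode the box.
        {"task": "box_prediction", "prompt": region.caption, "target": box_text},
        # Location-dependent captioning: given the box, decode the caption.
        {"task": "location_dependent_captioning", "prompt": box_text, "target": region.caption},
    ]


if __name__ == "__main__":
    region = RegionAnnotation(160, 240, 320, 360, "a dog")
    for example in make_examples(region, width=640, height=480):
        print(example)
```

In this sketch both examples share the same image and differ only in which side of the (box, caption) pair serves as the prompt and which as the decoding target.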

large language model, machine learning, natural language

Country:
- Europe
  - Switzerland > Zürich
    - Zürich (0.04)
  - Belgium > Flanders
    - Flemish Brabant > Leuven (0.04)

Genre:
- Research Report
  - Experimental Study (0.93)
  - New Finding (0.67)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language > Large Language Model (0.69)
    - Machine Learning > Neural Networks
      - Deep Learning (0.46)

LocCa: Visual Pretraining with Location-aware Captioners Bo Wan 1,3 Michael Tschannen 1 Yongqin Xian
