Toloka Visual Question Answering Benchmark

Ustalov, Dmitry, Pavlichenko, Nikita, Koshelev, Sergey, Likhobaba, Daniil, Smirnova, Alisa

Sep-28-2023–arXiv.org Artificial Intelligence

In this task, given an image and a textual question, one has to draw a bounding box around the object that correctly answers the question. Every image-question pair has an answer, with only one correct answer per image. Our dataset contains 45,199 image-question pairs in English, provided with ground truth bounding boxes and split into a train subset and two test subsets. Besides describing the dataset and releasing it under a CC BY license, we conducted a series of experiments on open-source zero-shot baseline models and organized a multi-phase competition at WSDM Cup that attracted 48 participants worldwide. However, by the time of paper submission, no machine learning model had outperformed the non-expert crowdsourcing baseline according to the intersection over union evaluation score.
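
The intersection over union (IoU) score mentioned above compares a predicted bounding box with the ground truth box. The following is a minimal sketch of the standard IoU computation, not the benchmark's official scoring code. The `Box` type, the `iou` function name, and the (left, top, right, bottom) pixel-coordinate layout are assumptions made for illustration; the dataset's actual field names and evaluation script may differ.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box; the (left, top, right, bottom) layout is an assumption."""
    left: float
    top: float
    right: float
    bottom: float


def iou(a: Box, b: Box) -> float:
    """Return the intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(a.right, b.right) - max(a.left, b.left))
    inter_h = max(0.0, min(a.bottom, b.bottom) - max(a.top, b.top))
    intersection = inter_w * inter_h

    area_a = (a.right - a.left) * (a.bottom - a.top)
    area_b = (b.right - b.left) * (b.bottom - b.top)
    union = area_a + area_b - intersection

    # Degenerate (zero-area) boxes yield a score of 0 instead of dividing by zero.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    ground_truth = Box(0, 0, 2, 2)
    prediction = Box(1, 1, 3, 3)
    # Overlap area is 1 and union area is 4 + 4 - 1 = 7.
    print(iou(ground_truth, prediction))
```

A score of 1.0 means the predicted box coincides exactly with the ground truth, and 0.0 means the two boxes do not overlap. Because each image has a single correct answer, one such score per image-question pair can be averaged over a test subset.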

annotator, competition, dataset, (12 more...)

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America
  - United States
    - Maryland > Baltimore (0.04)
    - New York > New York County
      - New York City (0.04)
    - Nevada > Clark County
      - Las Vegas (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - California
      - Los Angeles County > Long Beach (0.04)
      - Ventura County > Thousand Oaks (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)
- Europe
  - Switzerland (0.04)
  - Serbia > Central Serbia
    - Belgrade (0.04)
- Asia > Middle East
  - Qatar > Ad-Dawhah > Doha (0.04)

Genre:
- Research Report (0.82)
- Overview (0.68)

Industry:
- Leisure & Entertainment > Sports (1.00)

Technology:
- Information Technology
  - Communications > Social Media
    - Crowdsourcing (0.36)
  - Artificial Intelligence
    - Vision (1.00)
    - Machine Learning > Neural Networks (0.46)
    - Natural Language
      - Large Language Model (0.49)
      - Question Answering (0.42)
