Self-Training Vision Language BERTs with a Unified Conditional Model

Yang, Xiaofeng; Lv, Fengmao; Liu, Fayao; Lin, Guosheng

Jan-19-2023–arXiv.org Artificial Intelligence

Abstract: Natural language BERTs are trained on language corpora in a self-supervised manner. [...]

Figure: An example of generated image descriptions. Given different condition flags, our proposed UCM model is able to generate diverse image descriptions, such as COCO captions, dense captions, and questions. The generated contents clearly differ in style.

Large scale pretraining has become the dominant approach in various natural language processing tasks. The success of large scale pretraining is due to the large amount of training data available everywhere and the self-training algorithm. In this paper, we propose a self-training approach that allows VL-BERTs to be pretrained using unlabeled image data.

Self-training is usually done by iterating the following three steps: 1) training with labeled data, 2) generating pseudo labels for unlabeled data, and 3) mixing the labeled data and unlabeled data [...]. Although self-training has shown its effectiveness in various tasks [4], [5], how to use it effectively in training vision language BERTs has not yet been studied.

However, the self-training of vision language BERTs is nontrivial for the following reasons. First, although auto-encoding models [...] language setting. Although these models can be finetuned to perform [...]. Second, current common practice in vision language BERT pretraining uses various image descriptions for training, such as [...]. Those image descriptions differ significantly, making it difficult for an unconditional model to learn to generate adequate pseudo captions for unlabeled images.
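The three-step self-training loop described above can be sketched generically. This is a minimal illustration, not the authors' method: in the paper the teacher and student are vision language BERTs that generate pseudo captions, whereas here a toy nearest-centroid classifier on 1-D numbers stands in for the model. The functions `train`, `predict`, and `self_train` and the `rounds` parameter are hypothetical names chosen for this sketch.

```python
from statistics import mean


# Hypothetical toy stand-in for a VL-BERT: one centroid per label on 1-D inputs.
def train(examples):
    """Fit a nearest-centroid 'model' from (x, label) pairs."""
    by_label = {}
    for x, y in examples:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda y: abs(model[y] - x))


def self_train(labeled, unlabeled, rounds=3):
    """Generic teacher-student self-training loop (illustrative sketch)."""
    # Step 1: train an initial teacher on labeled data only.
    teacher = train(labeled)
    for _ in range(rounds):
        # Step 2: the teacher generates (hard) pseudo labels for unlabeled data.
        pseudo = [(x, predict(teacher, x)) for x in unlabeled]
        # Step 3: mix labeled and pseudo-labeled data to train a student,
        # which becomes the teacher for the next round.
        teacher = train(labeled + pseudo)
    return teacher


if __name__ == "__main__":
    labeled = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]
    unlabeled = [2.0, 3.0, 7.5, 8.0]
    model = self_train(labeled, unlabeled)
    print(model)  # {'a': 1.5, 'b': 8.625}
```

In this toy run the pseudo labels stop changing after the first round, so later rounds reproduce the same student; with a real generative model each round can yield different pseudo captions.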

caption, machine learning, natural language, ...

Country:
- Oceania > Australia
  - South Australia > Adelaide (0.04)
- Asia
  - Singapore (0.05)
  - China > Sichuan Province
    - Chengdu (0.04)

Genre:
- Research Report (0.82)

Industry:
- Education (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Machine Translation (0.68)
  - Vision > Image Understanding (0.46)