RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing

Zhang, Zilun, Zhao, Tiancheng, Guo, Yulong, Yin, Jianwei

Jan-2-2024–arXiv.org Artificial Intelligence

Pre-trained Vision-Language Models (VLMs) utilizing extensive image-text paired data have demonstrated unprecedented image-text association capabilities, achieving remarkable results across various downstream tasks. A critical challenge is how to make use of existing large-scale pre-trained VLMs, which are trained on common objects, to perform the domain-specific transfer for accomplishing domain-related downstream tasks. In this paper, we propose a new framework that includes the Domain pre-trained Vision-Language Model (DVLM), bridging the gap between the General Vision-Language Model (GVLM) and domain-specific downstream tasks. Moreover, we present an image-text paired dataset in the field of remote sensing (RS), RS5M, which has 5 million RS images with English descriptions. The dataset is obtained by filtering publicly available image-text paired datasets and by captioning label-only RS datasets with a pre-trained VLM. Together, these constitute the first large-scale RS image-text paired dataset. Additionally, we fine-tuned the CLIP model and tried several Parameter-Efficient Fine-Tuning methods on RS5M to implement the DVLM. Experimental results show that our proposed dataset is highly effective for various tasks, and our model GeoRSCLIP improves upon the baseline or previous state-of-the-art model by 3% to 20% in Zero-shot Classification (ZSC), 3% to 6% in Remote Sensing Cross-Modal Text-Image Retrieval (RSCTIR), and 4% to 5% in Semantic Localization (SeLo) tasks. The dataset and models have been released at: https://github.com/om-ai-lab/RS5M.
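
To make the Zero-shot Classification task concrete, here is a minimal sketch of how a CLIP-style model typically assigns a label: the image embedding is compared with the embedding of one text prompt per class, and the most similar prompt wins. This is an illustration only, not the authors' implementation. The function name `zero_shot_classify`, the prompt wording, the temperature value, and the 3-dimensional toy embeddings are all assumptions made for the example; a real model would produce the embeddings with its image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_emb, text_embs, temperature=0.01):
    """Pick the class prompt whose embedding is closest to the image embedding.

    text_embs maps a prompt string to its embedding (hypothetical toy data here).
    The temperature is an assumed value that sharpens the softmax.
    """
    sims = {label: cosine(image_emb, emb) for label, emb in text_embs.items()}
    # Numerically stable softmax over the scaled similarities.
    top = max(sims.values())
    exps = {label: math.exp((s - top) / temperature) for label, s in sims.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Toy embeddings standing in for encoder outputs (assumed values).
text_embs = {
    "a satellite image of an airport": [0.9, 0.1, 0.0],
    "a satellite image of a forest": [0.1, 0.9, 0.1],
    "a satellite image of a harbor": [0.0, 0.2, 0.9],
}
image_emb = [0.8, 0.2, 0.1]

label, probs = zero_shot_classify(image_emb, text_embs)
print(label)
```

No class-specific training is involved in this step: changing the label set only means changing the prompts, which is why the quality of the image-text alignment learned during pre-training matters so much for the task.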

aerial view, dataset, satellite image

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Pacific Ocean > North Pacific Ocean
  - San Francisco Bay (0.04)
  - Prince William Sound (0.04)
- Oceania
  - New Zealand (0.04)
  - Australia > New South Wales
    - Sydney (0.04)
- North America
  - Mexico (0.04)
  - Canada (0.04)
  - United States
    - Maryland > St. Mary's County (0.14)
    - Virginia (0.04)
    - Tennessee (0.04)
    - New York (0.04)
    - Alaska (0.04)
    - California > San Francisco County
      - San Francisco (0.14)
- Europe
  - Russia (0.04)
  - Poland (0.04)
  - Romania > Sud - Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)
  - Middle East
    - Republic of Türkiye > Istanbul Province
      - Istanbul (0.04)
    - Cyprus > Limassol
      - Limassol (0.04)
- Asia
  - Russia (0.04)
  - Indonesia (0.04)
  - Central Asia (0.04)
  - Middle East
    - Iraq (0.04)
    - Republic of Türkiye > Istanbul Province
      - Istanbul (0.04)
  - Japan > Honshū
    - Tōhoku > Fukushima Prefecture
      - Fukushima (0.04)
    - Kansai > Osaka Prefecture
      - Osaka (0.04)
  - China > Zhejiang Province
    - Hangzhou (0.04)
- Africa
  - Madagascar (0.04)
  - Southern Africa (0.04)
  - Nigeria (0.04)
  - Mauritania (0.04)
  - South Africa > Gauteng
    - Johannesburg (0.04)
  - Middle East
    - Egypt > Cairo Governorate
      - Cairo (0.04)
    - Algeria > Adrar Province
      - Adrar (0.04)
  - Cabo Verde > Praia
    - Praia (0.04)

Genre:
- Research Report
  - New Finding (0.65)
  - Promising Solution (0.47)

Industry:
- Transportation
  - Infrastructure & Services (1.00)
  - Air (0.93)
  - Ground (0.67)
- Energy
  - Renewable > Geothermal
    - Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.84)
  - Power Industry > Utilities
    - Nuclear (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.93)