Supplementary Materials for MMM-RS: A Multi-modal, Multi-GSD, Multi-scene Remote Sensing Dataset and Benchmark for Text-to-Image Generation

Neural Information Processing Systems 

In particular, in addition to our information-rich prompts, we retain the labels from the original datasets, such as "category", "spatial_resolution" (i.e., ground sample distance (GSD)), and "cloud_cover" (i.e., the level of cloud cover), which give users more information for potential downstream tasks (e.g., classification and recognition). Below is a brief introduction to each dataset:

MRSSC2.0 [3]: This dataset is constructed from high-quality Earth observation data acquired by the TianGong-2 Wideband Imaging Spectrometer and Interferometric Imaging Radar Altimeter. It is a cross-domain remote sensing scene classification dataset featuring four modalities: Visible Near-Infrared (VIS), Short Wavelength Infrared (SWI), Thermal Infrared (INF), and Synthetic Aperture Radar (SAR). However, the images in different modalities are not aligned. It contains a total of 6155 images, each with an original pixel resolution of 256 × 256 and an original GSD of 100 m/pixel.
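
As an illustration of how the retained labels might be used downstream, the following is a minimal Python sketch. Only the three label names ("category", "spatial_resolution", "cloud_cover") come from the text above; the record layout, the `SampleRecord` class, the helper functions, and all example values are hypothetical and do not reflect the released file format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SampleRecord:
    """Hypothetical per-image record; only the three label names come from the text."""
    prompt: str                        # text prompt paired with the image
    category: str                      # scene label kept from the source dataset
    spatial_resolution: float          # GSD in m/pixel
    cloud_cover: Optional[str] = None  # level of cloud cover, if annotated


def footprint_km(side_px: int, gsd_m_per_px: float) -> float:
    """Ground extent covered by one image side, in kilometres."""
    return side_px * gsd_m_per_px / 1000.0


def filter_by_gsd(records: List[SampleRecord], max_gsd: float) -> List[SampleRecord]:
    """Keep records whose GSD is at most max_gsd (i.e., equal or finer resolution)."""
    return [r for r in records if r.spatial_resolution <= max_gsd]


# Placeholder values for illustration only (not taken from the dataset).
records = [
    SampleRecord("a placeholder prompt", "placeholder_class_a", 100.0),
    SampleRecord("another placeholder prompt", "placeholder_class_b", 0.5, "low"),
]

# A 256 x 256 image at 100 m/pixel covers 25.6 km per side.
print(footprint_km(256, 100.0))

# Select only the records with a GSD of 1 m/pixel or finer.
fine = filter_by_gsd(records, max_gsd=1.0)
print([r.category for r in fine])
```

The footprint calculation makes the scale of an MRSSC2.0 image concrete: at the stated 256 × 256 pixels and 100 m/pixel, each image spans 25.6 km on a side.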