VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images

Supplementary Materials

Neural Information Processing Systems 

Figure 5: t-SNE plots illustrating the effectiveness of random sampling of the majority species in the Fish-10K dataset. Randomly sampled images are shown as blue dots, while the remaining data points are shown as red dots. To generate the vector representation of each image, we use a VGG-19 model pretrained on ImageNet.

We collected images of three taxonomic groups of organisms: fish, birds, and butterflies, each containing around 10K images. Images for fish (Fish-10K) were curated from the larger image collection FishAIR [1], which contains images from the Great Lakes Invasive Network Project (GLIN) [2]. We created the Fish-10K dataset by randomly sampling 10K images and preprocessing them to crop the specimen and remove the background. To ensure diversity within Fish-10K, we applied a targeted sampling strategy to the source collection, FishAIR [1]. Specifically, we retained all images of species with fewer than 200 images, treating these as minority or rare classes. Random sampling was applied only to the majority species, i.e., those with more than 200 images per class. To assess potential sampling bias among the majority species, we generated feature vectors for each image in Fish-10K using a pretrained VGG-19 model. Our analysis (Figure 5) shows that the distribution of sampled images closely mirrors the distribution of images that were not included in the dataset (denoted as "others" in the plot). This suggests that our random sampling approach provides a sufficiently accurate representation of the original distribution for the majority species.
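The targeted sampling described above can be summarized in a short script. The following is an illustrative sketch, not the released VLM4Bio code: the metadata file name, column names, and the exact handling of the 200-image boundary are assumptions.

```python
import pandas as pd

THRESHOLD = 200       # per-species image count separating minority from majority classes
TARGET_SIZE = 10_000  # approximate size of the Fish-10K subset

# Hypothetical FishAIR metadata table with one row per image.
meta = pd.read_csv("fishair_metadata.csv")  # assumed columns: image_path, species
counts = meta["species"].map(meta["species"].value_counts())

# Keep every image of minority/rare species (fewer than 200 images per class).
minority = meta[counts < THRESHOLD]

# Randomly sample the majority species to fill the remaining budget.
majority = meta[counts >= THRESHOLD]
budget = max(TARGET_SIZE - len(minority), 0)
sampled_majority = majority.sample(n=min(budget, len(majority)), random_state=0)

fish_10k = pd.concat([minority, sampled_majority]).reset_index(drop=True)
```

The sampling-bias check shown in Figure 5 can be reproduced along similar lines: embed the majority-species images with an ImageNet-pretrained VGG-19 and project the embeddings with t-SNE. This is again a hedged sketch; truncating the classifier head and the preprocessing pipeline follow standard torchvision usage rather than the paper's exact setup, and the path variables are placeholders.

```python
import numpy as np
import torch
from PIL import Image
from sklearn.manifold import TSNE
from torchvision import models, transforms

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
vgg.classifier = vgg.classifier[:-1]   # drop the final 1000-way layer -> 4096-d features
vgg.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths):
    """Return an (N, 4096) array of VGG-19 features for a list of image paths."""
    feats = [vgg(preprocess(Image.open(p).convert("RGB")).unsqueeze(0)).squeeze(0).numpy()
             for p in paths]
    return np.stack(feats)

# sampled_paths / other_paths: majority-species images kept in Fish-10K vs. left out
# (placeholder variable names).
# features = np.vstack([embed(sampled_paths), embed(other_paths)])
# coords = TSNE(n_components=2, random_state=0).fit_transform(features)  # 2-D points to plot
```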
