Understanding Bias in Large-Scale Visual Datasets

Zhuang Liu
University of Pennsylvania, UC Berkeley, Meta FAIR

May-25-2025, 04:47:49 GMT–Neural Information Processing Systems

A recent study [40] has shown that large-scale visual datasets are very biased: modern neural networks can easily classify which dataset an image comes from. However, the concrete forms of bias among these datasets remain unclear. In this study, we propose a framework to identify the unique visual attributes that distinguish these datasets. Our approach applies various transformations to extract semantic, structural, boundary, color, and frequency information from the datasets, and assesses how much each type of information reflects their bias. We further decompose their semantic bias with object-level analysis, and leverage natural language methods to generate detailed, open-ended descriptions of each dataset's characteristics. Our work aims to help researchers understand the bias in existing large-scale pre-training datasets and build more diverse and representative ones in the future.
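The idea of measuring how much each type of information reflects dataset bias can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's method: the two "datasets" are synthetic 8x8 grayscale images that differ only in average brightness, the transformations `mean_intensity` and `edge_energy` are simplified stand-ins for color and boundary information, and a nearest-centroid classifier replaces the neural networks used in the study. The point is only the protocol: apply a transformation, train a classifier to predict the dataset of origin, and read the accuracy as a measure of how much bias that information carries.

```python
import random


def make_image(rng, brightness, size=8):
    """Synthetic grayscale image: uniform noise around a dataset-specific brightness."""
    return [[brightness + rng.uniform(-0.2, 0.2) for _ in range(size)]
            for _ in range(size)]


def mean_intensity(img):
    """Color-like transformation (hypothetical): keep only the average pixel value."""
    pixels = [p for row in img for p in row]
    return [sum(pixels) / len(pixels)]


def edge_energy(img):
    """Boundary-like transformation (hypothetical): mean absolute horizontal difference."""
    diffs = [abs(row[i + 1] - row[i]) for row in img for i in range(len(row) - 1)]
    return [sum(diffs) / len(diffs)]


def fit_centroids(samples, transform):
    """Average the transformed features of each dataset label."""
    sums, counts = {}, {}
    for img, label in samples:
        feat = transform(img)
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in vec] for lab, vec in sums.items()}


def accuracy(centroids, samples, transform):
    """Dataset classification accuracy of a nearest-centroid classifier."""
    correct = 0
    for img, label in samples:
        feat = transform(img)
        pred = min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(feat, centroids[lab])),
        )
        correct += pred == label
    return correct / len(samples)


def run_demo(seed=0):
    """Return dataset classification accuracy for each transformation."""
    rng = random.Random(seed)

    def sample(n):
        # Dataset "A" is darker, dataset "B" is brighter; noise is identical.
        return ([(make_image(rng, 0.3), "A") for _ in range(n)]
                + [(make_image(rng, 0.7), "B") for _ in range(n)])

    train, test = sample(100), sample(100)
    results = {}
    for name, transform in [("mean_intensity", mean_intensity),
                            ("edge_energy", edge_energy)]:
        centroids = fit_centroids(train, transform)
        results[name] = accuracy(centroids, test, transform)
    return results


if __name__ == "__main__":
    print(run_demo())
```

In this toy setup the brightness feature separates the two synthetic datasets almost perfectly, while the edge feature should stay near chance because both datasets share the same noise. That contrast is the kind of signal such a framework looks for: a transformation that preserves high dataset classification accuracy has kept information in which the datasets differ.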


Country:
- North America > United States > Pennsylvania (0.40)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.88)

Industry:
- Education > Educational Setting
  - Higher Education (0.40)
- Health & Medicine (0.46)
- Information Technology (0.67)
- Leisure & Entertainment (0.67)
- Media (0.46)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
    - Natural Language > Large Language Model (0.95)
    - Representation & Reasoning (1.00)
    - Vision > Image Understanding (1.00)
  - Sensing and Signal Processing > Image Processing (1.00)
