Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
Yue Yu
Neural Information Processing Systems
Large language models (LLMs) have recently been leveraged as training data generators for various natural language processing (NLP) tasks. While previous research has explored different approaches to training models on generated data, these approaches generally rely on simple class-conditional prompts, which may limit the diversity of the generated data and inherit the systematic biases of the LLM. Thus, we investigate training data generation with diversely attributed prompts (e.g.,
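To make the contrast concrete, the sketch below shows one plausible way to build class-conditional versus attributed prompts. The attribute names and values here (length, style, subtopic) are illustrative assumptions, not necessarily the attributes used in the paper:

```python
import random

# Hypothetical attribute dimensions for illustration only; the paper's
# actual attributes and values may differ.
ATTRIBUTES = {
    "length": ["short", "long"],
    "style": ["formal", "casual"],
    "subtopic": ["pricing", "reliability", "customer service"],
}

def class_conditional_prompt(label: str) -> str:
    """Simple prompt that conditions only on the class label,
    so every generation request looks identical."""
    return f"Write a review expressing {label} sentiment."

def attributed_prompt(label: str, rng: random.Random) -> str:
    """Attributed prompt: sample a value for each attribute dimension
    so repeated requests ask for visibly different data."""
    attrs = {name: rng.choice(values) for name, values in ATTRIBUTES.items()}
    return (
        f"Write a {attrs['length']}, {attrs['style']} review about "
        f"{attrs['subtopic']} expressing {label} sentiment."
    )

rng = random.Random(0)
# The class-conditional prompt is the same string every time...
plain = {class_conditional_prompt("positive") for _ in range(20)}
# ...while sampling attributes yields many distinct prompts.
diverse = {attributed_prompt("positive", rng) for _ in range(20)}
print(len(plain), len(diverse))
```

Each distinct attributed prompt then seeds a separate LLM generation call, spreading the generated training set across attribute combinations instead of concentrating it on the model's default style.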