Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
Yue Yu
Neural Information Processing Systems
Large language models (LLMs) have recently been leveraged as training data generators for various natural language processing (NLP) tasks. While previous research has explored different approaches to training models on generated data, these approaches generally rely on simple class-conditional prompts, which may limit the diversity of the generated data and inherit the systematic biases of the LLM. We therefore investigate training data generation with diversely attributed prompts (e.g., prompts that specify attributes such as length and style), which have the potential to yield diverse and attributed generated data. Our investigation focuses on datasets with high cardinality and diverse domains, where we demonstrate that attributed prompts outperform simple class-conditional prompts in terms of the resulting model's performance.
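To make the contrast concrete, here is a minimal sketch of the two prompting schemes the abstract describes. The attribute names and values, the example task (sentiment-labeled movie reviews), and the `query_llm` helper are all illustrative assumptions, not the paper's exact configuration; the point is only that an attributed prompt conditions on sampled attribute values in addition to the class label.

```python
import random

# Illustrative attribute dimensions (assumed, not from the paper).
# A simple class-conditional prompt fixes only the label; an attributed
# prompt additionally varies dimensions such as length and style.
ATTRIBUTES = {
    "length": ["short (1-2 sentences)", "long (a full paragraph)"],
    "style": ["formal", "casual", "journalistic"],
    "subtopic": ["the plot", "the acting", "the cinematography"],
}

def class_conditional_prompt(label: str) -> str:
    """Baseline: the prompt conditions on the class label only."""
    return f"Write a {label} movie review."

def attributed_prompt(label: str, rng: random.Random) -> str:
    """Sample one value per attribute and weave it into the prompt."""
    picked = {name: rng.choice(values) for name, values in ATTRIBUTES.items()}
    return (
        f"Write a {picked['length']}, {picked['style']} movie review "
        f"focusing on {picked['subtopic']}. The sentiment must be {label}."
    )

def query_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call (hypothetical, not a real API)."""
    return f"<generated text for: {prompt}>"

if __name__ == "__main__":
    rng = random.Random(0)
    for label in ["positive", "negative"]:
        print(query_llm(class_conditional_prompt(label)))
        # Each call below yields a distinct attribute combination, so the
        # generated training examples are spread across lengths, styles,
        # and subtopics rather than clustered around one default phrasing.
        for _ in range(2):
            print(query_llm(attributed_prompt(label, rng)))
```

Because every generation draws a fresh attribute combination, the resulting synthetic dataset covers a wider slice of the label's distribution than repeated class-conditional sampling, which is the diversity effect the abstract refers to.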