PluralLLM: Pluralistic Alignment in LLMs via Federated Learning

Srewa, Mahmoud, Zhao, Tianyu, Elmalaki, Salma

Mar-12-2025–arXiv.org Artificial Intelligence

Ensuring that Large Language Models (LLMs) align with diverse human preferences while preserving privacy and fairness remains a challenge. Existing methods, such as Reinforcement Learning from Human Feedback (RLHF), rely on centralized data collection, making them computationally expensive and privacy-invasive. We introduce PluralLLM, a federated learning-based approach that enables multiple user groups to collaboratively train a transformer-based preference predictor without sharing sensitive data; the trained predictor can also serve as a reward model for aligning LLMs. Our method leverages Federated Averaging (FedAvg) to aggregate preference updates efficiently, achieving 46% faster convergence, a 4% improvement in alignment scores, and nearly the same group fairness measure as in centralized training. Evaluated on a Q/A preference alignment task, PluralLLM demonstrates that federated preference learning offers a scalable and privacy-preserving alternative for aligning LLMs with diverse human values.
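The abstract names Federated Averaging (FedAvg) as the aggregation step but gives no implementation details. The sketch below illustrates the standard FedAvg rule only: each group's locally trained parameters are averaged, weighted by that group's number of local samples, and only parameters (never raw preference data) reach the server. The function name `fedavg`, the parameter names, and the toy values are hypothetical, and weighting by sample count is an assumption about how the paper applies FedAvg.

```python
from typing import Dict, List, Sequence

# A model is represented here as a mapping from parameter name to a flat
# list of floats. This layout is an assumption made for illustration.
Weights = Dict[str, List[float]]


def fedavg(client_weights: Sequence[Weights], client_sizes: Sequence[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client, and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Weights = {}
    for name, reference in client_weights[0].items():
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return aggregated


# Hypothetical toy round: two user groups with different amounts of local data.
group_a = {"layer.w": [1.0, 2.0], "layer.b": [0.0]}
group_b = {"layer.w": [3.0, 6.0], "layer.b": [4.0]}
global_weights = fedavg([group_a, group_b], [100, 300])
print(global_weights)
```

In a full training loop the server would send `global_weights` back to every group, each group would continue training on its own preference data, and the aggregation would repeat for a fixed number of rounds.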

alignment, federated learning, pluralllm

Country:
- Asia > Myanmar
  - Tanintharyi Region > Dawei (0.04)
- Europe
  - France (0.05)
  - Spain (0.05)
- North America > United States
  - California > Orange County
    - Irvine (0.05)
  - New York > New York County
    - New York City (0.04)

Genre:
- Research Report > New Finding (0.69)

Industry:
- Information Technology > Security & Privacy (0.94)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)