Human-Instruction-Free LLM Self-Alignment with Limited Samples
Guo, Hongyi, Yao, Yuanshun, Shen, Wei, Wei, Jiaheng, Zhang, Xiaoying, Wang, Zhaoran, Liu, Yang
arXiv.org Artificial Intelligence
Aligning large language models (LLMs) with human values is a vital task for LLM practitioners. Current alignment techniques have several limitations: (1) they require a large amount of annotated data; (2) they demand heavy human involvement; and (3) they lack a systematic mechanism for continuous improvement. In this work, we study aligning LLMs to a new domain with limited samples (e.g., < 100). We propose an algorithm that can self-align LLMs iteratively without active human involvement. Unlike existing works, our algorithm relies on neither human-crafted instructions nor labeled rewards, significantly reducing human involvement. In addition, our algorithm can self-improve the alignment continuously. The key idea is to first retrieve high-quality samples related to the target domain and use them as in-context learning examples to generate more samples. We then use the self-generated samples to finetune the LLM iteratively. We show that our method can unlock the LLMs' self-generalization ability to perform alignment with near-zero human supervision. We test our algorithm on three benchmarks covering safety, truthfulness, and instruction-following, and show good performance in alignment, domain adaptability, and scalability.
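The abstract outlines an iterative loop: retrieve domain-relevant seed samples, use them as in-context learning exemplars to generate more samples, then finetune on the self-generated data and repeat. Below is a minimal Python sketch of that loop. The helper names (`retrieve`, `generate_with_icl`, `finetune`) and the round/sample counts are hypothetical placeholders, not the authors' implementation; they stand in for a retriever, an LLM sampling call, and a supervised finetuning step.

```python
from typing import Callable, List

def self_align(
    seed_pool: List[str],                                        # small set of target-domain samples (< 100)
    retrieve: Callable[[List[str], int], List[str]],             # picks high-quality, domain-relevant samples
    generate_with_icl: Callable[[List[str], int], List[str]],    # LLM generates new samples from ICL exemplars
    finetune: Callable[[List[str]], None],                       # finetunes the current LLM on the given samples
    rounds: int = 3,
    exemplars_per_prompt: int = 8,
    samples_per_round: int = 200,
) -> List[str]:
    """Iteratively grow a self-generated training set and finetune on it (sketch only)."""
    dataset = list(seed_pool)
    for _ in range(rounds):
        # 1) Retrieve high-quality samples related to the target domain.
        exemplars = retrieve(dataset, exemplars_per_prompt)
        # 2) Use them as in-context learning examples to generate more samples.
        new_samples = generate_with_icl(exemplars, samples_per_round)
        # 3) Finetune the LLM on the self-generated samples, then repeat.
        dataset.extend(new_samples)
        finetune(dataset)
    return dataset
```

In this reading, each round's finetuned model produces the next round's generations, which is what lets the alignment self-improve without further human supervision.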
Jan-6-2024