How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective
