Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models
