Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
