Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents
