Rethinking GSPO: The Perplexity-Entropy Equivalence

Oct-28-2025–arXiv.org Artificial Intelligence

We provide a new perspective on GSPO's length-normalized importance ratios by establishing their connection to information-theoretic quantities. We show that GSPO's sequence-level weight $s(θ) = (π_θ/π_{θ_{\text{old}}})^{1/|y|}$ can be equivalently expressed as the inverse perplexity ratio $\text{PPL}_{θ_{\text{old}}}/\text{PPL}_θ$ and as the exponential cross-entropy change $\exp(ΔH)$. While the perplexity-entropy relationship follows from standard definitions, this observation provides a useful lens for understanding GSPO: the algorithm weights policy gradient updates by perplexity ratios, offering an information-theoretic interpretation of the importance weights. This perspective helps explain GSPO's empirical properties, including log-domain variance reduction through geometric averaging and stability in training mixture-of-experts models. We validate the mathematical equivalences and variance predictions through controlled experiments on mathematical reasoning tasks.
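The equivalence stated above can be checked numerically. With per-token cross-entropy $H_θ = -\frac{1}{|y|}\sum_t \log π_θ(y_t \mid x, y_{<t})$ and $\text{PPL}_θ = \exp(H_θ)$, the length-normalized ratio reduces to $\exp(H_{θ_{\text{old}}} - H_θ)$. The sketch below is a minimal illustration, not the authors' code: the sign convention $ΔH = H_{θ_{\text{old}}} - H_θ$, the function names, and the toy token probabilities are all assumptions made for this example.

```python
import math

def gspo_weight(logp_new, logp_old):
    """Length-normalized sequence ratio: (pi_new(y) / pi_old(y)) ** (1 / |y|)."""
    n = len(logp_new)
    return math.exp((sum(logp_new) - sum(logp_old)) / n)

def cross_entropy(logp):
    """Average negative log-likelihood per token, in nats."""
    return -sum(logp) / len(logp)

def perplexity(logp):
    """Perplexity is the exponential of the per-token cross-entropy."""
    return math.exp(cross_entropy(logp))

# Toy per-token probabilities of one sampled response y under each policy
# (made-up values, chosen only for illustration).
logp_old = [math.log(p) for p in (0.5, 0.25, 0.8, 0.1)]
logp_new = [math.log(p) for p in (0.6, 0.3, 0.7, 0.2)]

s = gspo_weight(logp_new, logp_old)
ppl_ratio = perplexity(logp_old) / perplexity(logp_new)

# Assumed sign convention: delta_h = H_old - H_new, so a drop in
# cross-entropy under the new policy gives a weight above 1.
delta_h = cross_entropy(logp_old) - cross_entropy(logp_new)
exp_delta_h = math.exp(delta_h)

print(s, ppl_ratio, exp_delta_h)
```

All three printed values agree up to floating-point error. Because $\log s(θ)$ is the arithmetic mean of the token-level log-ratios, this is also where the geometric-averaging view of the weight comes from.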

large language model, machine learning, natural language

Genre:
- Research Report > Experimental Study (0.55)

Industry:
- Energy (0.61)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.68)
  - Natural Language > Large Language Model (0.46)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
