Algorithm-Relative Trajectory Valuation in Policy Gradient Control

Shihao Li, Jiachen Li, Jiamin Xu, Christopher Martin, Wei Li, Dongmei Chen

arXiv.org Artificial Intelligence 

We study how the value of a trajectory depends on the learning algorithm in policy-gradient control. Using Trajectory Shapley values in an uncertain linear-quadratic regulator (LQR), we find a robust negative correlation between a trajectory's information content, measured by persistence of excitation (PE), and its marginal value under vanilla REINFORCE (e.g., r ≈ −0.38). We prove a variance-mediated mechanism: (i) for fixed energy, higher PE yields lower gradient variance; (ii) near saddle regions, higher variance increases the probability of escaping poor basins and thus raises a trajectory's marginal contribution. When the update is stabilized (state whitening or Fisher preconditioning), this variance channel is neutralized, information content dominates, and the correlation flips positive (e.g., r ≈ +0.29). Hence trajectory value is algorithm-relative: it emerges from the interaction between data statistics and update dynamics. Experiments on LQR validate the two-step mechanism and the sign flip, and show that decision-aligned scores (Leave-One-Out) complement Shapley values for pruning near the full dataset, while Shapley remains effective for identifying high-impact (and toxic) subsets.
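
To make the valuation pipeline concrete, the following is a minimal sketch, assuming a scalar stochastic LQR, a Gaussian linear policy u = -k*x + eps, a single vanilla REINFORCE step as the learning algorithm, mean state energy as a crude stand-in for a PE score, and a permutation-sampling Monte Carlo estimator of Trajectory Shapley values. All names, hyperparameters, and the subset-utility definition here are illustrative choices, not the paper's actual experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar stochastic LQR: x_{t+1} = A*x_t + B*u_t + w_t, stage cost x^2 + u^2.
A, B, T, W_STD, EXPLORE_STD = 0.9, 0.5, 20, 0.3, 0.5

def rollout(k, explore_std):
    """Roll out the linear policy u = -k*x + eps; return states, actions, total cost."""
    xs, us, cost = [], [], 0.0
    x = rng.normal(0.0, 1.0)
    for _ in range(T):
        u = -k * x + rng.normal(0.0, explore_std)  # explore_std=0 gives noiseless eval
        cost += x**2 + u**2
        xs.append(x)
        us.append(u)
        x = A * x + B * u + rng.normal(0.0, W_STD)
    return np.array(xs), np.array(us), cost

def pe_score(xs):
    """Crude scalar proxy for persistence of excitation: mean state energy."""
    return float(np.mean(xs**2))

def reinforce_step(k, idx, data, lr=0.01):
    """One vanilla REINFORCE update of the gain k, using trajectories data[i], i in idx."""
    if len(idx) == 0:
        return k
    grad = 0.0
    for i in idx:
        xs, us, cost = data[i]
        # d/dk log N(u; -k*x, sigma^2), summed over the trajectory
        score = -np.sum((us + k * xs) * xs) / EXPLORE_STD**2
        grad += score * (-cost)       # return = -cost; score-function estimator
    return k + lr * grad / len(idx)   # ascend the return

def utility(idx, data, k0=0.2, n_eval=10):
    """Subset value: average return of the post-update policy, evaluated noiselessly."""
    k1 = reinforce_step(k0, idx, data)
    return float(np.mean([-rollout(k1, 0.0)[2] for _ in range(n_eval)]))

def mc_shapley(data, n_perm=100):
    """Permutation-sampling Monte Carlo estimate of Trajectory Shapley values."""
    n = len(data)
    phi = np.zeros(n)
    for _ in range(n_perm):
        prefix, prev = [], utility([], data)
        for i in rng.permutation(n):
            prefix.append(i)
            cur = utility(prefix, data)
            phi[i] += cur - prev      # marginal contribution of trajectory i
            prev = cur
    return phi / n_perm

data = [rollout(0.2, EXPLORE_STD) for _ in range(8)]
phi = mc_shapley(data)
loo = np.array([utility(range(8), data)
                - utility([j for j in range(8) if j != i], data) for i in range(8)])
pe = np.array([pe_score(xs) for xs, _, _ in data])
print("corr(PE, Shapley) =", round(float(np.corrcoef(pe, phi)[0, 1]), 3))
print("corr(PE, LOO)     =", round(float(np.corrcoef(pe, loo)[0, 1]), 3))
```

In this sketch the LOO score is the drop in utility from deleting one trajectory from the full set, which is the decision-aligned counterpart the abstract contrasts with Shapley; swapping the plain REINFORCE step for a whitened or Fisher-preconditioned update is where one would probe the sign flip, though this toy instance is far too small to reproduce the reported correlations.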