Reviews: An Off-policy Policy Gradient Theorem Using Emphatic Weightings

Oct-7-2024, 08:52:30 GMT–Neural Information Processing Systems

The paper formulates and proves a policy gradient theorem in the off-policy setting. The derivation is based on emphatic weighting of the states. Based on the introduced theorem, an actor-critic algorithm, termed ACE, is further proposed. The algorithm requires computing policy gradient updates that depend on the emphatic weights. Computing low-variance estimates of the weights is non-trivial, and the authors introduce a relaxed version of the weights that interpolate between the off-policy actor-critic (Degris et al., 2012) and the unbiased (but high variance) estimator; the introduced estimator can be computed incrementally.

emphatic weighting, off-policy policy gradient theorem, review, (5 more...)

Neural Information Processing Systems

Oct-7-2024, 08:52:30 GMT

Conferences Web Page

Add feedback

Genre:
- Summary/Review (0.61)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.39)