Regularized Off-Policy TD-Learning

Mar-14-2024, 05:01:48 GMT–Neural Information Processing Systems

The algorithmic framework underlying RO-TD (regularized off-policy TD-learning) integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, which enables first-order solvers and feature selection using online convex regularization. A detailed theoretical and experimental analysis of RO-TD is presented, and a variety of experiments illustrate its off-policy convergence, sparse feature selection capability, and low computational cost.
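To make the two ingredients named above more concrete, the following is a minimal sketch of one TDC-style update with linear features, followed by an elementwise soft-thresholding step that encourages sparse weights. It is not the RO-TD algorithm: the abstract describes a saddle-point solver, whereas this sketch swaps in a plain proximal (soft-threshold) step for the l1 penalty. The function names, the step sizes, and the choice to scale both updates by the importance ratio `rho` are all assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def soft_threshold(x, tau):
    """Elementwise proximal operator of tau * |x| (shrinks entries toward 0)."""
    return [max(abs(a) - tau, 0.0) * (1.0 if a > 0 else -1.0) for a in x]


def tdc_l1_step(theta, w, phi, phi_next, reward, rho,
                gamma=0.9, alpha=0.05, beta=0.1, l1=0.0):
    """One illustrative TDC update plus an l1 shrinkage step.

    theta    : value-function weights
    w        : auxiliary weights used by the TDC correction term
    phi      : features of the current state
    phi_next : features of the next state
    rho      : importance ratio pi(a|s) / mu(a|s) (assumed weighting scheme)
    l1       : l1 penalty strength (hypothetical parameter; 0 disables shrinkage)
    """
    # TD error under the current linear value estimate.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # Scalar phi^T w shared by both updates.
    correction = dot(phi, w)
    # Main weights: TD step minus the gradient-correction term.
    theta = [t + alpha * rho * (delta * f - gamma * fn * correction)
             for t, f, fn in zip(theta, phi, phi_next)]
    # Auxiliary weights: track the expected TD error given the features.
    w = [wi + beta * rho * (delta - correction) * f
         for wi, f in zip(w, phi)]
    # Proximal step for the l1 penalty (stand-in for the paper's solver).
    theta = soft_threshold(theta, alpha * l1)
    return theta, w, delta
```

Calling `tdc_l1_step` once per observed transition, with `rho` computed from the target and behavior policies, gives an online update whose per-step cost is linear in the number of features. Raising `l1` drives more entries of `theta` to exactly zero.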

Country:
- North America > United States
  - Wisconsin > Dane County
    - Madison (0.04)
  - Massachusetts > Hampshire County
    - Amherst (0.04)

Genre:
- Research Report (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)