Nonconvex Regularization for Feature Selection in Reinforcement Learning

Suzuki, Kyohei; Slavakis, Konstantinos

arXiv.org Artificial Intelligence 

The primary objective of reinforcement learning (RL) is for an agent to learn an optimal policy for controlling a system by minimizing a long-term loss, represented by the Q-function. This learning occurs through interactions with the environment, which is typically modeled as a Markov decision process (MDP). In most high-dimensional, real-world problems, explicitly representing the Q-function for all possible states and actions is impractical due to the "curse of dimensionality." A common solution is to approximate the Q-function with a parametric (functional) representation. This, however, introduces a fundamental trade-off between approximation accuracy and computational complexity: reducing the approximation error generally requires a large number of features in the parametric model, which in turn increases computational demands. Feature selection, achieved via a sparse representation over a large basis of functions, is an effective way to alleviate this trade-off, mitigate overfitting, and improve sample efficiency.
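To make the idea of feature selection through a sparse representation concrete, the following is a minimal sketch under stated assumptions, not the method of the paper. It assumes a linear model Q(s, a) ≈ sum_i w_i * phi_i(s, a) and swaps in a plain convex L1 penalty, solved by proximal gradient descent with soft-thresholding, in place of any nonconvex regularizer. The names `soft_threshold`, `q_value`, and `fit_sparse_weights`, the synthetic data, and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def soft_threshold(x, tau):
    """Proximal operator of tau * |x|: shrink x toward zero by tau."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def q_value(weights, features):
    """Linear Q-function approximation: inner product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))


def fit_sparse_weights(feature_rows, targets, lam=0.1, step=0.1, iters=500):
    """L1-regularized least squares via proximal gradient descent (ISTA).

    Assumption: an L1 penalty stands in for the regularizer here; it is
    used only to show how sparsity selects features.
    """
    n = len(feature_rows)
    d = len(feature_rows[0])
    w = [0.0] * d
    for _ in range(iters):
        # Gradient of the mean squared error term (1 / 2n) * sum(err^2).
        grad = [0.0] * d
        for phi, y in zip(feature_rows, targets):
            err = q_value(w, phi) - y
            for i in range(d):
                grad[i] += err * phi[i] / n
        # Gradient step followed by soft-thresholding, which sets
        # small weights to exactly zero.
        w = [soft_threshold(w[i] - step * grad[i], step * lam) for i in range(d)]
    return w


# Synthetic example (hypothetical data): only features 0 and 3 carry signal.
rng = random.Random(0)
true_w = [2.0, 0.0, 0.0, -1.5, 0.0, 0.0]
rows = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(200)]
targets = [q_value(true_w, phi) for phi in rows]

w_hat = fit_sparse_weights(rows, targets)
selected = [i for i, w in enumerate(w_hat) if w != 0.0]
print("selected feature indices:", selected)
```

On this toy data the fit is expected to keep only the two informative features, although the surviving weights are shrunk toward zero. That shrinkage bias is a known side effect of the L1 penalty.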
