Nonconvex Regularization for Feature Selection in Reinforcement Learning

Sep-22-2025–arXiv.org Artificial Intelligence

The primary objective of reinforcement learning (RL) is for an agent to learn an optimal policy to control a system by minimizing a long-term loss, represented by the Q-function. This learning occurs through interactions with the environment, which is typically modeled as a Markov decision process (MDP). In most high-dimensional, real-world problems, explicitly representing the Q-function for all possible states and actions is impractical due to the "curse of dimensionality." A common solution is to approximate the Q-function using a parametric (functional) representation. This, however, introduces a fundamental trade-off between approximation accuracy and computational complexity: reducing the approximation error generally requires a large number of features in the parametric model, which in turn increases computational demands. Feature selection, achieved via a sparse representation over a large basis of functions, is an effective way to alleviate this trade-off, mitigate overfitting, and improve sample efficiency.
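As a rough illustration of feature selection via a sparse representation over a large basis, the sketch below fits a linear model over a dictionary of Gaussian basis functions and keeps only a few of them active. It is not the method of the paper: the excerpt above does not specify the regularizer, so the sketch substitutes a plain convex L1 penalty solved with iterative soft-thresholding (ISTA), and the one-dimensional state, the synthetic targets, and all names (`rbf_features`, `sparse_fit`, `lam`) are assumptions made for illustration.

```python
import numpy as np

def rbf_features(states, centers, width):
    """Gaussian RBF features: one column per center (hypothetical basis)."""
    diff = states[:, None] - centers[None, :]
    return np.exp(-0.5 * (diff / width) ** 2)

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (convex stand-in for a nonconvex penalty)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def objective(Phi, targets, theta, lam):
    """Mean squared fitting error (halved) plus the L1 penalty."""
    n = Phi.shape[0]
    return 0.5 * np.sum((Phi @ theta - targets) ** 2) / n + lam * np.sum(np.abs(theta))

def sparse_fit(Phi, targets, lam, n_iters=500):
    """ISTA: gradient step on the squared error, then soft-thresholding."""
    n = Phi.shape[0]
    # Step size 1/L, with L the Lipschitz constant of the smooth term's gradient.
    L = np.linalg.norm(Phi, 2) ** 2 / n
    theta = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        grad = Phi.T @ (Phi @ theta - targets) / n
        theta = soft_threshold(theta - grad / L, lam / L)
    return theta

# Synthetic data (assumption): targets generated by two of 50 basis functions.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=200)
centers = np.linspace(-1.0, 1.0, 50)
Phi = rbf_features(states, centers, width=0.1)
true_theta = np.zeros(50)
true_theta[[10, 35]] = [1.0, -0.5]
targets = Phi @ true_theta + 0.01 * rng.standard_normal(200)

lam = 0.01
theta = sparse_fit(Phi, targets, lam)
n_active = int(np.count_nonzero(theta))  # number of selected features
```

Increasing `lam` in this sketch zeroes out more coefficients, which lowers the cost of evaluating the approximation at the price of a larger fitting error. In an RL setting the regression targets would come from sampled transitions rather than a known sparse model.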



