Statistical Inference in Reinforcement Learning: A Selective Survey
Thus, the observed data can be summarized into a sequence of "observation-action-reward" triplets $(O_t, A_t, R_t)_{t \ge 0}$. It is worth noting that the observation $O_t$ at each time step is not equivalent to the environment's state $S_t$. Indeed, the state can be viewed as a special observation satisfying the Markov property, and we will elaborate on the difference between the two later.

Policies: The goal of RL is to learn an optimal policy $\pi$ based on the observation-action-reward triplets so as to maximize the agent's cumulative reward. Mathematically, a policy is a conditional probability distribution function mapping the agent's observed data history to the action space; it specifies the probability of the agent taking each action at every time step. Below, we introduce three types of policies (see Figure 1(b) for a visualization of their relationships):

(1) History-dependent policy: This is the most general form of policy. At each time $t$, we define $H_t$ as the set containing the current observation $O_t$ and all prior historical information $(O_i, A_i, R_i)_{0 \le i < t}$; a history-dependent policy then maps $H_t$ to a probability distribution over the action space.
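To fix ideas, the following is a minimal Python sketch of a history-dependent policy viewed as a conditional distribution over actions given the history $H_t$. It is our illustration, not code from the survey: the `Triplet` container, the two-action space, and the toy decision rule (biasing toward one action when past rewards have been positive on average) are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Triplet:
    """One step of the observed data: observation O_i, action A_i, reward R_i."""
    observation: float
    action: int
    reward: float

def history_dependent_policy(history: List[Triplet], o_t: float) -> Dict[int, float]:
    """Map the history H_t (all prior triplets plus the current observation
    O_t) to a probability distribution over a toy action space {0, 1}.

    The rule here is purely illustrative: favor action 1 when the average of
    past rewards is positive. A Markov policy would instead condition only on
    the current state, ignoring the earlier triplets.
    """
    avg_reward = sum(t.reward for t in history) / len(history) if history else 0.0
    p1 = 0.8 if avg_reward > 0 else 0.2
    return {0: 1.0 - p1, 1: p1}

# Sampling an action from the policy's conditional distribution at time t.
history = [Triplet(0.5, 1, 1.0), Triplet(0.3, 0, -0.5)]
dist = history_dependent_policy(history, o_t=0.7)
action = random.choices(list(dist), weights=list(dist.values()))[0]
```

The point of the sketch is only that the policy's input grows with $t$: every prior triplet may influence the action distribution, which is what makes this the most general class of policies.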