POMDP
sound (R1, R2), the experiments are appropriate and comprehensive (R2, R3, R4), the results are convincing (R1, R3, R4), and the ablation studies are "tremendously useful" and helpful for making design choices (R1, R2, R3, R4). We'll update the paper to stress that our method is not equipped to solve the POMDP; it was not our intent to claim that it was. We'll remove the SOTA claims in light of the recent works CURL, RAD, and DrQ [1]. SLAC (ours) achieves performance comparable to DrQ (Kostrikov et al., 2020 [1]) on the 4 DM Control tasks.
Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems
We study Reinforcement Learning for partially observable dynamical systems using function approximation. We propose a new Partially Observable Bilinear Actor-Critic framework that is general enough to include models such as observable tabular Partially Observable Markov Decision Processes (POMDPs), observable Linear-Quadratic-Gaussian (LQG) systems, and Predictive State Representations (PSRs), as well as newly introduced models: Hilbert Space Embeddings of POMDPs, and observable POMDPs with latent low-rank transitions.
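To make the tabular POMDP model concrete, here is a minimal sketch of the standard Bayesian belief update that underlies planning in such models. This is textbook machinery, not the paper's bilinear actor-critic algorithm; the transition and observation matrices below are illustrative numbers, not taken from the source.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Exact Bayesian belief update for a tabular POMDP.

    b: (S,) current belief over latent states
    a: index of the action taken
    o: index of the observation received
    T: (A, S, S) transitions, T[a, s, s'] = P(s' | s, a)
    O: (A, S, O) observations, O[a, s', o] = P(o | s', a)
    """
    predicted = b @ T[a]                 # predictive distribution P(s' | b, a)
    unnormalized = O[a][:, o] * predicted  # weight by observation likelihood
    return unnormalized / unnormalized.sum()

# Tiny 2-state, 1-action, 2-observation POMDP (illustrative numbers).
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])
b = np.array([0.5, 0.5])
b = belief_update(b, a=0, o=0, T=T, O=O)
```

After one step, the belief shifts toward the state that best explains the observation; "observable" POMDPs in the sense of the abstract are, roughly, those where such beliefs can be identified from observation statistics.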
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation
We study off-policy evaluation (OPE) in partially observable environments with complex observations, with the goal of developing estimators whose guarantees avoid exponential dependence on the horizon. While such estimators exist for MDPs, and POMDPs can be converted to history-based MDPs, the resulting estimation errors depend on the state-density ratio for MDPs, which becomes a history-density ratio after conversion, an exponentially large object. Recently, Uehara et al. [2022a] proposed future-dependent value functions as a promising framework to address this issue, where the guarantee for memoryless policies depends on the density ratio over the latent state space. However, the guarantee also depends on the boundedness of the future-dependent value function and other related quantities, which we show can be exponential in the horizon, erasing the advantage of the method. In this paper, we discover novel coverage assumptions tailored to the structure of POMDPs, such as outcome coverage and belief coverage, which enable polynomial bounds on the aforementioned quantities. As a side product, our analyses also lead to the discovery of new algorithms with complementary properties.
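The exponential object mentioned above can be seen in a minimal, self-contained sketch (the policies and numbers here are illustrative, not from the paper): in plain importance-sampling OPE over histories, per-step ratios multiply along a trajectory, so the second moment of the cumulative weight, and hence the estimator's variance, grows exponentially in the horizon.

```python
def is_second_moment(pi_e, pi_b, horizon):
    """Second moment E_b[w^2] of the cumulative importance weight
    w = prod_t pi_e(a_t) / pi_b(a_t), with actions drawn i.i.d. from pi_b.
    Per step: E_b[(pi_e/pi_b)^2] = sum_a pi_e[a]^2 / pi_b[a]."""
    per_step = sum(e * e / b for e, b in zip(pi_e, pi_b))
    return per_step ** horizon

pi_b = [0.5, 0.5]   # behavior policy (uniform over two actions)
pi_e = [0.9, 0.1]   # target policy we want to evaluate

for horizon in (1, 10, 50):
    # Since E_b[w] = 1, the variance is this second moment minus 1;
    # it grows like 1.64 ** horizon for these two policies.
    print(horizon, is_second_moment(pi_e, pi_b, horizon))
```

This is exactly the blow-up that state-density (or latent-state-density) ratios avoid, and that the paper's coverage assumptions are designed to control for future-dependent value functions.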