Markov Models
RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation
We introduce the first sample-efficient algorithm for LMDPs without any additional distributional assumptions. Our result builds on a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective that has been overlooked in the context of exploration in partially observed environments.
Energy-Based Modelling for Discrete and Mixed Data via Heat Equations on Structured Spaces
However, training EBMs on data in discrete or mixed state spaces poses significant challenges due to the lack of robust and fast sampling methods. In this work, we propose to train discrete EBMs with Energy Discrepancy, a loss function which only requires the evaluation of the energy function at data points and their perturbed counterparts, thus eliminating the need for Markov chain Monte Carlo.
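To make the idea of an MCMC-free loss concrete, here is a minimal sketch of a contrastive loss that touches the energy function only at data points and at perturbed copies of them. Everything in it is an assumption chosen for illustration rather than taken from the excerpt: the quadratic `energy` over binary vectors, the independent bit-flip `perturb` step, the stabilising weight `w`, and the log-mean-exp form of the loss. The paper's exact estimator and perturbation (the title points to heat equations on structured spaces) may differ.

```python
import numpy as np

def energy(x, W, b):
    # Hypothetical quadratic energy over binary vectors; works for any leading batch shape.
    return -(np.einsum("...i,ij,...j->...", x, W, x) + x @ b)

def perturb(x, flip_prob, rng):
    # Assumed perturbation: flip each bit independently with probability flip_prob.
    flips = rng.random(x.shape) < flip_prob
    return np.where(flips, 1 - x, x)

def contrastive_energy_loss(x, W, b, n_neg=8, flip_prob=0.1, w=1.0, seed=0):
    """Mean over data of log( w/M + (1/M) * sum_j exp(E(x) - E(x_neg_j)) )."""
    rng = np.random.default_rng(seed)
    y = perturb(x, flip_prob, rng)                    # perturbed data, shape (N, D)
    y_rep = np.repeat(y[:, None, :], n_neg, axis=1)   # shape (N, M, D)
    x_neg = perturb(y_rep, flip_prob, rng)            # M contrastive samples per data point
    diff = energy(x, W, b)[:, None] - energy(x_neg, W, b)  # shape (N, M)
    # Numerically stable log-sum-exp over [log w, diff_1, ..., diff_M].
    stacked = np.concatenate([np.full((x.shape[0], 1), np.log(w)), diff], axis=1)
    m = stacked.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(stacked - m).sum(axis=1))
    return float(np.mean(lse - np.log(n_neg)))

# Usage: random binary data and random parameters, purely for illustration.
rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=(5, 4))
W = rng.normal(size=(4, 4))
b = rng.normal(size=4)
print(contrastive_energy_loss(x, W, b))
```

No Markov chain is run anywhere in this sketch: each loss evaluation needs only one perturbation of the data and `n_neg` further perturbations per point, so the cost scales with the number of energy evaluations rather than with sampler mixing time.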