

Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints

Neural Information Processing Systems

Real-world reinforcement learning (RL) applications often come with possibly infinite state and action spaces, and in such settings classical RL algorithms developed for the tabular case are no longer applicable. A popular approach to overcoming this issue is to apply function approximation techniques to the underlying structures of the Markov decision processes (MDPs).
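As a concrete illustration of the function-approximation idea, the sketch below models the action-value function linearly, Q(s, a) ≈ ⟨w, φ(s, a)⟩, and fits the weights with a ridge-regression backup in the spirit of least-squares value iteration. This is a minimal sketch: the feature map phi, the dimension d, and every other identifier are hypothetical placeholders, not taken from the paper.

import numpy as np

# Minimal sketch of linear action-value approximation: Q(s, a) is
# modeled as the inner product <w, phi(s, a)> for a known feature map.
# phi, d, and the backup below are illustrative assumptions.

d = 8  # feature dimension (assumed)

def phi(state, action):
    # Hypothetical feature map sending a (state, action) pair to R^d.
    rng = np.random.default_rng(abs(hash((state, action))) % (2**32))
    return rng.standard_normal(d)

def q_value(w, state, action):
    # Linear Q-estimate: Q(s, a) ~ <w, phi(s, a)>.
    return phi(state, action) @ w

def lsvi_backup(transitions, w_next, gamma=0.99, reg=1.0):
    # One least-squares (ridge) backup over a batch of transitions,
    # regressing phi(s, a) onto r + gamma * max_a' Q_next(s', a').
    A = reg * np.eye(d)
    b = np.zeros(d)
    for s, a, r, s_next, next_actions in transitions:
        f = phi(s, a)
        target = r + gamma * max(q_value(w_next, s_next, a2) for a2 in next_actions)
        A += np.outer(f, f)
        b += f * target
    return np.linalg.solve(A, b)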





Neural Information Processing Systems

Tabular RL: There is a long line of research on the sample complexity and regret for RL in tabular settings. In model-based settings, researchers have tackled continuous spaces via kernel methods, based either on a fixed discretization of the space [21] or, more recently, without resorting to discretization [11]. While the latter does learn a data-driven representation of the space via kernels, it requires solving a complex optimization problem at each step, and hence is efficient mainly for finite action sets (more discussion on this is in Section 4). These were tested heuristically with various splitting rules (e.g., …). We use this result by chaining the Wasserstein distances of various measures together. Unfortunately, the scaling does not hold for the case when d_S ≥ 2; in this situation we use the fact that … The result from [46] has corresponding lower bounds, showing that in the worst case, scaling with respect to d_S is inevitable.
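For contrast with the kernel-based approaches above, here is a minimal sketch of the fixed-discretization baseline: a continuous state in [0, 1]^d is snapped to a cell of a uniform grid, after which any tabular method applies (one Q-learning update is shown). The grid resolution and learning rate are illustrative assumptions, not values from either paper.

import numpy as np

def discretize(state, cells_per_dim=10):
    # Map a point in [0, 1]^d to the tuple index of its grid cell.
    idx = np.minimum((np.asarray(state) * cells_per_dim).astype(int),
                     cells_per_dim - 1)
    return tuple(idx)

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # One tabular Q-learning step on the discretized chain; Q is a dict
    # keyed by (grid cell, action).
    key, key_next = discretize(s), discretize(s_next)
    best_next = max(Q.get((key_next, a2), 0.0) for a2 in actions)
    Q[(key, a)] = (1 - alpha) * Q.get((key, a), 0.0) + alpha * (r + gamma * best_next)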


Adaptive Discretization for Model-Based Reinforcement Learning

Neural Information Processing Systems

Our algorithm is based on optimistic one-step value iteration, extended to maintain an adaptive discretization of the space. From a theoretical perspective, we provide worst-case regret bounds for our algorithm which are competitive compared to the state-of-the-art model-based algorithms.
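The sketch below illustrates the adaptive-discretization idea on an assumed one-dimensional state space in [0, 1]: each cell of the partition keeps a visit count and an optimistic Q-estimate, and a cell splits once its count exceeds a threshold tied to its width, so that resolution is refined only where data accumulates. The splitting rule and bonus constant are illustrative assumptions, not the paper's exact conditions.

import numpy as np

class Cell:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.count = 0
        self.q_sum = 0.0

    def width(self):
        return self.hi - self.lo

    def optimistic_q(self, c_bonus=1.0):
        # Empirical mean plus an exploration bonus shrinking with the
        # visit count, plus a bias term proportional to the cell width.
        if self.count == 0:
            return float("inf")  # unvisited cells are maximally optimistic
        return self.q_sum / self.count + c_bonus / np.sqrt(self.count) + self.width()

class AdaptivePartition:
    def __init__(self):
        self.cells = [Cell(0.0, 1.0)]

    def locate(self, x):
        # Find the cell containing x (cells partition [0, 1]).
        return next(c for c in self.cells if c.lo <= x <= c.hi)

    def update(self, x, target):
        cell = self.locate(x)
        cell.count += 1
        cell.q_sum += target
        # Split once visits exceed ~1/width^2 (an assumed rule), halving
        # the cell so finer resolution is spent on frequently visited regions.
        if cell.count >= (1.0 / cell.width()) ** 2:
            mid = (cell.lo + cell.hi) / 2
            self.cells.remove(cell)
            self.cells.extend([Cell(cell.lo, mid), Cell(mid, cell.hi)])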