 Reinforcement Learning






e1696007be4eefb81b1a1d39ce48681b-Paper.pdf

Neural Information Processing Systems

In this work, we identify a novel set of conditions that ensure convergence with probability 1 of Q-learning with linear function approximation, by proposing a two time-scale variation thereof.



A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs

Neural Information Processing Systems

However, satisfying such an assumption is non-trivial, which raises two challenges. On the one hand, it is challenging to design a model with enough capacity under limited computational resources, and existing models are usually tailored for specific problems, which requires heavy trial-and-error [25, 57, 59].



Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

Neural Information Processing Systems

It is known that policy evaluation has the interpretation of solving a generalized Bellman equation. In this paper, we derive finite-sample bounds for any general off-policy TD-like stochastic approximation algorithm that solves for the fixed point of this generalized Bellman operator.