Value Pursuit Iteration

Neural Information Processing Systems (Feb-14-2020, 22:43:32 GMT)

Value Pursuit Iteration (VPI) is an approximate value iteration algorithm that finds a close-to-optimal policy for reinforcement learning and planning problems with large state spaces. VPI has two main features. First, it is a nonparametric algorithm that, given a dictionary of features, finds a good sparse approximation of the optimal value function, and it is almost insensitive to the number of irrelevant features in that dictionary. Second, after each iteration, VPI adds to the dictionary a set of functions built from the currently learned value function. This increases the representational power of the dictionary in a way that is directly relevant to the goal of approximating the optimal value function well.
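The two mechanisms described above can be illustrated with a minimal Python sketch: a sparse fit of each Bellman backup from a feature dictionary, followed by dictionary growth. This is not the authors' implementation. It assumes a small tabular MDP with a known model (`P`, `R`) rather than a large state space. It uses orthogonal matching pursuit as the greedy sparse-approximation step, which is our choice of solver. It also appends the current value estimate to the dictionary as a hypothetical stand-in for the paper's augmentation rule. The names `bellman_backup`, `omp_fit`, and `value_pursuit_iteration` are illustrative.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bellman_backup(V, P, R, gamma):
    """Bellman optimality operator for a known tabular MDP (assumption):
    (T V)(s) = max_a [ R[s][a] + gamma * sum_t P[s][a][t] * V[t] ]."""
    return [
        max(R[s][a] + gamma * _dot(P[s][a], V) for a in range(len(R[s])))
        for s in range(len(V))
    ]


def omp_fit(dictionary, y, max_atoms, tol=1e-10):
    """Orthogonal matching pursuit: greedily pick at most `max_atoms` atoms
    and return the least-squares fit of `y` on their span, plus the indices
    of the chosen atoms."""
    norms = [math.sqrt(_dot(d, d)) for d in dictionary]
    basis, chosen, skipped = [], [], set()  # basis stays orthonormal
    residual = list(y)
    while len(chosen) < max_atoms:
        # Pick the atom whose direction is most correlated with the residual.
        best, best_score = None, tol
        for j, d in enumerate(dictionary):
            if j in skipped or j in chosen or norms[j] == 0.0:
                continue
            score = abs(_dot(residual, d)) / norms[j]
            if score > best_score:
                best, best_score = j, score
        if best is None:
            break  # residual is (numerically) orthogonal to every atom
        # Gram-Schmidt the new atom against the basis (two passes for stability).
        q = list(dictionary[best])
        for _ in range(2):
            for b in basis:
                c = _dot(q, b)
                q = [qi - c * bi for qi, bi in zip(q, b)]
        q_norm = math.sqrt(_dot(q, q))
        if q_norm <= 1e-8 * norms[best]:
            skipped.add(best)  # atom already lies in the span; ignore it
            continue
        q = [qi / q_norm for qi in q]
        basis.append(q)
        chosen.append(best)
        # Residual = y minus its projection onto the span of the chosen atoms.
        c = _dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    fitted = [yi - ri for yi, ri in zip(y, residual)]
    return fitted, chosen


def value_pursuit_iteration(P, R, gamma, dictionary, iterations, max_atoms):
    """Approximate value iteration: V_{k+1} = sparse OMP fit of T V_k.
    After each iteration the current estimate is appended to the dictionary
    (a hypothetical stand-in for the paper's augmentation rule)."""
    dictionary = [list(d) for d in dictionary]  # do not mutate the caller's list
    V = [0.0] * len(P)
    for _ in range(iterations):
        target = bellman_backup(V, P, R, gamma)
        V, _ = omp_fit(dictionary, target, max_atoms)
        dictionary.append(list(V))
    return V, dictionary


if __name__ == "__main__":
    # Toy 5-state chain: action 0 stays, action 1 moves right; reward 1 in the last state.
    n, gamma = 5, 0.9
    P = [[[1.0 if t == s else 0.0 for t in range(n)],
          [1.0 if t == min(s + 1, n - 1) else 0.0 for t in range(n)]]
         for s in range(n)]
    R = [[1.0 if s == n - 1 else 0.0] * 2 for s in range(n)]
    indicators = [[1.0 if t == s else 0.0 for t in range(n)] for s in range(n)]
    V, dictionary = value_pursuit_iteration(P, R, gamma, indicators, 200, max_atoms=n)
    print([round(v, 3) for v in V], len(dictionary))
```

On the toy chain the state-indicator atoms make every fit exact, so the loop reduces to exact value iteration. The estimate should therefore approach the optimal values (6.561, 7.29, 8.1, 9, 10) for discount 0.9. Setting `max_atoms` below the number of states forces a genuinely sparse, approximate fit.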

artificial intelligence, machine learning, reinforcement learning
