PolicyBoost: Functional Policy Gradient with Ranking-based Reward Objective

Yu, Yang (Nanjing University) | Da, Qing (Nanjing University)

AAAI Conferences 

Learning policies in nonlinear representations is an important step toward real-world applications of reinforcement learning in robotics. While functional representation has been widely applied in state-of-the-art supervised learning techniques (also known as boosting approaches) to adaptively learn nonlinear functions, boosting-style approaches have been little investigated in reinforcement learning. Only a few works have explored this direction, and they may suffer from the occurring-probability-pursuing problem. In this paper, to alleviate this problem, we propose employing a ranking-based objective function to guide the policy search in a function space, resulting in the PolicyBoost approach. Experimental results verify the effectiveness as well as the robustness of PolicyBoost.
