Review for NeurIPS paper: Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Feb-6-2025, 15:03:45 GMT–Neural Information Processing Systems

Additional Feedback: The motivating example could be explained more clearly. In particular, how exactly is the heuristic information incorporated into the search for a_dir? Also, if a simulator is available, one typically wouldn't use a model-free algorithm like REINFORCE. A major benefit of REINFORCE is that it can estimate the direction in which to improve the policy from Monte Carlo rollouts alone, without needing a simulator or a model of the environment. Once a simulator is assumed, the structure of the problem changes and different solution methods become available (e.g., MCTS).
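To make the point about REINFORCE concrete, here is a minimal sketch of the score-function estimator on a toy problem. It is not the paper's method. The 3-armed bandit, its mean rewards, the step size, and all function names are hypothetical choices for illustration. The estimator only ever sees sampled actions and their rewards, so it never queries a model or resets the environment to a chosen state.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(logits, sample_reward, rng, n_rollouts=50):
    """Monte Carlo estimate of d E[R] / d logits from sampled rewards only."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(n_rollouts):
        # Sample an action from the current policy and observe its reward.
        action = rng.choices(range(len(logits)), weights=probs)[0]
        reward = sample_reward(action, rng)
        for i, p in enumerate(probs):
            # Gradient of log pi(action) w.r.t. logit i for a softmax policy.
            score = (1.0 if i == action else 0.0) - p
            grad[i] += reward * score / n_rollouts
    return grad


# Hypothetical environment: a 3-armed bandit with noisy rewards.
MEAN_REWARDS = [0.1, 0.5, 0.9]


def sample_reward(action, rng):
    return MEAN_REWARDS[action] + rng.gauss(0.0, 0.1)


def train(seed=0, steps=200, lr=0.5):
    """Plain gradient ascent on the logits using the REINFORCE estimate."""
    rng = random.Random(seed)
    logits = [0.0] * len(MEAN_REWARDS)
    for _ in range(steps):
        grad = reinforce_gradient(logits, sample_reward, rng)
        logits = [x + lr * g for x, g in zip(logits, grad)]
    return softmax(logits)


if __name__ == "__main__":
    print(train())
```

In this sketch the environment is touched only through `sample_reward`, called on actions the policy itself drew. A search procedure such as MCTS would additionally need to evaluate actions of its own choosing from arbitrary states, which is the extra access a simulator provides.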


