R1/R3: Running time and practicality of ApproPO: In our experiments, we implement an RL oracle by a policy-[...]

Oct-3-2025, 03:53:02 GMT–Neural Information Processing Systems

We thank the reviewers for their constructive comments. We address the main concerns below. In our implementation, it was crucial to use the improvements from Sec. 3.4. We ran the "positive response" version of [...] Note that the policy mixture returned by ApproPO is just a weighted combination of the policies from the cache. We will add this discussion to the paper and also update the plots so that they are in terms of transitions rather than trajectories.
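To illustrate what "a weighted combination of the policies from the cache" could look like in practice, here is a minimal sketch in Python. It is not the authors' implementation. The names `Policy`, `sample_from_mixture`, and `mixture_measurement` are hypothetical, and the sketch assumes that the mixture is executed by drawing one cached policy per episode with probability proportional to its weight.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical type: a policy maps a state (int) to an action (int).
Policy = Callable[[int], int]


def sample_from_mixture(
    cache: Sequence[Policy], weights: Sequence[float], rng: random.Random
) -> Policy:
    """Draw one cached policy with probability proportional to its weight.

    Assumption: the drawn policy is then followed for a whole episode.
    """
    if len(cache) != len(weights):
        raise ValueError("one weight per cached policy is required")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    return rng.choices(list(cache), weights=list(weights), k=1)[0]


def mixture_measurement(
    measurements: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of the per-policy expected measurement vectors."""
    total = sum(weights)
    dim = len(measurements[0])
    return [
        sum(w * m[i] for w, m in zip(weights, measurements)) / total
        for i in range(dim)
    ]
```

Under these assumptions, evaluating the mixture needs no extra training: its expected measurement vector is the weighted average of the cached policies' vectors, and running it only requires one draw from the cache per episode.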


