Review for NeurIPS paper: Multi-task Batch Reinforcement Learning with Metric Learning


Weaknesses: The main weakness of the method is its reliance on accurate relabelling. The paper argues that actor-critic networks become causally confused because the task distributions are (almost) disjoint, and then hopes that reward models will not suffer from the same problem. However, the problem appears to affect reward models as well, since a reward ensemble is used in the experiments. There is no ablation study investigating whether this ensemble is actually necessary in the offline setting. Can you explain why you did not use the settings from Sections 5.1 and 5.2 to evaluate this component of your model?