Review for NeurIPS paper: Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

May-31-2025, 17:08:22 GMT–Neural Information Processing Systems

Weaknesses: Soundness of the claims: While the general ideas seem reasonably clear, most of the proofs read more like sketches, and some of the claims need to be verified by hand. For example, in the proof of Lemma 9 the authors do not derive each of the elements of the Hessian but merely state that the result follows from a direct computation (which seems to be left as an exercise to the reader). Other examples where more detail would help are the proof of Lemma 13, where the authors bound $\|q_t - \tilde{q}_t\|$ in a self-bounding way to achieve the upper bound, yet only the final result is stated with no mention of this technique, and the proof of Lemma 27, where the derivations on lines 720-722 were not entirely straightforward. Overall, the Appendix would benefit from more detailed proofs.

Significance and novelty of contribution: The core ideas of this work are not novel.


