Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs

Neural Information Processing Systems 

We present a Safe Policy Improvement (SPI) formulation for this RL setting that takes into account the user's preferences for handling the trade-offs between different reward signals.
