Non-Rectangular Robust MDPs with Normed Uncertainty Sets

Neural Information Processing Systems

Robust policy evaluation for non-rectangular uncertainty sets is generally NP-hard, even to approximate. Consequently, existing approaches suffer from either exponential iteration complexity or significant accuracy gaps. Interestingly, we identify a powerful class of Lp-bounded uncertainty sets that avoid these complexity barriers due to their structural simplicity. We further show that this class can be decomposed into infinitely many sa-rectangular Lp-bounded sets, and we leverage its structural properties to derive a novel dual formulation for Lp robust Markov decision processes (MDPs). This formulation reveals key insights into the adversary's strategy and leads to the first polynomial-time robust policy evaluation algorithm for L1-normed non-rectangular robust MDPs.