A Technical Proofs

Neural Information Processing Systems 

The distributionally robust tree structured prediction problem based on moment divergence in Eq. (1) can be rewritten as
$$\min_{P} \max_{Q} \; \mathbb{E}_{\hat{y} \sim P,\, y \sim Q}\left[\ell(\hat{y}, y)\right].$$
The objective function is convex in $P$ and concave in $Q$ because it is affine (bilinear) in both. The simplex constraints on $P$ and $Q$ are omitted for brevity.

Theorem 2. Given $m$ samples, a non-negative loss $\ell(\cdot,\cdot)$ such that $|\ell(\cdot,\cdot)| \le K$, a feature function $\phi(\cdot,\cdot)$ such that $\|\phi(\cdot,\cdot)\| \le B$, and a positive ambiguity level $\varepsilon > 0$, then, for any $\rho \in (0, 1]$, with probability at least $1 - \rho$, the excess true worst-case risk is bounded by
$$O\!\left(\frac{KB}{\varepsilon}\sqrt{\frac{\ln(4/\rho)}{m}}\right).$$

Proof. By assumption, $\|\phi(\cdot,\cdot)\| \le B$. This further implies that $f(\theta_i)$, $i = 1, 2$, is bounded in terms of $K$, $B$, and $\varepsilon$. We then follow the proof of Theorem 3 in Farnia and Tse [2016].

Our formulation differs from that of Nowak-Vila et al. [2020] in that we allow the ground truth itself to be a probabilistic prediction.

Proposition 4. Let $G$ be a multi-graph.
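Because the objective is bilinear in $P$ and $Q$ over simplices, the saddle point can be approximated by simple no-regret dynamics. The sketch below (a generic illustration, not the paper's algorithm) runs entropic mirror descent-ascent, i.e. multiplicative weights, on a hypothetical loss matrix `L[i, j]` standing in for $\ell(\hat{y}_i, y_j)$; the averaged iterates drive the duality gap toward zero, which is what makes the min-max reformulation tractable.

```python
import numpy as np

# Hedged sketch: approximate the bilinear saddle point
#   min_P max_Q  p^T L q  over probability simplices,
# where L is a hypothetical loss matrix (random here, not from the paper).
rng = np.random.default_rng(0)
L = rng.uniform(size=(5, 7))

p = np.full(5, 1 / 5)  # minimizing player P (predictor distribution)
q = np.full(7, 1 / 7)  # maximizing player Q (adversary distribution)
p_avg = np.zeros(5)
q_avg = np.zeros(7)
eta, T = 0.1, 5000

for _ in range(T):
    gp = L @ q        # gradient of p^T L q with respect to p
    gq = L.T @ p      # gradient with respect to q
    p = p * np.exp(-eta * gp); p /= p.sum()  # descent step for P
    q = q * np.exp(eta * gq);  q /= q.sum()  # ascent step for Q
    p_avg += p
    q_avg += q

p_avg /= T
q_avg /= T

# Duality gap of the averaged iterates:
#   max_q p_avg^T L q  -  min_p p^T L q_avg
gap = (L.T @ p_avg).max() - (L @ q_avg).min()
print(f"duality gap after {T} rounds: {gap:.4f}")
```

The averaged (not last) iterates are used because bilinear games can cycle under simultaneous gradient play; averaging recovers the standard $O(1/\sqrt{T})$ convergence of the duality gap.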