Goto

Collaborating Authors

 corollary


Fast Rates for Inverse Reinforcement Learning

arXiv.org Machine Learning

We establish novel structural and statistical results for entropy-regularized min-max inverse reinforcement learning (Min-Max-IRL) with linear reward classes in finite-horizon MDPs with Borel state and action spaces. On the structural side, we show that maximum likelihood estimation (MLE) and Min-Max-IRL are equivalent at the population level, and at the empirical level under deterministic dynamics. On the statistical side, exploiting pseudo-self-concordance of the Min-Max-IRL loss, we prove that both the trajectory-level KL divergence and the squared parameter error in the Hessian norm decay at the fast rate $\mathcal{O}(n^{-1})$, where $n$ is the number of expert trajectories. Our guarantees apply under misspecification and require no exploration assumptions. We further extend reward-identifiability results to general Borel spaces and derive novel results on the derivatives of the soft-optimal value function with respect to reward parameters.


Supplementary material for " Regret Bounds for Multilabel Classification in Sparse Label Regimes "

Neural Information Processing Systems

This appendix contains all proofs of the results mentioned in the main body of the paper, plus further results which have been omitted there due to space limits. We recall the following lemma which upper bounds the probability measure of the ball around a point x X that contains its kth nearest neighbors. The proof immediately follows from the multiplicative Chernoff bound (see, e.g., Lemma 3.2 in [28]). When combined with Assumption 5.1 we obtain the following corollary. Corollary A.2. Suppose that the measure-smoothness assumption (Assumption 5.1) holds with parameters λ, Cλ, k k.






Appendices ABernoulli-CRSProperties

Neural Information Processing Systems

Let us defineK Rn n a random diagonal sampling matrix whereKj,j Bernoulli(pj) for 1 j n. Therefore, Bernoulli-CRS will perform on average the same amount of computations as in the fixed-rankCRS. This formulation immediately hints atthe possibility tosample over the input channeldimension, similarly to sampling column-row pairs in matrices. Let ` be a β-Lipschitz loss function, and let the network be trained with SGD using properly decreasing learning rate. Let us denote the weight, bias and activation gradients with respect to a loss function` by Wl, bl, al respectively.


No-Regret Learning with Unbounded Losses: The Case of Logarithmic Pooling Supplementary Appendix May 10, 2023 A. Omitted Proofs

Neural Information Processing Systems

We now prove the first bullet. This is a contradiction, so in fact c κ. The first claim of the second bullet is analogous. To do so, we note the following technical lemma (proof below). To prove (#1), we proceed by induction on t.


AppendixOutline

Neural Information Processing Systems

Hence, we rely on subgradients defined in Equation 7. Since, many subgradient directions exist for the margin points, for consistency, we stick with xlγ(w;(x,y)) = {0}wheny w,x = γ. Note, that thesetofpoints inX satisfying this equality isazeromeasure set. For simplicity we shall treat the projection operation as just renormalizing w(t+1) to have unit norm,i.e., w(t+1) 2 = 1, t 0. This is not necessarily restrictive. A.1 TechnicalLemmas In this section we shall state some technical lemmas without proof, with references to works that contain the full proof. We shall use these in the following sections when proving our lemmas in Section5.