

 Statistical Learning



1102a326d5f7c9e04fc3c89d0ede88c9-Supplemental.pdf

Neural Information Processing Systems

This is the distribution over datasets one obtains by first sampling a task $t$ from $P_t$ and then sampling a dataset $S$ from $P^m_{z|t}$. Here $p(S)$ corresponds to the marginal distribution over datasets $S$. Note that the last line above holds because $\mathbb{E}_{\theta \sim P_\theta} f(\theta, S)$ does not depend on $t$.

Thus, in this section, we present a specialization of the bound for Gaussian distributions. Let $P_\theta$ have mean $\mu$ and covariance $\Sigma$; thus $P_\theta = \mathcal{N}(\mu, \Sigma)$ and analogously $P_{\theta,0} = \mathcal{N}(\mu_0, \Sigma_0)$. We can then apply the analytical form for the KL-divergence between two multivariate Gaussian distributions to the bound presented in Theorem 3. The result is the following bound, holding under the same assumptions as Theorem 3: $L(P_\theta, P_t) \le \frac{1}{l} \cdots$ We implement the above bound in code instead of the non-specialized form of the KL divergence to speed up computations and simplify gradient computations.

A.3.2 Few-Shot Learning Bound with Validation Data

In this section, we will assume that, in addition to the training data $S \sim P^m_{z|t}$, we have access to validation data $S_{va} \sim P^n_{z|t}$ at meta-training time. We will show that a meta-learning generalization bound can still be obtained in this case.
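The analytical KL-divergence between two Gaussians referred to in the specialization above has the standard closed form $\mathrm{KL}(\mathcal{N}(\mu, \Sigma) \,\|\, \mathcal{N}(\mu_0, \Sigma_0)) = \frac{1}{2}\left[\operatorname{tr}(\Sigma_0^{-1}\Sigma) + (\mu_0 - \mu)^\top \Sigma_0^{-1} (\mu_0 - \mu) - d + \ln\frac{\det \Sigma_0}{\det \Sigma}\right]$. The following is a minimal sketch of that formula, not the authors' implementation. It assumes diagonal covariances (an assumption made here for brevity, not stated in the text), and the function name `kl_diag_gaussians` is hypothetical.

```python
import math


def kl_diag_gaussians(mu, var, mu0, var0):
    """KL( N(mu, diag(var)) || N(mu0, diag(var0)) ).

    Assumption: both covariances are diagonal, so the general closed form
    reduces to a sum of independent one-dimensional terms.
    """
    if not (len(mu) == len(var) == len(mu0) == len(var0)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for m, v, m0, v0 in zip(mu, var, mu0, var0):
        if v <= 0 or v0 <= 0:
            raise ValueError("variances must be positive")
        # log-determinant ratio + trace term + squared mean shift - dimension
        total += math.log(v0 / v) + (v + (m - m0) ** 2) / v0 - 1.0
    return 0.5 * total
```

Because every operation is elementary and differentiable in `mu` and `var`, an expression of this kind is cheap to evaluate and easy to differentiate, which is consistent with the stated motivation for using the specialized form.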


Generalization Bounds for Meta-Learning via PAC-Bayes and Uniform Stability

Neural Information Processing Systems

We are motivated by the problem of providing strong generalization guarantees in the context of meta-learning. Existing generalization bounds are either challenging to evaluate or provide vacuous guarantees in even relatively simple settings. We derive a probably approximately correct (PAC) bound for gradient-based meta-learning using two different generalization frameworks in order to deal with the qualitatively different challenges of generalization at the "base" and "meta" levels. We employ bounds for uniformly stable algorithms at the base level and bounds from the PAC-Bayes framework at the meta level. The result of this approach is a novel PAC bound that is tighter when the base learner adapts quickly, which is precisely the goal of meta-learning. We show that our bound provides a tighter guarantee than other bounds on a toy non-convex problem on the unit sphere and a text-based classification example. We also present a practical regularization scheme motivated by the bound in settings where the bound is loose and demonstrate improved performance over baseline techniques.






Bellman Residual Orthogonalization for Offline Reinforcement Learning

Neural Information Processing Systems

We propose and analyze a reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a user-defined space of test functions. Focusing on applications to model-free offline RL with function approximation, we exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class. We prove an oracle inequality on our policy optimization procedure in terms of a trade-off between the value and uncertainty of an arbitrary comparator policy. Different choices of test function spaces allow us to tackle different problems within a common framework. We characterize the loss of efficiency in moving from on-policy to off-policy data using our procedures, and establish connections to concentrability coefficients studied in past work. We examine in depth the implementation of our methods with linear function approximation, and provide theoretical guarantees with polynomial-time implementations even when Bellman closure does not hold.
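One way to read "enforcing the Bellman equations only along test functions" is that the Bellman residual of a candidate Q-function, weighted by each test function and averaged over the offline data, should be close to zero. The sketch below computes that weighted residual for a single test function. It is an illustration under stated assumptions, not the paper's procedure: the policy is taken to be deterministic, there are no terminal states, and all names (`weighted_bellman_residual`, `test_fn`, the toy data) are hypothetical.

```python
def weighted_bellman_residual(transitions, q, policy, test_fn, gamma=0.99):
    """Empirical mean of test_fn(s, a) * (q(s, a) - r - gamma * q(s', policy(s'))).

    transitions: list of (state, action, reward, next_state) tuples.
    q:           candidate action-value function, q(state, action) -> float.
    policy:      deterministic target policy, policy(state) -> action (assumption).
    test_fn:     user-chosen test function, test_fn(state, action) -> float.
    """
    if not transitions:
        raise ValueError("need at least one transition")
    total = 0.0
    for s, a, r, s_next in transitions:
        # Bellman residual of q at this transition under the target policy.
        residual = q(s, a) - r - gamma * q(s_next, policy(s_next))
        total += test_fn(s, a) * residual
    return total / len(transitions)


# Toy check: one state, one action, reward 1, gamma 0.5, so the true Q is 2.
toy_data = [(0, 0, 1.0, 0)]
always_zero = lambda s: 0
constant_test = lambda s, a: 1.0
print(weighted_bellman_residual(toy_data, lambda s, a: 2.0, always_zero,
                                constant_test, gamma=0.5))  # prints 0.0
```

A candidate Q-function would be considered compatible with the data when this quantity is small for every test function in the chosen space; a richer test space imposes more constraints.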


Debiased Machine Learning without Sample-Splitting for Stable Estimators

Neural Information Processing Systems

Estimation and inference on causal parameters is typically reduced to a generalized method of moments problem, which involves auxiliary functions that correspond to solutions to a regression or classification problem. A recent line of work on debiased machine learning shows how one can use generic machine learning estimators for these auxiliary problems while maintaining asymptotic normality and root-n consistency of the target parameter of interest, requiring only mean-squared-error guarantees from the auxiliary estimation algorithms. The literature typically requires that these auxiliary problems be fitted on a separate sample or in a cross-fitting manner. We show that when these auxiliary estimation algorithms satisfy natural leave-one-out stability properties, sample splitting is not required. This allows for sample re-use, which can be beneficial in moderately sized sample regimes. For instance, we show that the stability properties that we propose are satisfied for ensemble bagged estimators built via sub-sampling without replacement, a popular technique in machine learning practice.
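The bagged estimators mentioned above can be illustrated with a small sketch: fit a base estimator on many random subsamples drawn without replacement and average the results. This is only an illustration of the general technique, not the construction analyzed in the paper. The function name `bagged_estimate`, the default subsample size of half the data, and the fixed seed are all assumptions made here.

```python
import random
from statistics import fmean


def bagged_estimate(data, base_estimator, n_bags=50, subsample_size=None, seed=0):
    """Average a base estimator over subsamples drawn without replacement.

    data:           list of observations.
    base_estimator: function mapping a list of observations to a float.
    subsample_size: size of each bag; defaults to half the data (assumption).
    """
    if not data:
        raise ValueError("data must be non-empty")
    m = subsample_size if subsample_size is not None else max(1, len(data) // 2)
    if not 1 <= m <= len(data):
        raise ValueError("subsample_size must be between 1 and len(data)")
    rng = random.Random(seed)
    # random.Random.sample draws m distinct positions, i.e. without replacement.
    estimates = [base_estimator(rng.sample(data, m)) for _ in range(n_bags)]
    return fmean(estimates)
```

Intuitively, when each bag contains only a fraction of the sample, any single observation appears in only a fraction of the bags, so removing it changes the averaged estimate by a correspondingly small amount. Comparing `bagged_estimate(data, fmean)` with the same call on `data` minus one point gives a rough empirical feel for the leave-one-out stability the abstract refers to.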