Is Plug-in Solver Sample Efficient for Feature-based Reinforcement Learning?

Neural Information Processing Systems

Fixing one player's policy reduces a two-player turn-based stochastic game (2-TBSG) to a DMDP, so an optimal policy exists for player 1. Under the resulting pair of policies, neither player can benefit from changing its policy alone. We give the following well-known properties of 2-TBSG without proof. Here we prove the three arguments in Proposition 1.
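
A compact statement of the equilibrium property above, in our own notation (assuming player 1 maximizes and player 2 minimizes the discounted value; neither convention is stated in the fragment): a policy pair $(\pi_1^*, \pi_2^*)$ is an equilibrium if, for every state $s$ and all policies $\pi_1, \pi_2$,

$$V^{\pi_1, \pi_2^*}(s) \le V^{\pi_1^*, \pi_2^*}(s) \le V^{\pi_1^*, \pi_2}(s),$$

so no player gains by deviating unilaterally.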


Supplementary Material: Experimental Design for Linear Functionals in Reproducing Kernel Hilbert Spaces, Appendix A: Estimability results

Neural Information Processing Systems

In A.1, we show a consequence of Def. 1 that is used in the proofs; we can apply Theorem ?? to get C. We show the relation between our condition in Def. 1, Pukelsheim's condition, and estimability; this definition is sometimes used as a restatement of the estimability property. Definition 4 (Projected data). Lemma 2: the assumption in Definition 4 implies the assumption in Definition 1 with ... This section includes proofs for the concentration results presented in the main text. Z is as in Def. 2, where X ... The term above is the so-called self-normalized noise, which can be handled by the techniques of de la Peña et al. (2009) popularized by Abbasi-Yadkori et al. (2011). From here on, the proof is generic.
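
For reference, the self-normalized concentration bound of Abbasi-Yadkori et al. (2011) invoked above takes the following form (notation ours, not the supplement's): if the noise $\eta_t$ is conditionally $R$-sub-Gaussian, $S_t = \sum_{s=1}^{t} \eta_s x_s$, and $\bar{V}_t = V + \sum_{s=1}^{t} x_s x_s^\top$ for some $V \succ 0$, then with probability at least $1 - \delta$, simultaneously for all $t \ge 0$,

$$\|S_t\|_{\bar{V}_t^{-1}}^2 \le 2R^2 \log\!\left(\frac{\det(\bar{V}_t)^{1/2}\,\det(V)^{-1/2}}{\delta}\right).$$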


Supplementary Material: Memory-Efficient Approximation Algorithms for M

Neural Information Processing Systems

The proof consists of three parts: an upper bound on the objective, an upper bound on the outer iteration complexity, and an upper bound on the inner iteration complexity. For the last part, we compute an upper bound on the complexity of each iteration of Algorithm 1, i.e., the inner iteration complexity, and derive an upper bound on N.
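
Bounds of this three-part shape typically combine multiplicatively. In our notation (not the supplement's): if Algorithm 1 runs for at most $T$ outer iterations, each outer iteration performs at most $N$ inner iterations, and each inner iteration costs at most $c$ elementary operations, then the total complexity satisfies

$$\text{total cost} \le T \cdot N \cdot c.$$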



Sobolev norm inconsistency of kernel interpolation

Yang, Yunfei

arXiv.org Machine Learning

We study the consistency of minimum-norm interpolation in reproducing kernel Hilbert spaces corresponding to bounded kernels. Our main result gives lower bounds for the generalization error of kernel interpolation measured in a continuous scale of norms that interpolate between $L^2$ and the hypothesis space. These lower bounds imply that kernel interpolation is always inconsistent when the smoothness index of the norm is larger than a constant that depends only on the embedding index of the hypothesis space and the decay rate of the eigenvalues.
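
For concreteness, here is a minimal sketch of the minimum-norm kernel interpolant studied in this line of work, assuming a bounded Laplace kernel and synthetic one-dimensional data; the kernel choice, bandwidth, and data are our illustrative assumptions, not the paper's:

    import numpy as np

    def laplace_kernel(X, Y, gamma=10.0):
        # bounded Laplace kernel k(x, y) = exp(-gamma |x - y|); in one
        # dimension its RKHS is norm-equivalent to the Sobolev space H^1
        return np.exp(-gamma * np.abs(X[:, None] - Y[None, :]))

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0.0, 1.0, 30))                     # inputs
    y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(30)  # noisy labels

    # minimum-RKHS-norm interpolant: f(x) = k(x, X) K^{-1} y, which
    # satisfies f(X_i) = y_i exactly
    K = laplace_kernel(X, X)
    alpha = np.linalg.solve(K, y)

    X_grid = np.linspace(0.0, 1.0, 200)
    f_grid = laplace_kernel(X_grid, X) @ alpha

    rkhs_norm_sq = y @ alpha   # ||f||_H^2 = y^T K^{-1} y

Because the interpolant fits the noise exactly, its RKHS norm grows with the sample; the paper's lower bounds quantify the resulting error in the intermediate (Sobolev-type) norms.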


Bagging in overparameterized learning: Risk characterization and risk monotonization

Patil, Pratik, Du, Jin-Hong, Kuchibhotla, Arun Kumar

arXiv.org Machine Learning

Bagging is a commonly used ensemble technique in statistics and machine learning to improve the performance of prediction procedures. In this paper, we study the prediction risk of variants of bagged predictors under the proportional asymptotics regime, in which the ratio of the number of features to the number of observations converges to a constant. Specifically, we propose a general strategy to analyze the prediction risk under squared error loss of bagged predictors using classical results on simple random sampling. Specializing the strategy, we derive the exact asymptotic risk of the bagged ridge and ridgeless predictors with an arbitrary number of bags under a well-specified linear model with arbitrary feature covariance matrices and signal vectors. Furthermore, we prescribe a generic cross-validation procedure to select the optimal subsample size for bagging and discuss its utility to eliminate the non-monotonic behavior of the limiting risk in the sample size (i.e., double or multiple descents). In demonstrating the proposed procedure for bagged ridge and ridgeless predictors, we thoroughly investigate the oracle properties of the optimal subsample size and provide an in-depth comparison between different bagging variants.
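
A minimal sketch of subsample bagging for the ridgeless (minimum-$\ell_2$-norm least squares) predictor discussed above, assuming simple random subsampling without replacement; the data-generating model, subsample size, and function names are our illustrative assumptions, not the paper's:

    import numpy as np

    def ridgeless_fit(X, y):
        # minimum-l2-norm least squares: beta = pinv(X) @ y
        return np.linalg.pinv(X) @ y

    def bagged_ridgeless(X, y, k, M, rng):
        # average M ridgeless predictors, each fit on a simple random
        # subsample of size k drawn without replacement
        n = X.shape[0]
        betas = [ridgeless_fit(X[idx], y[idx])
                 for idx in (rng.choice(n, size=k, replace=False)
                             for _ in range(M))]
        return np.mean(betas, axis=0)

    rng = np.random.default_rng(0)
    n, p = 200, 400                       # overparameterized: p > n
    X = rng.standard_normal((n, p))
    beta_star = rng.standard_normal(p) / np.sqrt(p)
    y = X @ beta_star + 0.5 * rng.standard_normal(n)

    beta_bag = bagged_ridgeless(X, y, k=100, M=50, rng=rng)

Here the subsample size k plays the role of the tuning parameter that the paper's cross-validation procedure selects; sweeping k and choosing the risk-minimizing value roughly mimics the risk-monotonization idea.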