Error Bounds for Learning with Vector-Valued Random Features
This paper provides a comprehensive error analysis of learning with vector-valued random features (RF). The theory is developed for RF ridge regression in a fully general infinite-dimensional input-output setting, but nonetheless applies to and improves existing finite-dimensional analyses. In contrast to comparable work in the literature, the approach proposed here relies on a direct analysis of the underlying risk functional and completely avoids the explicit RF ridge regression solution formula in terms of random matrices. This removes the need for concentration results in random matrix theory or their generalizations to random operators. The main results established in this paper include strong consistency of vector-valued RF estimators under model misspecification and minimax optimal convergence rates in the well-specified setting. The parameter complexity (number of random features) and sample complexity (number of labeled data) required to achieve such rates are comparable with Monte Carlo intuition and free from logarithmic factors.
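As a finite-dimensional illustration of the estimator discussed in the abstract, the following is a minimal sketch of vector-valued RF ridge regression with scalar coefficients. It rests on assumptions that are not taken from the paper: the feature map is a cosine of a random affine map, the parameters are Gaussian and uniform, the ridge penalty is scaled as `lam / M`, and all function names are hypothetical. The closed-form normal equations are used here only for computation; the paper's analysis works with the risk functional directly.

```python
import numpy as np


def sample_parameters(M, d, p, rng):
    """Draw M i.i.d. feature parameters theta_m = (W_m, b_m) (illustrative choice)."""
    W = rng.normal(size=(M, p, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=(M, p))
    return W, b


def features(X, W, b):
    """Evaluate phi(x_n; theta_m) in R^p for all n and m; returns shape (N, p, M)."""
    Z = np.einsum("mpd,nd->npm", W, X) + b.T[None, :, :]
    return np.cos(Z)


def fit_rf_ridge(X, Y, W, b, lam):
    """Minimise (1/N) sum_n ||y_n - (1/M) sum_m alpha_m phi_m(x_n)||^2 + (lam/M) ||alpha||^2."""
    N, p = Y.shape
    M = W.shape[0]
    # Stack the p output coordinates of every sample into the rows of A.
    A = (features(X, W, b) / M).reshape(N * p, M)
    y = Y.reshape(N * p)
    G = A.T @ A / N + (lam / M) * np.eye(M)
    return np.linalg.solve(G, A.T @ y / N)


def predict(X, W, b, alpha):
    """Evaluate the RF estimator (1/M) sum_m alpha_m phi(x; theta_m); returns shape (N, p)."""
    return features(X, W, b) @ alpha / W.shape[0]


# Synthetic data: N samples, d inputs, p outputs, M random features.
rng = np.random.default_rng(0)
N, d, p, M = 200, 3, 2, 50
X = rng.normal(size=(N, d))
Y = np.stack([np.sin(X[:, 0]), X[:, 1] * X[:, 2]], axis=1)

W, b = sample_parameters(M, d, p, rng)
alpha = fit_rf_ridge(X, Y, W, b, lam=1e-3)
train_mse = np.mean(np.sum((predict(X, W, b, alpha) - Y) ** 2, axis=1))
```

Only the `M` scalar coefficients `alpha` are trained; the feature parameters stay fixed after sampling, which is what makes the number of random features the relevant parameter complexity.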
Supplement to "Rates of Estimation of Optimal Transport Maps using Plug-in Estimators via Barycentric Projections"
For the moment, it is worth noting that such sets of functions (e.g., Haar wavelets, Daubechies wavelets) are readily [...]
We are now in a position to present the main theorem of this subsection.
To avoid repetition, we defer further discussions on the rates observed in Theorem A.1 to Remark 2.7 where a holistic [...]
In fact, by Proposition 1.1, there exists an optimal transport map [...]
Based on (B.2), the natural plug-in estimator of ρ [...]
Suppose that the same assumptions from Theorem 2.2 hold.
B.2 Nonparametric independence testing: Optimal transport based Hilbert-Schmidt independence criterion
Proposition B.2 shows that the test based on [...]
Further, when the sampling distribution is fixed, Proposition B.2 shows that [...]
In the following result (see Appendix C.2 for a proof), we show that if [...]
This section is devoted to proving our main results and is organized as follows: In Appendix C.1, we [...]
Further by Lemma D.2, we also have: ϕ [...]
Note that (C.10) immediately yields the following conclusions: S [...]
By (1.5) and some simple algebra, the following holds: [...]
Combining the above display with (C.9), we further have: [...]
Combining the above observation with Theorem 2.1, we have: lim sup [...]
For the next part, to simplify notation, let us begin with some notation.
By using the exponential Markov's inequality coupled with the standard union [...]
Now by using [7, Theorem 2.10], we have P(B [...]
We are now in a position to complete the proof of Theorem 2.2 using steps I-III.
Therefore, it is now enough to bound the right-hand side of (C.17).
Supplementary Information
A The principle of least action and the Euler-Lagrange equation
Here, we review the principle of least action and the derivation of the Euler-Lagrange equation [...]
Now, let us derive the differential equation that gives a solution to the variational problem.
This condition yields the Euler-Lagrange equation, $\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} = \frac{\partial L}{\partial q}$.
Here, we derive Noether's learning dynamics by applying Noether's theorem to the [...]
A general form of Noether's theorem relates the dynamics of Noether [...]
By evaluating the right-hand side of Eq. 23, we get [...]
Now, we harness the covariant property of the Lagrangian formulation, i.e., it preserves the form [...]
Plugging this expression obtained from the steady-state condition of Eq. 27 [...]
Here, we ignore the inertia term in Eq. 16, assuming that the mass (learning rate) is finite but small [...]
All the experiments were run using the PyTorch code base. We used the Tiny ImageNet dataset to generate all the empirical figures in this work. The key hyperparameters we used are listed with each figure.
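For reference, the variational step behind the Euler-Lagrange equation in Supplementary Information A can be sketched as follows. This is the standard textbook argument, not the authors' own derivation; the action symbol $S$, the endpoints $t_0, t_1$, and the variation $\delta q$ are notation assumed here for illustration.

```latex
\begin{align*}
S[q] &= \int_{t_0}^{t_1} L(q, \dot{q}, t)\, dt, \\
\delta S &= \int_{t_0}^{t_1} \left( \frac{\partial L}{\partial q}\,\delta q
            + \frac{\partial L}{\partial \dot{q}}\,\delta \dot{q} \right) dt \\
         &= \left[ \frac{\partial L}{\partial \dot{q}}\,\delta q \right]_{t_0}^{t_1}
            + \int_{t_0}^{t_1} \left( \frac{\partial L}{\partial q}
            - \frac{d}{dt}\frac{\partial L}{\partial \dot{q}} \right) \delta q \, dt .
\end{align*}
```

With fixed endpoints, $\delta q(t_0) = \delta q(t_1) = 0$, the boundary term vanishes. Requiring $\delta S = 0$ for every admissible $\delta q$ then forces the integrand to vanish, which gives $\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} = \frac{\partial L}{\partial q}$.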