exp null 1 2
- North America > Canada > Ontario > Toronto (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Oceania > Australia > Western Australia (0.04)
- (2 more...)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada > Ontario > Toronto (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
A Proofs of Linear Case Throughout the appendix, for ease of notation, we overload the definition of the function d
The proof of this lemma requires Lemma A.1, which characterizes the distribution of the residual By Pinsker's inequality, this implies d By Lemma A.1, we have E[ X ( null w w The proof is inspired by Theorem 11.2 in [20], with modifications to our setting. First, we construct a "ghost" dataset The most challenging aspect of the ReLU setting is that we do not have an expression for the TV suffered by the MLE, such as Lemma 4.2 in the linear case. The proof of this Lemma, as well as other Lemmas in this section, can be found in Appendix B.1. Using Lemma B.2 and Lemma B.3, we can form a uniform bound, such that all A straight forward combination of Lemma 4.3 and Lemma B.4 gives the following Theorem. Now we can apply Bernstein's inequality (Theorem 2.10 of [8]).
Appendix A Theory
In this section, we show the proofs of the results in the main body. Eq. (1) satisfies the triangle inequality, i.e., for any scoring functions For the second inequality, we prove it similarly. Before we present the proof of the theorem, we first provide some lemmas. By applying Lemma A.2, the following holds with probability at least 1 α: null R F). Thus we have: null R A.1, we can get that the margin loss satisfies the triangle inequality. By Lemma A.4, we have R By Theorem 4.4, the following holds for any Based on Theorem A.6, the following standard error bound for gradual AST can be derived similarly to Corollary 4.6.
High-Probability Minimax Adaptive Estimation in Besov Spaces via Online-to-Batch
Liautaud, Paul, Gaillard, Pierre, Wintenberger, Olivier
We study nonparametric regression over Besov spaces from noisy observations under sub-exponential noise, aiming to achieve minimax-optimal guarantees on the integrated squared error that hold with high probability and adapt to the unknown noise level. To this end, we propose a wavelet-based online learning algorithm that dynamically adjusts to the observed gradient noise by adaptively clipping it at an appropriate level, eliminating the need to tune parameters such as the noise variance or gradient bounds. As a by-product of our analysis, we derive high-probability adaptive regret bounds that scale with the $\ell_1$-norm of the competitor. Finally, in the batch statistical setting, we obtain adaptive and minimax-optimal estimation rates for Besov spaces via a refined online-to-batch conversion. This approach carefully exploits the structure of the squared loss in combination with self-normalized concentration inequalities.
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- (4 more...)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)