Appendices
Additionally, to avoid gradients with infinite means even if $D_L$ is not contractive, we consider a spectral normalisation, so that instead of computing recursively $\eta_0 = \varepsilon$ and $\eta_k = D_L \eta_{k-1}$ for $k \in \{1, \dots, N\}$, we set $\eta_0 = \varepsilon$ and normalise the iterates $\eta_k$ at each step. The motivation was to have a quadratic increase of the penalty term as the largest absolute eigenvalue approaches 1, and then to switch smoothly to a linear function for values larger than $\delta_2$.

The suggested approach can perform poorly for non-convex potentials, or even for convex potentials such as those arising in a logistic regression model on some data sets. The idea now is to run HMC with a unit mass matrix on the transformed variables $z = f^{-1}(q)$, where $q \sim \pi$. Hessian-vector products can similarly be computed using vector-Jacobian products: with $g(z) = \operatorname{grad}(U, z)$, we compute $\nabla^2 U(z)\, w = \operatorname{vjp}(g, z, w)^{\top}$ for $z = f^{-1}(\operatorname{stop\_grad}(f(z_{\lfloor L/2 \rfloor})))$. We also stop all gradients of $U$, i.e.
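As a minimal sketch of this vector-Jacobian-product construction (assuming a JAX-style autodiff API; the quadratic potential `U` below is a placeholder, not the model from the text):

```python
import jax
import jax.numpy as jnp

def U(z):
    # Placeholder potential: standard Gaussian negative log-density.
    return 0.5 * jnp.sum(z ** 2)

g = jax.grad(U)  # g(z) = grad(U, z)

def hvp(z, w):
    # Hessian-vector product via a vector-Jacobian product on g:
    # vjp(g, z, w) = w^T (dg/dz) = (H(z) w)^T, since the Hessian is symmetric.
    # In the scheme above, z itself would be f^{-1}(stop_grad(f(z_{L/2}))),
    # e.g. wrapped with jax.lax.stop_gradient.
    _, vjp_fn = jax.vjp(g, z)
    return vjp_fn(w)[0]

z = jnp.array([1.0, -2.0, 0.5])
w = jnp.array([0.3, 0.0, 1.0])
print(hvp(z, w))  # here H(z) = I, so the output equals w
```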
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Greece (0.04)
On Nonasymptotic Confidence Intervals for Treatment Effects in Randomized Experiments
Sandoval, Ricardo J., Balakrishnan, Sivaraman, Feller, Avi, Jordan, Michael I., Waudby-Smith, Ian
We study nonasymptotic (finite-sample) confidence intervals for treatment effects in randomized experiments. In the existing literature, nonasymptotic confidence intervals tend to be looser than the corresponding central-limit-theorem-based confidence intervals: their effective sample size is smaller by a factor depending on the square root of the propensity score. We show that this performance gap can be closed by designing nonasymptotic confidence intervals that have the same effective sample size as their asymptotic counterparts. Our approach systematically exploits negative dependence, variance adaptivity, or both. We also show that the nonasymptotic rates we achieve are unimprovable in an information-theoretic sense.
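To make the looseness gap concrete, here is a generic numerical comparison of a Hoeffding-style finite-sample interval with a CLT interval for a bounded mean (an illustration of the gap the abstract describes, not the paper's construction; the variance value is an assumption):

```python
import math

# 95% confidence-interval half-widths for the mean of [0, 1]-bounded data.
n, alpha = 1000, 0.05
sigma = 0.25     # assumed standard deviation of the observations
z = 1.959964     # standard normal quantile z_{1 - alpha/2}

clt = z * sigma / math.sqrt(n)                        # asymptotic (CLT) interval
hoeffding = math.sqrt(math.log(2 / alpha) / (2 * n))  # nonasymptotic interval

# Effective sample size of the Hoeffding interval: the n' at which the
# CLT interval would be as wide as the Hoeffding interval is at n.
n_eff = (z * sigma / hoeffding) ** 2
print(f"CLT: {clt:.4f}, Hoeffding: {hoeffding:.4f}, effective n: {n_eff:.0f} / {n}")
```

With these numbers the nonasymptotic interval behaves like a CLT interval built from roughly 130 of the 1000 samples, which is the kind of effective-sample-size deficit the paper aims to eliminate.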
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
Architecture-Aware Generalization Bounds for Temporal Networks: Theory and Fair Comparison Methodology
Gahtan, Barak, Bronstein, Alex M.
Deep temporal architectures such as TCNs achieve strong predictive performance on sequential data, yet theoretical understanding of their generalization remains limited. We address this gap through three contributions: introducing an evaluation methodology for temporal models, revealing surprising empirical phenomena about temporal dependence, and developing the first architecture-aware theoretical framework for dependent sequences. Fair-Comparison Methodology. We introduce evaluation protocols that fix the effective sample size $N_{\text{eff}}$ to isolate temporal-structure effects from information content. Empirical Findings. Applying this method reveals that under $N_{\text{eff}} = 2000$, strongly dependent sequences ($\rho = 0.8$) exhibit approximately $76\%$ smaller generalization gaps than weakly dependent ones ($\rho = 0.2$), challenging the conventional view that dependence universally impedes learning. However, observed convergence rates ($N_{\text{eff}}^{-1.21}$ to $N_{\text{eff}}^{-0.89}$) significantly exceed theoretical worst-case predictions ($N^{-0.5}$), revealing that temporal architectures exploit problem structure in ways current theory does not capture. Lastly, we develop the first architecture-aware generalization bounds for deep temporal models on exponentially $\beta$-mixing sequences. By embedding Golowich et al.'s i.i.d. class bound within a novel blocking scheme that partitions $N$ samples into approximately $B \approx N/\log N$ quasi-independent blocks, we establish polynomial sample complexity under convex Lipschitz losses. The framework achieves $\sqrt{D}$ depth scaling alongside the product of layer-wise norms $R = \prod_{\ell=1}^{D} M^{(\ell)}$, avoiding exponential dependence. While these bounds are conservative, they prove learnability and identify architectural scaling laws, providing worst-case baselines that highlight where future theory must improve.
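As a back-of-the-envelope sketch of the quantities in this abstract (assuming the standard AR(1) effective-sample-size formula $N_{\text{eff}} = N(1-\rho)/(1+\rho)$, which may differ from the paper's exact definition; the block count follows the $B \approx N/\log N$ scheme mentioned above):

```python
import math

def ess_ar1(n: int, rho: float) -> float:
    # Standard effective sample size of n AR(1)-correlated samples
    # with lag-1 autocorrelation rho (assumed definition).
    return n * (1 - rho) / (1 + rho)

def num_blocks(n: int) -> int:
    # Blocking scheme from the abstract: B ~ N / log N quasi-independent blocks.
    return max(1, round(n / math.log(n)))

for rho in (0.2, 0.8):
    # Raw sequence length n needed so that N_eff = 2000 at this dependence level.
    n = round(2000 * (1 + rho) / (1 - rho))
    print(f"rho={rho}: n={n}, N_eff~{ess_ar1(n, rho):.0f}, B~{num_blocks(n)}")
```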
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- South America > Peru > Tumbes Department (0.04)
- (6 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Asia > Middle East > Jordan (0.05)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- North America > United States > New Mexico (0.04)
- North America > United States > New York (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Robust Sampling for Active Statistical Inference
Li, Puheng, Zrnic, Tijana, Candès, Emmanuel
Collecting high-quality labeled data remains a challenge in data-driven research, especially when each label is costly and time-consuming to obtain. In response, many fields have embraced machine learning as a practical solution for predicting unobserved labels, such as annotating satellite imagery in remote sensing [46] and predicting protein structures in proteomics [24]. Prediction-powered inference [1] is a methodological framework showing how to perform valid statistical inference despite the inherent biases in such predicted labels. Active statistical inference [51] was recently introduced to further enhance inference by actively selecting which data points to label. The basic idea is to compute the model's uncertainty scores for all data points and prioritize collecting those labels for which the predictive model is most uncertain. When the uncertainty scores appropriately reflect the model's errors, Zrnic and Candès [51] show that active inference can significantly outperform prediction-powered inference (which can essentially be thought of as active inference with naive uniform sampling), meaning it results in more accurate estimates and narrower confidence intervals. However, when uncertainty scores are of poor quality, active inference can result in overly noisy estimates and large confidence intervals.
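A minimal sketch of the uncertainty-driven sampling idea described here (a generic inverse-probability-weighted correction; the budget, score model, and clipping below are illustrative assumptions, not the estimator of Zrnic and Candès [51]):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
f_hat = rng.uniform(0.0, 1.0, n)       # model predictions for each data point
u = np.abs(rng.normal(0.0, 0.2, n))    # uncertainty scores (assumed given)

# Label each point with probability proportional to its uncertainty,
# clipped away from zero so the inverse-probability weights stay bounded.
budget = 1_000
pi = np.clip(budget * u / u.sum(), 0.02, 1.0)
labeled = rng.uniform(size=n) < pi
y = f_hat + rng.normal(0.0, 0.2, n)    # true labels, revealed only if sampled

# Active estimate of the mean label: predictions everywhere, plus an
# inverse-probability-weighted correction on the labeled points.
correction = np.where(labeled, (y - f_hat) / pi, 0.0)
estimate = f_hat.mean() + correction.mean()
print(f"labeled {labeled.sum()} points, estimate = {estimate:.4f}")
```

Uniform sampling ($\pi_i$ constant) recovers the prediction-powered baseline; concentrating $\pi_i$ where $u_i$ is large shrinks the variance of the correction term exactly when the scores track the model's actual errors.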
- Oceania > New Zealand (0.04)
- North America > United States > California (0.04)
- Europe > France (0.04)
- Asia > Middle East > Jordan (0.04)