Lean Formalization of Generalization Error Bound by Rademacher Complexity
Sho Sonoda, Kazumi Kasaura, Yuma Mizuno, Kei Tsukamoto, Naoto Onda
arXiv.org Artificial Intelligence
Naoto Onda (naoto.onda@sinicx.com)

Affiliations: 1 RIKEN AIP; 2 OMRON SINIC X Corporation; 3 University College Dublin; 4 The University of Tokyo; 5 AutoRes

Abstract

We formalize the generalization error bound based on Rademacher complexity in the Lean 4 theorem prover. Generalization error quantifies the gap between a learning machine's performance on given training data and on unseen test data, and Rademacher complexity provides an estimate of this error based on the complexity of the hypothesis class. Unlike traditional approaches such as PAC learning with the VC dimension, Rademacher complexity is applicable across diverse machine learning scenarios, including deep learning and kernel methods. We formalize key concepts and theorems, including the empirical and population Rademacher complexities, and establish generalization error bounds through formal proofs of McDiarmid's inequality, Hoeffding's lemma, and symmetrization arguments. We make our code available on GitHub.

1 Introduction

Generalization is a central concept in machine learning that describes how well a learning machine, constructed from training data, can make predictions on unseen test data.
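For orientation, the following is a standard textbook statement of the quantities the abstract names; the notation here follows common conventions (e.g. a loss class with values in [0, 1]) and is not necessarily the notation used in the paper's Lean development.

```latex
% Empirical Rademacher complexity of a class H of functions
% h : Z -> [0,1], on a sample S = (z_1, ..., z_m), where the
% sigma_i are i.i.d. uniform on {-1, +1} (Rademacher variables):
\hat{\mathfrak{R}}_S(\mathcal{H})
  = \mathbb{E}_{\sigma}\!\left[\,\sup_{h \in \mathcal{H}}
      \frac{1}{m} \sum_{i=1}^{m} \sigma_i\, h(z_i)\right],
\qquad
\mathfrak{R}_m(\mathcal{H}) = \mathbb{E}_{S}\!\left[\hat{\mathfrak{R}}_S(\mathcal{H})\right].

% Generalization bound: for every h in H, with probability at
% least 1 - delta over the draw of the i.i.d. sample S,
\mathbb{E}\!\left[h(z)\right]
  \;\le\; \frac{1}{m}\sum_{i=1}^{m} h(z_i)
  \;+\; 2\,\mathfrak{R}_m(\mathcal{H})
  \;+\; \sqrt{\frac{\ln(1/\delta)}{2m}}.
```

The proof route mentioned in the abstract is the classical one: McDiarmid's inequality (itself built on Hoeffding's lemma) controls the deviation of the supremum term, and a symmetrization argument bounds that supremum by twice the Rademacher complexity.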
Mar-31-2025