

theorem 2


Semi-Supervised Learning on Graphs using Graph Neural Networks

Chen, Juntong, Donnat, Claire, Klopp, Olga, Schmidt-Hieber, Johannes

arXiv.org Machine Learning

Graph neural networks (GNNs) work remarkably well in semi-supervised node regression, yet a rigorous theory explaining when and why they succeed remains lacking. To address this gap, we study an aggregate-and-readout model that encompasses several common message-passing architectures: node features are first propagated over the graph and then mapped to responses via a nonlinear function. For least-squares estimation over GNNs with linear graph convolutions and a deep ReLU readout, we prove a sharp non-asymptotic risk bound that separates approximation, stochastic, and optimization errors. The bound makes explicit how performance scales with the fraction of labeled nodes and graph-induced dependence. Approximation guarantees are further derived for graph smoothing followed by smooth nonlinear readouts, yielding convergence rates that recover classical nonparametric behavior under full supervision while characterizing performance when labels are scarce. Numerical experiments validate our theory, providing a systematic framework for understanding GNN performance and limitations.
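
A minimal sketch of the aggregate-and-readout architecture described above, assuming PyTorch and illustrative choices (a row-normalized adjacency with self-loops, two linear propagation hops, a three-layer ReLU readout, and a random placeholder graph); it is a sketch under these assumptions, not the authors' exact model or training setup.

    # Aggregate-and-readout: linear graph convolutions followed by a deep ReLU
    # readout, trained by least squares on the labeled nodes only.
    import torch
    import torch.nn as nn

    class AggregateAndReadout(nn.Module):
        def __init__(self, in_dim, hidden_dim, num_hops=2):
            super().__init__()
            self.num_hops = num_hops  # number of linear propagation steps
            self.readout = nn.Sequential(        # node-wise deep ReLU readout
                nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )

        def forward(self, adj, x):
            # Linear graph convolution: smooth features with a row-normalized
            # adjacency matrix that includes self-loops.
            a_hat = adj + torch.eye(adj.size(0))
            a_hat = a_hat / a_hat.sum(dim=1, keepdim=True)
            for _ in range(self.num_hops):
                x = a_hat @ x
            return self.readout(x).squeeze(-1)

    n, d = 100, 8
    adj = (torch.rand(n, n) < 0.05).float()
    adj = ((adj + adj.T) > 0).float()        # symmetric random graph
    features, responses = torch.randn(n, d), torch.randn(n)
    labeled = torch.rand(n) < 0.3            # fraction of labeled nodes

    model = AggregateAndReadout(d, 32)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        loss = ((model(adj, features)[labeled] - responses[labeled]) ** 2).mean()
        loss.backward()
        opt.step()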


RDP_Sampled_Shuffle

Deepesh Data

Neural Information Processing Systems

Let ... = argmin_{... ∈ C} F(...) denote (2). ... (5), where ... is the RDP of .... Convergence: if we run A_cldp with step size ..., where G² = max{d₁, ...}. Substituting the bound on ... into Lemma 2, together with some manipulation, proves Theorem 1; see Appendix E.2 for details.





Sparse Deep Learning: A New Framework Immune to Local Traps and Miscalibration

Neural Information Processing Systems

... | D_n) → 1 as n → ∞, which means that most of the posterior mass falls in a neighbourhood of the true parameter. Remark on the notation: ν(·) is similar to the ν(·) defined in Section 2.1 of the main text. The notations used in this proof are the same as in the proof of Theorem 2.1. Theorem 2.2 implies that a faithful prediction interval can be constructed for the sparse neural network learned by the proposed algorithms. In practice, for a normal regression problem with noise N(0, σ²), to construct the prediction interval for a test point x₀, the terms σ² and Σ = ∇_γ μ(β, x₀)^T H^{-1} ∇_γ μ(β, x₀) in Theorem 2.2 need to be estimated from the data.
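
A hedged sketch of that last step, assuming NumPy/SciPy and a standard delta-method style interval; the function name, the Σ/n + σ² variance scaling, and the toy inputs are illustrative assumptions rather than the paper's exact formula.

    # Prediction interval for a test point x0 from estimated sigma^2 and
    # Sigma = grad^T H^{-1} grad, where grad is the gradient of mu(beta, x0)
    # w.r.t. the nonzero network weights and H is the estimated Hessian.
    # The Sigma/n + sigma^2 variance is an illustrative assumption.
    import numpy as np
    from scipy.stats import norm

    def prediction_interval(mu_hat, grad, hessian, sigma2_hat, n, alpha=0.05):
        sigma_big = grad @ np.linalg.solve(hessian, grad)   # Sigma
        half = norm.ppf(1 - alpha / 2) * np.sqrt(sigma_big / n + sigma2_hat)
        return mu_hat - half, mu_hat + half

    # Toy usage with made-up quantities.
    rng = np.random.default_rng(0)
    g, H = rng.normal(size=5), 2.0 * np.eye(5)
    print(prediction_interval(mu_hat=1.3, grad=g, hessian=H, sigma2_hat=0.25, n=500))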