Appendices

A HSIC estimation in the self-supervised setting

Neural Information Processing Systems 

Estimators of HSIC typically assume i.i.d. samples.

A.3 Estimator of HSIC(Z, Z)

Before discussing estimators of HSIC(Z, Z), note that it takes the following form: HSIC(Z, Z) = E[k(Z, Z […]

Finally, note that even if the estimator of HSIC(Z, Z) is unbiased, its square root is not.

B.1 InfoNCE connection

To establish the connection with InfoNCE, define it in terms of expectations: L […]

In the small variance regime, InfoNCE also bounds an HSIC-based loss.

Both roots are real, as α ≤ 1/4.

Theorem B.1 works for any bounded kernel, because […]

In Section 3.2, we make the assumption that the features are centered and argue that the assumption is valid for BYOL.
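As a concrete reference point for the estimators discussed in Appendix A, the following is a minimal sketch of the standard biased HSIC estimator, tr(KHLH)/(n-1)^2, applied to HSIC(Z, Z) by passing the same Gram matrix twice. It assumes i.i.d. samples and a Gaussian kernel; the function names `gaussian_gram` and `hsic_biased`, the bandwidth `sigma`, and the toy data are illustrative assumptions, and this is not necessarily the estimator the appendix derives for the self-supervised setting.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    """Gram matrix of a Gaussian kernel (illustrative choice of bounded kernel)."""
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic_biased(K, L):
    """Standard biased HSIC estimate tr(K H L H) / (n - 1)^2 for i.i.d. samples."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))        # toy features (assumption)
z_indep = rng.normal(size=(200, 1))  # independent draw, for comparison

K = gaussian_gram(z)
L_indep = gaussian_gram(z_indep)

hsic_zz = hsic_biased(K, K)            # estimate of HSIC(Z, Z)
hsic_indep = hsic_biased(K, L_indep)   # estimate for independent variables
print(hsic_zz, hsic_indep)
```

On this toy data the estimate of HSIC(Z, Z) should be clearly larger than the estimate for the independent pair, which is small but not exactly zero because the estimator is biased at finite n.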