Supplementary Document to the Paper "Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee"
Neural Information Processing Systems
As a technical tool for the proof, we first restate Lemma 6.1 of Chérief-Abdellatif and Alquier. [...] The first inequality is due to Lemma 1.1, and the second [...]

Under Conditions 4.1-4.2, we have the following lemma, which shows the existence of testing functions. [...] Now we define φ = max [...] Note that log K = log N(ε [...] Hence we conclude the proof.

We start with the first component. As in Pati et al. (2018), it can be shown that [...] the third term on the RHS of (9) is bounded by 3/(2nσ²) [...] Similarly, the fifth term on the RHS of (9) is bounded by O(1/n). The convergence under the squared Hellinger distance is a direct result of Lemmas 4.1 and 4.2, by [...]

As mentioned by Sønderby et al. (2016) and Molchanov et al. (2017), training sparse [...] The optimization method used is Adam. The implementation details for the UCI datasets and MNIST can be found in Sections 2.5 and 2.6.

In this section, we aim to demonstrate, via a toy example, that there is little difference between the results obtained with the inverse-CDF reparameterization and with the Gumbel-softmax approximation.
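The two sampling schemes compared in the toy example can be sketched as follows. This is a minimal illustration, not the authors' implementation: for a single Bernoulli inclusion probability p, the inverse-CDF scheme draws hard 0/1 samples via z = 1{u < p} with u uniform, while the binary Gumbel-softmax (Concrete) scheme draws relaxed samples in (0, 1) from perturbed logits divided by a temperature tau. The function names, the choice of p = 0.3, and tau = 0.5 are ours, chosen only for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_inverse_cdf(p, size, rng):
    """Hard Bernoulli(p) samples via the inverse CDF: z = 1{u < p}."""
    u = rng.uniform(size=size)
    return (u < p).astype(float)

def bernoulli_gumbel_softmax(p, tau, size, rng):
    """Relaxed Bernoulli(p) samples via the binary Gumbel-softmax trick."""
    # One Gumbel(0, 1) noise per category {0, 1}.
    g = -np.log(-np.log(rng.uniform(size=(size, 2))))
    logits = np.log([1.0 - p, p])
    scores = (logits + g) / tau
    # Softmax over the two categories; return the weight of category 1.
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    return w[:, 1] / w.sum(axis=1)

p, tau, n = 0.3, 0.5, 200_000
hard = bernoulli_inverse_cdf(p, n, rng)
soft = bernoulli_gumbel_softmax(p, tau, n, rng)
# Rounding the relaxed samples recovers Bernoulli(p) exactly:
# soft > 0.5 iff the perturbed logit of category 1 wins, which has probability p.
print(hard.mean(), (soft > 0.5).mean())  # both close to p = 0.3
```

As tau decreases toward 0, the relaxed samples concentrate near {0, 1} and the two schemes produce increasingly similar draws, which is the behavior the toy example in this section is checking.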