Supplementary material for Discrete Valued Neural Communication in Structured Architectures Enhances Generalization

Neural Information Processing Systems 

In this appendix, as a complement to Theorems 1-2, we provide two additional theorems, Theorems 3-4, which further illustrate the two advantages of the discretization process by considering an abstract model with a discretization bottleneck. For the advantage in sensitivity, the error due to potential noise and perturbation without discretization (the third term $\xi(w, r', M', d) > 0$ in Theorem 4) is shown to be reduced to zero with discretization in Theorem 3. See Appendix C.1 for a simple comparison between the bound of Theorem 3 and that of Theorem 4 when the metric spaces $(M, d)$ and $(M', d')$ are chosen to be Euclidean spaces. We now introduce the notation used in Theorems 3-4. Here, $\phi_w$ represents a deep neural network with weight parameters $w \in W \subseteq \mathbb{R}^D$, $q_e$ is the discretization process with the codebook $e \in E \subseteq \mathbb{R}^{L \times m}$, and $h_\theta$ represents a deep neural network with parameters $\theta \in \Theta \subseteq \mathbb{R}^\zeta$. Thus, the tuple of all learnable parameters is $(w, e, \theta)$.
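To make the notation concrete, the following is a minimal NumPy sketch of a discretization bottleneck, not the paper's implementation. It assumes that the discretization step maps each m-dimensional vector to its nearest row (in Euclidean distance) of a codebook with L rows, and that the downstream network is applied to the discretized output. The function names `quantize` and `h`, the fixed weights, and the toy inputs are all hypothetical and chosen only for illustration.

```python
import numpy as np

def quantize(z, codebook):
    """Map each row of z (shape (n, m)) to its nearest codebook row.

    codebook has shape (L, m). Nearest-neighbor lookup in Euclidean
    distance is an assumption made for this sketch.
    """
    # Squared distances between every input row and every code: shape (n, L).
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx

# Hypothetical codebook with L = 3 codes of dimension m = 2.
codebook = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])

# Hypothetical stand-in for the downstream network: a fixed linear map.
W_h = np.array([[1.0], [2.0]])

def h(z):
    return z @ W_h

# Toy outputs of the upstream network, and a small perturbation of them.
z = np.array([[0.9, 1.1], [-0.2, 0.1]])
noise = np.array([[0.05, -0.05], [0.05, 0.05]])

zq, idx = quantize(z, codebook)
zq_noisy, idx_noisy = quantize(z + noise, codebook)

# With the bottleneck, the perturbation does not reach the downstream network.
same_with_bottleneck = np.array_equal(h(zq), h(zq_noisy))

# Without the bottleneck, the perturbation changes the downstream output.
same_without_bottleneck = np.allclose(h(z), h(z + noise))

print(idx, same_with_bottleneck, same_without_bottleneck)
```

In this toy setting the perturbed inputs fall into the same codebook cells as the clean ones, so the downstream output is unchanged. This is only an intuition for the sensitivity advantage. It does not reproduce the bounds of Theorems 3-4, and a perturbation large enough to cross a cell boundary would change the selected code.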
