Disentangling Voice and Content with Self-Supervision for Speaker Recognition (Appendix), Kong Aik Lee

Neural Information Processing Systems 

In this section, we introduce a simplified method for implementing the proposed Gaussian inference. Similar to [9], we assume that the covariance (and hence precision) matrices are diagonal, and we directly estimate the log-precision, which is more convenient for the subsequent derivation. Since the gain factor A is a diagonal matrix and z and ϕ are vectors, the expensive matrix multiplications and numerically problematic matrix inversions reduce to element-wise operations on the diagonal entries and the vectors. Multiplying a diagonal matrix by a vector becomes an element-wise product, and inverting a diagonal matrix becomes an element-wise reciprocal. These are the same point-wise (Hadamard) operations used throughout neural networks, so the method is easy to implement with existing toolkits. The same simplification also applies to layers 1 and 3 of the proposed RecXi.
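To illustrate why the diagonal assumption removes matrix inversion, the following is a minimal sketch rather than the RecXi implementation. It assumes a standard Kalman-style fusion of a prior diagonal Gaussian with an observation diagonal Gaussian, both parameterized by their mean and log-precision. The appendix does not spell out the update equations, so the mapping of the paper's z and ϕ onto these arguments, and the function names `fuse_diag_gaussians` and `fuse_full_matrix`, are hypothetical. The element-wise version is checked against an explicit full-matrix version that uses `np.linalg.inv`.

```python
import numpy as np


def fuse_diag_gaussians(mu_prior, logp_prior, mu_obs, logp_obs):
    """Element-wise fusion of two diagonal Gaussians given as (mean, log-precision).

    Assumption: a standard product-of-Gaussians (Kalman-style) update; the exact
    update used in RecXi may differ. All inputs are 1-D arrays holding diagonals.
    """
    # Precisions add: log(p_prior + p_obs), computed stably in log space.
    logp_post = np.logaddexp(logp_prior, logp_obs)
    # Diagonal gain A = Sigma_prior (Sigma_prior + Sigma_obs)^{-1}
    #                 = p_obs / (p_prior + p_obs), with no matrix inversion.
    gain = np.exp(logp_obs - logp_post)
    # A @ (mu_obs - mu_prior) collapses to an element-wise product.
    mu_post = mu_prior + gain * (mu_obs - mu_prior)
    return mu_post, logp_post, gain


def fuse_full_matrix(mu_prior, logp_prior, mu_obs, logp_obs):
    """Reference version with explicit matrices, for comparison only."""
    s_prior = np.diag(np.exp(-logp_prior))  # covariance = exp(-log-precision)
    s_obs = np.diag(np.exp(-logp_obs))
    a = s_prior @ np.linalg.inv(s_prior + s_obs)  # full matrix inversion
    mu_post = mu_prior + a @ (mu_obs - mu_prior)
    s_post = (np.eye(len(mu_prior)) - a) @ s_prior
    return mu_post, s_post, a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    mu_p, mu_o, lp_p, lp_o = (rng.normal(size=d) for _ in range(4))
    mu1, lp1, g1 = fuse_diag_gaussians(mu_p, lp_p, mu_o, lp_o)
    mu2, s2, a2 = fuse_full_matrix(mu_p, lp_p, mu_o, lp_o)
    # Both routes agree: same posterior mean, variance, and gain diagonal.
    print(np.allclose(mu1, mu2),
          np.allclose(np.exp(-lp1), np.diag(s2)),
          np.allclose(g1, np.diag(a2)))  # → True True True
```

Working in log-precision means the precision `np.exp(logp)` is always positive, and the gain stays in (0, 1) without any explicit constraint. In a deep learning toolkit the same lines would operate on batched tensors with the toolkit's own element-wise `exp`, `logaddexp`, and `*` operators.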