Aligned Contrastive Predictive Coding

arXiv.org Artificial Intelligence

We investigate the possibility of forcing a self-supervised model trained using a contrastive predictive loss to extract slowly varying latent representations. Rather than producing individual predictions for each of the future representations, the model emits a sequence of predictions shorter than the sequence of upcoming representations to which they will be aligned. In this way, the prediction network solves a simpler task of predicting the next symbols, but not their exact timing, while the encoding network is trained to produce piece-wise constant latent codes. We evaluate the model on a speech coding task and demonstrate that the proposed Aligned Contrastive Predictive Coding (ACPC) leads to higher linear phone prediction accuracy and lower ABX error rates, while being slightly faster to train due to the reduced number of prediction heads.

Figure 1: ACPC architecture. The encoder maps chunks of input data into a latent space and the autoregressive model predicts K upcoming latent vectors. They are aligned using DTW to the
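The central mechanism, scoring K predictions against M ≥ K upcoming latents and aligning them monotonically, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The dot-product scores, the InfoNCE-style log-softmax against a shared set of negatives, and the alignment constraints are choices made here for illustration. Under those constraints, each future frame goes to exactly one prediction, assignments never move backwards, and every prediction covers at least one frame. The names `contrastive_log_probs` and `align_predictions` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def contrastive_log_probs(
    preds: Sequence[Vector], futures: Sequence[Vector], negatives: Sequence[Vector]
) -> List[List[float]]:
    """logp[k][m]: log-softmax score of future frame m for prediction k,
    contrasted against a shared set of negative (distractor) latents.
    Assumption: dot-product scoring, as in common CPC-style losses."""
    table = []
    for p in preds:
        neg_scores = [dot(p, n) for n in negatives]
        row = []
        for z in futures:
            scores = [dot(p, z)] + neg_scores
            top = max(scores)
            # Numerically stable log-sum-exp over the positive plus the negatives.
            lse = top + math.log(sum(math.exp(s - top) for s in scores))
            row.append(scores[0] - lse)
        table.append(row)
    return table


def align_predictions(logp: List[List[float]]) -> Tuple[float, List[int]]:
    """DTW-style monotonic alignment of K predictions to M >= K future frames.

    Each future frame is assigned to exactly one prediction, assignments never
    move backwards, and each prediction covers at least one frame (assumed
    constraints). Returns the best total log-probability and, for each frame,
    the index of the prediction it was aligned to.
    """
    K, M = len(logp), len(logp[0])
    if K > M:
        raise ValueError("need at least as many future frames as predictions")
    best = [[-math.inf] * K for _ in range(M)]
    back = [[-1] * K for _ in range(M)]
    best[0][0] = logp[0][0]  # the first frame always goes to the first prediction
    for m in range(1, M):
        for k in range(K):
            stay = best[m - 1][k]  # frame m extends prediction k's segment
            advance = best[m - 1][k - 1] if k > 0 else -math.inf  # starts segment k
            prev_k, prev = (k, stay) if stay >= advance else (k - 1, advance)
            if prev > -math.inf:
                best[m][k] = prev + logp[k][m]
                back[m][k] = prev_k
    # Backtrack from (last frame, last prediction) to recover the segmentation.
    path, k = [0] * M, K - 1
    for m in range(M - 1, -1, -1):
        path[m] = k
        k = back[m][k]
    return best[M - 1][K - 1], path


# Two predictions aligned to four piece-wise constant future frames.
preds = [[1.0, 0.0], [0.0, 1.0]]
futures = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]
score, path = align_predictions(contrastive_log_probs(preds, futures, negatives))
loss = -score  # the quantity a training loop would minimize in this sketch
print(path)  # [0, 0, 1, 1]
```

In this toy case each prediction claims a run of two identical future latents, which is the piece-wise constant behavior the abstract says the encoder is pushed toward. Because only K prediction heads are needed for M future frames, the sketch also hints at why fewer heads can make training cheaper. In a real model the per-pair scores would come from learned encoder and predictor networks, and gradients would flow only through the pairs on the chosen alignment path.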