Review for NeurIPS paper: wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

Neural Information Processing Systems 

Weaknesses: From this work alone, it is not clear why the proposed changes should work well for the problem domain, nor why the interaction of the two proposed changes is so beneficial. This is a problem shared by the body of work that effectively searches through the space of neural network architectures, but it would still be very beneficial to justify the design choices more rigorously. One way to do this would be to design a toy problem showing that pre-existing work cannot handle a particular case, which would motivate accepting the proposed changes. Without such justification, it is not clear how significant this work is or will be.