Towards the Next Frontier in Speech Representation Learning Using Disentanglement