Self-supervised audio representation learning for mobile devices