Audio Representation Learning by Distilling Video as Privileged Information