Audio Representation Learning by Distilling Video as Privileged Information
