Self-SupervisedLearningbyCross-Modal Audio-VideoClustering

Neural Information Processing Systems 

The first challenge is the exorbitant cost of scaling up the size of manually-labeled video datasets. The recent creation of large-scale action recognition datasets [5,15,25,26]hasundoubtedly enabled amajor leap forwardinvideo models accuracies.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found