Self-SupervisedLearningbyCross-Modal Audio-VideoClustering
–Neural Information Processing Systems
The first challenge is the exorbitant cost of scaling up the size of manually-labeled video datasets. The recent creation of large-scale action recognition datasets [5,15,25,26]hasundoubtedly enabled amajor leap forwardinvideo models accuracies.
Neural Information Processing Systems
Feb-8-2026, 20:29:13 GMT
- Country:
- Asia > Russia (0.04)
- Europe > Russia (0.04)
- North America
- Industry:
- Leisure & Entertainment (0.46)
- Technology: