Appendix
Neural Information Processing Systems
We evaluated all models on three additional tasks beyond those presented in the main paper.

Point-of-no-return (PNR) temporal localization error: Given a video clip of a state change, the network has to estimate the time at which the state change begins. More specifically, the model estimates the keyframe within the clip that contains the point-of-no-return (the moment when the state change begins). The occurrence of a state change is then predicted by training a binary linear classifier on the concatenated representations.

Action Recognition (AR) w/ audio: For this task, the video embeddings from f_V and the audio embeddings from f_A are concatenated and passed through two separate linear classifiers, which classify the 'verb' and the 'noun' of the action occurring in the video clip.
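To make the evaluation heads above concrete, here is a minimal pure-Python sketch, not the authors' code. All function names (`linear`, `ar_with_audio`, `state_change_present`, `pnr_localization_error`) are hypothetical. It assumes embeddings are plain lists of floats, that each linear head is a weight matrix plus bias vector, that the binary state-change classifier fires when its single logit is positive, and that the PNR localization error is the absolute time difference, in seconds, between the highest-scoring frame and the ground-truth PNR frame.

```python
from typing import List, Sequence, Tuple

# A linear head: one weight row per class, plus one bias per class (assumed layout).
Head = Tuple[List[List[float]], List[float]]


def linear(x: Sequence[float], W: List[List[float]], b: List[float]) -> List[float]:
    """Return the logits W @ x + b of a linear classifier."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_k for row, b_k in zip(W, b)]


def argmax(v: Sequence[float]) -> int:
    """Index of the largest entry (first one on ties)."""
    return max(range(len(v)), key=lambda i: v[i])


def ar_with_audio(video_emb: Sequence[float], audio_emb: Sequence[float],
                  verb_head: Head, noun_head: Head) -> Tuple[int, int]:
    """Concatenate the f_V and f_A embeddings, then apply two separate linear heads."""
    z = list(video_emb) + list(audio_emb)
    return argmax(linear(z, *verb_head)), argmax(linear(z, *noun_head))


def state_change_present(z: Sequence[float], head: Head) -> bool:
    """Binary linear classifier on concatenated features: one logit, threshold at 0 (assumed)."""
    return linear(z, *head)[0] > 0.0


def pnr_localization_error(frame_scores: Sequence[float], frame_times: Sequence[float],
                           true_pnr_time: float) -> float:
    """Pick the highest-scoring frame as the PNR keyframe; error is |t_pred - t_true| in seconds."""
    k = argmax(frame_scores)
    return abs(frame_times[k] - true_pnr_time)


if __name__ == "__main__":
    # Toy, hand-set heads standing in for trained classifiers (illustration only).
    verb_head = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0])
    noun_head = ([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0, 0.0])
    print(ar_with_audio([1.0, 0.0], [0.0], verb_head, noun_head))
    print(pnr_localization_error([0.1, 0.9, 0.3], [0.0, 0.5, 1.0], 0.75))
```

Keeping the verb and noun heads as two independent linear layers over one shared concatenated feature mirrors the setup described above; in practice the heads would be trained on frozen representations rather than set by hand.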
Feb-10-2026, 21:38:47 GMT