Appendix A Additional downstream evaluation tasks
Neural Information Processing Systems
We evaluated all models on three additional tasks beyond those presented in the main paper. This is done by training a fully-connected head applied to each frame's … The occurrence of a state change is then predicted by training a binary linear classifier that takes the concatenated representations as input.

Discussion of Results. The results on the additional downstream tasks are shown in Tab. … For this experiment, we first categorize the activities based on the nature of the transition:

T1: irreversible interactions, where a backward transition is highly unlikely (e.g., cut vegetables)
T2: reversible interactions, where a backward transition occurs often (e.g., open/close fridge)
T3: interactions with no transition direction (e.g., stirring)

As expected, RepLAI learns better associations between the audio and visual state changes than AVID.
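The state-change probe described above (a binary linear classifier on concatenated representations) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes two per-sample feature arrays are concatenated, uses plain NumPy logistic regression as the linear classifier, and all function names and hyperparameters (`concat_representations`, `train_binary_linear_classifier`, `lr`, `steps`) are hypothetical.

```python
import numpy as np


def concat_representations(first, second):
    """Concatenate two (n, d) feature arrays into one (n, 2d) array.

    Assumption: the two inputs are frozen representations of the same sample;
    the source text does not say which representations are concatenated.
    """
    return np.concatenate([first, second], axis=1)


def train_binary_linear_classifier(x, y, lr=0.1, steps=500):
    """Fit logistic regression (a binary linear classifier) by gradient descent.

    x: (n, d) float features, y: (n,) labels in {0, 1}.
    Returns the weight vector and bias.
    """
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = x @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - y  # gradient of the mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def predict_state_change(x, w, b):
    """Return 1 where the classifier predicts a state change, else 0."""
    return (x @ w + b > 0).astype(int)
```

Only the linear layer is trained in this sketch; the representations are treated as fixed inputs, which is the usual design for a linear probe.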
Aug-17-2025, 03:14:10 GMT