Text-Infused Attention and Foreground-Aware Modeling for Zero-Shot Temporal Action Detection
Neural Information Processing Systems
Our simple approach outperforms previous methods. Despite this improvement, we identify a further problem, a common-action bias: the cross-modal baseline over-focuses on common sub-actions because it cannot discriminate the visual parts that are related to the text.
Feb-8-2026, 04:45:53 GMT