Exploring Ordinal Bias in Action Recognition for Instructional Videos

Kim, Joochan, Jung, Minjoon, Zhang, Byoung-Tak

Dec-8-2025–arXiv.org Artificial Intelligence

Action recognition models have achieved promising results in understanding instructional videos. However, they often rely on dominant, dataset-specific action sequences rather than true video comprehension, a problem that we define as ordinal bias. To address this issue, we propose two effective video manipulation methods: Action Masking, which masks frames of frequently co-occurring actions, and Sequence Shuffling, which randomizes the order of action segments. Through comprehensive experiments, we demonstrate that current models exhibit significant performance drops when confronted with nonstandard action sequences, underscoring their vulnerability to ordinal bias. Our findings emphasize the importance of rethinking evaluation strategies and developing models capable of generalizing beyond fixed action patterns in diverse instructional videos. Due to the dominant action pair'Take-Background', the model fails to predict the action'Open.' Action recognition in instructional videos has witnessed remarkable progress, primarily driven by models that excel in curated benchmark datasets (Farha & Gall, 2019; Ishikawa et al., 2021; Li et al., 2020; Yi et al., 2021).

artificial intelligence, dataset, machine learning, (18 more...)

arXiv.org Artificial Intelligence

Dec-8-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.84)
- Instructional Material > Course Syllabus & Notes (0.66)

Industry:
- Education > Educational Technology
  - Media (1.00)
  - Audio & Video (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning > Neural Networks (0.35)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found