Reviews: Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition

Neural Information Processing Systems 

Strengths of the paper are listed as follows: S1. The paper tackles the important problem of scene de-biasing for action recognition. It is of high concern for the computer vision community to sanity-check whether proposed models really learn the dynamics of actions, rather than merely exploiting spurious biases such as the co-occurrence of scenes and actions. The authors develop a sensible solution: forcing the model to attend to the human region for recognition, which aims to reduce the sensitivity of the action representation to the surrounding context. This is achieved by borrowing ideas from adversarial learning; that is, the ability to recognize the scene from the action code is suppressed by directly applying gradient reversal [8], a well-known domain confusion method in the literature since 2015.
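For readers less familiar with gradient reversal, the following is a minimal toy sketch of the general idea, not the authors' implementation. It assumes a scalar encoder weight `w`, a scalar scene-head weight `v`, and a squared-error scene loss; all names (`GradientReversal`, `scene_loss_grads`, `lam`) are hypothetical and chosen only for illustration. The layer is the identity in the forward pass and multiplies the incoming gradient by `-lam` in the backward pass, so the scene head is trained to predict the scene while the encoder is pushed in the opposite direction.

```python
class GradientReversal:
    """Identity in the forward pass; scales the gradient by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # hypothetical trade-off weight for the adversarial signal

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.lam * grad_output


def scene_loss_grads(w, v, x, y, lam=1.0):
    """Toy scalar model (an assumption for illustration only).

    feature f = w * x, scene prediction p = v * f, loss = (p - y) ** 2.
    Returns (loss, gradient for the scene head v, reversed gradient for the encoder w).
    """
    grl = GradientReversal(lam)
    f = w * x
    p = v * grl.forward(f)
    loss = (p - y) ** 2

    dloss_dp = 2.0 * (p - y)
    grad_v = dloss_dp * f                # ordinary gradient: the head learns the scene
    grad_f = grl.backward(dloss_dp * v)  # sign-flipped gradient entering the encoder
    grad_w = grad_f * x
    return loss, grad_v, grad_w


loss, grad_v, grad_w = scene_loss_grads(w=2.0, v=3.0, x=1.0, y=0.0)
print(loss, grad_v, grad_w)
```

A gradient-descent step on `grad_v` lowers the scene loss for the head, whereas the same step on the reversed `grad_w` raises it for the encoder, which is what makes the learned features less informative about the scene.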