Multi-dataset Training of Transformers for Robust Action Recognition