Transformer representation learning is necessary for dynamic multi-modal physiological data on small-cohort patients