Efficient Training of Transformers for Molecule Property Prediction on Small-scale Datasets