Transformers for molecular property prediction: Lessons learned from the past five years