Decoding EEG Speech Perception with Transformers and VAE-based Data Augmentation