Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity