An ablation study over different model architectures (Table (a)) shows that the chosen