Large-scale gradient-based training of Mixtures of Factor Analyzers