Annihilation of Spurious Minima in Two-Layer ReLU Networks

Neural Information Processing Systems 

Evidence suggests that this problem can be circumvented by using a large number of parameters in deep learning (DL) models.