Benignity of loss landscape with weight decay requires both large overparametrization and initialization
