Does a sparse ReLU network training problem always admit an optimum?
