Does a sparse ReLU network training problem always admit an optimum?