BCE vs. CE in Deep Feature Learning