Learning from distinctive candidates to optimize reduced-precision convolution program on tensor cores