Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation?