How to Benchmark Vision Foundation Models for Semantic Segmentation?