A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models