Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric