Semantic F1 Scores: Fair Evaluation Under Fuzzy Class Boundaries