Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study