BiMa: Towards Biases Mitigation for Text-Video Retrieval via Scene Element Guidance