Fine-grained Video-Text Retrieval: A New Benchmark and Method