Towards Benchmarking and Evaluating Deepfake Detection