FLIP: Towards Comprehensive and Reliable Evaluation of Federated Prompt Learning