AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator