Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Language Models