VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models