SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs