SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs Supplementary Materials Appendix Overview