SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs Supplementary Materials Appendix Overview
Neural Information Processing Systems
Appendix B provides additional implementation details, including a video SPAE variant. Appendix C includes more quantitative evaluation results. Appendix D shows more qualitative examples of model generations. Figure 1 shows an example of the dilation subsampler defined by Eq. (1). We select evenly distributed positions in each layer to form the token pyramid with monotonically increasing layer sizes.
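Eq. (1) is not reproduced in this overview, so the following is only a minimal sketch of what "evenly distributed positions in each layer" could look like. It assumes a square token grid, layer sizes that divide the grid size, and a centered stride offset. The function name `dilation_positions`, the 4x4 grid, and the layer sizes are hypothetical choices for illustration, not the authors' definition.

```python
from math import ceil


def dilation_positions(grid_h, grid_w, layer_h, layer_w):
    """Pick layer_h x layer_w evenly spaced (row, col) positions, 1-indexed,
    from a grid_h x grid_w token grid.

    Assumption: the stride is grid size / layer size, and each selected
    position sits roughly in the middle of its stride-sized cell.
    """
    if grid_h % layer_h or grid_w % layer_w:
        raise ValueError("layer size must divide the grid size")
    stride_h = grid_h // layer_h
    stride_w = grid_w // layer_w
    # Offset that moves each pick from the end of its cell toward the center.
    off_h = ceil(stride_h / 2) - 1
    off_w = ceil(stride_w / 2) - 1
    return [
        (stride_h * i - off_h, stride_w * j - off_w)
        for i in range(1, layer_h + 1)
        for j in range(1, layer_w + 1)
    ]


# Hypothetical pyramid over a 4x4 grid: each layer is larger than the last.
layer_sizes = [(1, 1), (2, 2), (4, 4)]
pyramid = [dilation_positions(4, 4, h, w) for h, w in layer_sizes]

for size, positions in zip(layer_sizes, pyramid):
    print(size, positions)
```

Under these assumptions the first layer holds a single token near the grid center, the second holds four tokens spaced two apart, and the last covers every grid position, so the layer sizes grow monotonically from 1 to 4 to 16 tokens.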