SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Supplementary Materials
Appendix Overview

Neural Information Processing Systems 

Appendix B provides additional implementation details, including a video SPAE variant. Appendix C includes additional quantitative evaluation results. Appendix D presents more qualitative examples of model generations. Figure 1 shows an example of the dilation subsampler defined by Eq. (1): in each layer, evenly distributed positions are selected, so that the resulting token pyramid has monotonically increasing layer sizes.
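To make the idea of evenly distributed per-layer positions concrete, here is a minimal sketch under stated assumptions; it is not Eq. (1) itself. It assumes a square token map of side `grid_size`, a hypothetical per-layer `stride` (one position taken near the center of each stride × stride cell), and strictly decreasing strides so that layer sizes grow monotonically. The function names `dilation_positions` and `token_pyramid` are illustrative only.

```python
from typing import List, Tuple


def dilation_positions(grid_size: int, stride: int) -> List[Tuple[int, int]]:
    """Return evenly spaced (row, col) positions on a grid_size x grid_size map.

    Hypothetical scheme: one position near the center of each stride x stride
    cell. This is an illustrative assumption, not the paper's Eq. (1).
    """
    if grid_size % stride != 0:
        raise ValueError("grid_size must be divisible by stride")
    offset = stride // 2  # (lower) center of each cell along one axis
    coords = range(offset, grid_size, stride)
    return [(r, c) for r in coords for c in coords]


def token_pyramid(grid_size: int, strides: List[int]) -> List[List[Tuple[int, int]]]:
    """Build one position set per layer.

    Strides must be strictly decreasing, which makes layer sizes
    (number of selected positions) strictly increasing.
    """
    if any(a <= b for a, b in zip(strides, strides[1:])):
        raise ValueError("strides must be strictly decreasing")
    return [dilation_positions(grid_size, s) for s in strides]


if __name__ == "__main__":
    pyramid = token_pyramid(16, [8, 4, 2, 1])
    print([len(layer) for layer in pyramid])  # [4, 16, 64, 256]
```

With a 16 × 16 map and strides 8, 4, 2, 1, the layers hold 4, 16, 64, and 256 positions, and each coarser layer samples the map uniformly rather than concentrating in one region. Center-of-cell offsets are one plausible choice for "evenly distributed"; other offsets would work equally well for this illustration.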
