HOFAR: High-Order Augmentation of Flow Autoregressive Transformers

Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Mingda Wan

arXiv.org Artificial Intelligence 

Several works have explored extending these models to generate images with an additional dimension, such as incorporating a temporal dimension for video generation [SPH+22, LCW+23] or a 3D spatial dimension for 3D object generation [XXMPM24, Mo24]. Even 4D generation [ZCW+25, LYX+24] has become feasible with diffusion models. Another prominent line of research focuses on auto-regressive models, where the Transformer framework has achieved groundbreaking success in natural language processing. Models such as GPT-4 [AAA+23], Gemini 2 [Dee24], and DeepSeek [GYZ+25] have significantly impacted millions of users worldwide. Given the success of the auto-regressive generation paradigm and the Transformer framework, recent works have explored bringing auto-regressive generation to image synthesis. A representative example is the Visual Auto-Regressive (VAR) model [TJY+25], which generates images hierarchically by predicting token maps at progressively finer scales, conditioning each scale on the coarser ones already generated.
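
To make the coarse-to-fine idea concrete, the following is a minimal illustrative sketch, not the authors' or VAR's actual implementation: a random stub stands in for the transformer, and all names (predict_next_scale, SCALES, VOCAB) are hypothetical. Each scale's token map is produced in one step, conditioned on the flattened tokens of all coarser scales, so the autoregression runs over scales rather than over individual patches.

    # Hypothetical sketch of next-scale autoregressive generation (not the paper's code).
    import numpy as np

    SCALES = [1, 2, 4, 8]   # hypothetical side lengths of the token maps, coarse to fine
    VOCAB = 4096            # hypothetical codebook size of the image tokenizer

    def predict_next_scale(context_tokens: np.ndarray, side: int) -> np.ndarray:
        """Stand-in for a transformer call: given the flattened tokens of all
        coarser scales, sample a side x side token map for the next scale."""
        rng = np.random.default_rng(len(context_tokens))        # deterministic stub
        return rng.integers(0, VOCAB, size=(side, side))

    def generate_image_tokens() -> list:
        """Coarse-to-fine autoregression: each scale is generated in one step,
        conditioned on the concatenation of all earlier (coarser) token maps."""
        maps = []
        context = np.empty(0, dtype=np.int64)
        for side in SCALES:
            next_map = predict_next_scale(context, side)
            maps.append(next_map)
            context = np.concatenate([context, next_map.ravel()])  # grow the prefix
        return maps   # the finest map would then be decoded to pixels by the tokenizer

    if __name__ == "__main__":
        token_maps = generate_image_tokens()
        print([m.shape for m in token_maps])   # [(1, 1), (2, 2), (4, 4), (8, 8)]

In a real system the stub would be a Transformer over the token prefix and the finest token map would be passed through the tokenizer's decoder; the sketch only shows how the autoregressive prefix grows scale by scale.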