Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection

Ma, Yuhang; Xu, Wenting; Zhao, Chaoyi; Sun, Keqiang; Jin, Qinfeng; Zhao, Zeng; Fan, Changjie; Hu, Zhipeng

Sep-29-2024 – arXiv.org Artificial Intelligence

Recent advances in text-to-image diffusion models have spurred significant interest in continuous story image generation. In this paper, we introduce Storynizor, a model capable of generating coherent stories with strong inter-frame character consistency, effective foreground-background separation, and diverse pose variation. The core innovation of Storynizor lies in its two key modules: the ID-Synchronizer and the ID-Injector. The ID-Synchronizer employs an auto-mask self-attention module and a mask perceptual loss across inter-frame images to improve the consistency of character generation while vividly representing postures and backgrounds. The ID-Injector uses a Shuffling Reference Strategy (SRS) to integrate ID features into specific locations, enhancing ID-based consistent character generation. Additionally, to facilitate the training of Storynizor, we have curated a novel dataset called StoryDB comprising 100,000 images. This dataset contains single- and multiple-character sets in diverse environments, layouts, and gestures, with detailed descriptions. Experimental results indicate that Storynizor achieves superior coherent story generation, with high-fidelity character consistency, flexible postures, and vivid backgrounds, compared to other character-specific methods.

artificial intelligence, machine learning, storynizor

Country:
- North America > Canada > Quebec (0.14)

Genre:
- Research Report (0.64)

Industry:
- Consumer Products & Services (0.46)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks (1.00)
    - Vision (1.00)
  - Sensing and Signal Processing > Image Processing (1.00)