GPTDrawer: Enhancing Visual Synthesis through ChatGPT
Kun Li, Xinwei Chen, Tianyou Song, Hansong Zhang, Wenzhe Zhang, Qing Shan
arXiv.org Artificial Intelligence
In AI-driven image generation, producing images that are precise and relevant to a textual prompt remains a central challenge. This paper introduces GPTDrawer, a pipeline that uses GPT-based models to improve the visual synthesis process. Our method iteratively refines input prompts using keyword extraction, semantic analysis, and image-text congruence evaluation. By integrating ChatGPT for natural language processing and Stable Diffusion for image generation, GPTDrawer produces a batch of images that pass through successive refinement cycles, guided by cosine similarity metrics, until a threshold of semantic alignment is reached. The results show a marked improvement in how faithfully the generated images match user-defined prompts, demonstrating the system's ability to interpret and visualize complex semantic constructs. The implications of this work extend to applications ranging from creative arts to design automation, setting a benchmark for AI-assisted creative processes.
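The refinement cycle summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the callables passed in (`generate_images`, `embed_text`, `embed_image`, `refine_prompt`), the default threshold of 0.8, and the round limit are all hypothetical placeholders. It also assumes that each image is scored against the original user prompt, which the abstract does not specify. In a real pipeline the callables would wrap an image generator, a text and image embedding model, and a language model that rewrites the prompt.

```python
import math
from typing import Any, Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def refine_until_aligned(
    prompt: str,
    generate_images: Callable[[str], List[Any]],  # hypothetical image generator
    embed_text: Callable[[str], Vector],          # hypothetical text encoder
    embed_image: Callable[[Any], Vector],         # hypothetical image encoder
    refine_prompt: Callable[[str, float], str],   # hypothetical prompt rewriter
    threshold: float = 0.8,                       # assumed value, not from the paper
    max_rounds: int = 5,                          # assumed safety limit
) -> Tuple[Optional[Any], float, str]:
    """Generate batches and rewrite the prompt until one image is aligned.

    Returns the best image seen, its similarity score, and the prompt
    that produced it.
    """
    # Assumption: alignment is measured against the original user prompt.
    target = embed_text(prompt)
    current = prompt
    best_image, best_score, best_prompt = None, -1.0, prompt

    for _ in range(max_rounds):
        for image in generate_images(current):
            score = cosine_similarity(target, embed_image(image))
            if score > best_score:
                best_image, best_score, best_prompt = image, score, current
        if best_score >= threshold:
            break  # semantic alignment threshold reached
        current = refine_prompt(current, best_score)

    return best_image, best_score, best_prompt
```

The loop stops either when some image reaches the similarity threshold or when the round limit is exhausted, so it always terminates even if the threshold is never met.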
Dec-10-2024
- Country:
- North America > United States
- California (0.29)
- Illinois (0.29)
- Genre:
- Research Report (0.84)
- Technology: