AITopics | Liu, Mushui

Collaborating Authors

Liu, Mushui

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification

Yang, Zhen, Shen, Guibao, Hou, Liang, Liu, Mushui, Wang, Luozhou, Tao, Xin, Wan, Pengfei, Zhang, Di, Chen, Ying-Cong

arXiv.org Artificial IntelligenceMar-14-2025

Diffusion models have achieved remarkable advances in various image generation tasks. However, their performance notably declines when generating images at resolutions higher than those used during the training period. Despite the existence of numerous methods for producing high-resolution images, they either suffer from inefficiency or are hindered by complex operations. In this paper, we propose RectifiedHR, an straightforward and efficient solution for training-free high-resolution image generation. Specifically, we introduce the noise refresh strategy, which theoretically only requires a few lines of code to unlock the model's high-resolution generation ability and improve efficiency. Additionally, we first observe the phenomenon of energy decay that may cause image blurriness during the high-resolution image generation process. To address this issue, we introduce average latent energy analysis and discover that an improved classifier-free guidance hyperparameter can significantly enhance generation performance. Our method is entirely training-free and boasts a simple implementation logic and efficient performance. Through extensive comparisons with numerous baseline methods, our RectifiedHR demonstrates superior effectiveness and efficiency.

artificial intelligence, diffusion model, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2503.02537

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation

Wang, Yi, Liu, Mushui, He, Wanggui, Zhang, Longxiang, Huang, Ziwei, Zhang, Guanghao, Shu, Fangxun, Tao, Zhong, She, Dong, Yu, Zhelun, Li, Haoyuan, Dai, Weilong, Song, Mingli, Song, Jie, Jiang, Hao

arXiv.org Artificial IntelligenceMar-3-2025

Unified generative models have demonstrated extraordinary performance in both text and image generation. However, they tend to underperform when generating intricate images with various interwoven conditions, which is hard to solely rely on straightforward text-to-image generation. In response to this challenge, we introduce MINT, an innovative unified generative model, empowered with native multimodal chain of thought (MCoT) for enhanced image generation for the first time. Firstly, we design Mixture of Transformer Experts (MTXpert), an expert-parallel structure that effectively supports both natural language generation (NLG) and visual capabilities, while avoiding potential modality conflicts that could hinder the full potential of each modality. Building on this, we propose an innovative MCoT training paradigm, a step-by-step approach to multimodal thinking, reasoning, and reflection specifically designed to enhance image generation. This paradigm equips MINT with nuanced, element-wise decoupled alignment and a comprehensive understanding of textual and visual components. Furthermore, it fosters advanced multimodal reasoning and self-reflection, enabling the construction of images that are firmly grounded in the logical relationships between these elements. Notably, MINT has been validated to exhibit superior performance across multiple benchmarks for text-to-image (T2I) and image-to-text (I2T) tasks.

arxiv preprint arxiv, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2503.01298

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback