StorySync: Training-Free Subject Consistency in Text-to-Image Generation via Region Harmonization
Gaur, Gopalji, Zolfaghari, Mohammadreza, Brox, Thomas
Generating a coherent sequence of images that tells a visual story with text-to-image diffusion models faces the critical challenge of maintaining subject consistency across all story scenes. Existing approaches, which typically rely on fine-tuning or retraining, are computationally expensive and time-consuming, and often interfere with the model's pre-existing capabilities. In this paper, we propose an efficient, training-free method for consistent subject generation. It works seamlessly with pre-trained diffusion models by introducing Masked Cross-Image Attention Sharing, which dynamically aligns subject features across a batch of images, and Regional Feature Harmonization, which refines visually similar details for improved subject consistency. Experimental results demonstrate that our approach generates visually consistent subjects across a variety of scenarios while preserving the creative abilities of the diffusion model.
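The abstract does not spell out how the attention sharing works; below is a minimal single-head sketch of one plausible reading, in which each image's tokens additionally attend to a bank of subject tokens gathered, via masks, from every image in the batch. The function name, mask source, and sharing rule are assumptions for illustration, not the paper's actual implementation.

```python
import torch

def masked_cross_image_attention(q, k, v, subject_mask):
    """One plausible reading of masked cross-image attention sharing.

    q, k, v: (B, T, D) per-image attention tensors (single head, for brevity).
    subject_mask: (B, T) boolean mask marking subject tokens in each image.
    Every image attends to its own tokens plus the subject tokens of the
    whole batch, pulling subject features toward a shared appearance.
    """
    B, T, D = k.shape
    subj_k = k[subject_mask]                     # (S, D): subject tokens from all images
    subj_v = v[subject_mask]
    outs = []
    for b in range(B):
        k_b = torch.cat([k[b], subj_k], dim=0)   # (T + S, D)
        v_b = torch.cat([v[b], subj_v], dim=0)
        attn = torch.softmax(q[b] @ k_b.T / D ** 0.5, dim=-1)
        outs.append(attn @ v_b)                  # (T, D)
    return torch.stack(outs)                     # (B, T, D)

# Toy usage: 4 images, 16 tokens each, ~30% marked as subject tokens.
q = k = v = torch.randn(4, 16, 32)
out = masked_cross_image_attention(q, k, v, torch.rand(4, 16) > 0.7)
```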
Detail++: Training-Free Detail Enhancer for Text-to-Image Diffusion Models
Chen, Lifeng, Wang, Jiner, Pan, Zihao, Zhu, Beier, Yang, Xiaofeng, Zhang, Chi
Recent advances in text-to-image (T2I) generation have led to impressive visual results. However, these models still face significant challenges when handling complex prompts, particularly those involving multiple subjects with distinct attributes. Inspired by the human drawing process, which first outlines the composition and then incrementally adds details, we propose Detail++, a training-free framework that introduces a novel Progressive Detail Injection (PDI) strategy to address this limitation. Specifically, we decompose a complex prompt into a sequence of simplified sub-prompts, guiding the generation process in stages. This staged generation leverages the inherent layout-controlling capacity of self-attention to first ensure global composition, followed by precise refinement. To achieve accurate binding between attributes and corresponding subjects, we exploit cross-attention mechanisms and further introduce a Centroid Alignment Loss at test time to reduce binding noise and enhance attribute consistency. Extensive experiments on T2I-CompBench and a newly constructed style composition benchmark demonstrate that Detail++ significantly outperforms existing methods, particularly in scenarios involving multiple objects and complex stylistic conditions.
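The staged generation can be pictured as scheduling one simplified sub-prompt per denoising phase. The sketch below shows only that scheduling idea with a stand-in text encoder and a toy denoiser; `embed`, `denoise_step`, and the update rule are placeholders, not Detail++'s actual components.

```python
import torch

# Illustrative stand-ins: Detail++'s real encoder and denoiser would be a
# text encoder and a diffusion U-Net/DiT; these toys only demonstrate the
# staged scheduling of sub-prompts.
def embed(prompt: str, dim: int = 8) -> torch.Tensor:
    g = torch.Generator().manual_seed(abs(hash(prompt)) % (2**31))
    return torch.randn(dim, generator=g)

def denoise_step(x, cond, t):
    return x - 0.05 * (x - cond)  # toy update nudging the latent toward cond

def progressive_detail_injection(sub_prompts, steps=48, dim=8):
    """Run denoising in stages, one simplified sub-prompt per stage:
    early stages fix the global composition, later stages inject details."""
    x = torch.randn(dim)
    stage = steps // len(sub_prompts)
    for i, prompt in enumerate(sub_prompts):
        cond = embed(prompt, dim)
        for t in range(stage):
            x = denoise_step(x, cond, i * stage + t)
    return x

latent = progressive_detail_injection([
    "two people in a park",                       # composition only
    "a man in a red coat, a woman, in a park",    # bind first attribute
    "a man in a red coat, a woman with a blue umbrella, in a park",
])
```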
VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models
Feng, Kailai, Zhang, Yabo, Yu, Haodong, Ji, Zhilong, Bai, Jinfeng, Zhang, Hongzhi, Zuo, Wangmeng
Artistic typography is a technique for visualizing the meaning of an input character in an imaginative and readable manner. With powerful text-to-image diffusion models, existing methods directly design the overall geometry and texture of the input character, making it challenging to ensure both creativity and legibility. In this paper, we introduce VitaGlyph, a dual-branch, training-free method that enables flexible artistic typography with controllable geometry changes to maintain readability. The key insight of VitaGlyph is to treat the input character as a scene composed of a Subject and a Surrounding, and to render them under varying degrees of geometric transformation. The subject flexibly expresses the essential concept of the input character, while the surrounding enriches the relevant background without altering the shape. Specifically, we implement VitaGlyph through a three-phase framework: (i) Knowledge Acquisition leverages large language models to design text descriptions of the subject and surrounding. (ii) Regional Decomposition detects the part that best matches the subject description and divides the input glyph image into subject and surrounding regions. (iii) Typography Stylization first refines the structure of the subject region via Semantic Typography, and then separately renders the textures of the Subject and Surrounding regions through Controllable Compositional Generation. Experimental results demonstrate that VitaGlyph not only achieves better artistry and readability, but also depicts multiple customized concepts, facilitating more creative and pleasing artistic typography. Our code will be made publicly available at https://github.com/Carlofkl/VitaGlyph.
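The final compositing in phase (iii) can be sketched as masked blending of the two branch renders. The function below is an illustrative guess at that step, assuming a binary subject mask from Regional Decomposition; the blur-based feathering and all names are assumptions, not VitaGlyph's code.

```python
import torch
import torch.nn.functional as F

def composite_dual_branch(subject_img, surround_img, subject_mask, blur=7):
    """Blend separately rendered Subject and Surrounding branches.

    subject_img, surround_img: (3, H, W) renders from the two branches.
    subject_mask: (H, W) binary mask from regional decomposition.
    A box blur feathers the mask edge so the two textures meet cleanly.
    """
    m = subject_mask[None, None].float()                    # (1, 1, H, W)
    m = F.avg_pool2d(m, blur, stride=1, padding=blur // 2)  # soften boundary
    return m[0] * subject_img + (1.0 - m[0]) * surround_img

# Example with random stand-in renders and a square "subject" region.
subj, surr = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
mask = torch.zeros(64, 64)
mask[16:48, 16:48] = 1.0
glyph = composite_dual_branch(subj, surr, mask)             # (3, 64, 64)
```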
Aesthetic Guideline Driven Photography by Robots
Gadde, Raghudeep (International Institute of Information Technology - Hyderabad) | Karlapalem, Kamalakar (International Institute of Information Technology - Hyderabad)
Robots depend on captured images for perceiving the environment. A robot can replace a human in capturing quality photographs for publishing. In this paper, we employ iterative photo capture, in which the robot repositions itself and retakes photographs until it obtains a good-quality image. Our image quality assessment approach combines a few high-level image features with some of the aesthetic guidelines of professional photography. Our system can also be used in web image search applications to rank images. We test our quality assessment approach on a large and diverse dataset, on which our system achieves a classification accuracy of 79%. We assess the aesthetic error in the captured image and estimate the change in robot orientation required to retake an aesthetically better photograph. Our experiments are conducted on a NAO robot with no stereo vision. The results demonstrate that our system can capture photographs in accord with professional human photography.
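One concrete aesthetic guideline such an assessment could encode is the rule of thirds. The sketch below scores a subject centroid against the four rule-of-thirds power points and derives a normalized pan suggestion; it is an illustrative stand-in, since the paper's actual features and error-to-orientation mapping are not given in the abstract.

```python
import numpy as np

def thirds_points(w, h):
    """The four rule-of-thirds 'power points' of a w x h frame."""
    return np.array([(w * i / 3, h * j / 3) for i in (1, 2) for j in (1, 2)])

def rule_of_thirds_error(subject_xy, img_wh):
    """Distance from the subject centroid to the nearest power point,
    normalized by the frame diagonal. Lower is better under this guideline."""
    d = np.linalg.norm(thirds_points(*img_wh) - np.asarray(subject_xy), axis=1)
    return d.min() / np.hypot(*img_wh)

def pan_suggestion(subject_xy, img_wh):
    """Horizontal shift (fraction of frame width) that moves the subject
    toward the nearest power point; a real system would translate this
    into a reposition-and-retake command for the robot."""
    pts = thirds_points(*img_wh)
    nearest = pts[np.linalg.norm(pts - np.asarray(subject_xy), axis=1).argmin()]
    return (nearest[0] - subject_xy[0]) / img_wh[0]

# A dead-center subject in a 640x480 frame scores poorly; pan toward a third-line.
print(rule_of_thirds_error((320, 240), (640, 480)))  # ~0.17
print(pan_suggestion((320, 240), (640, 480)))        # ~-0.17
```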