AITopics | pororo

TemporalStory: Enhancing Consistency in Story Visualization using Spatial-Temporal Attention

Zheng, Sixiao, Fu, Yanwei

arXiv.org Artificial Intelligence, Jul-13-2024

Story visualization is a challenging text-to-image generation task: it requires not only rendering visual details from the text prompt but also ensuring consistency across images. Most recent approaches address the inconsistency problem in an auto-regressive manner, conditioning on previous image-sentence pairs. However, they overlook the fact that story context is dispersed across all sentences. The auto-regressive approach fails to encode information from subsequent image-sentence pairs and is thus unable to capture the entirety of the story context. To address this, we introduce TemporalStory, which leverages Spatial-Temporal attention to model complex spatial and temporal dependencies in images, enabling the generation of coherent images based on a given storyline. To better understand the storyline context, we introduce a text adapter capable of integrating information from other sentences into the embedding of the current sentence. Additionally, to utilize scene changes between story images as guidance for the model, we propose the StoryFlow Adapter to measure the degree of change between images. In extensive experiments on two popular benchmarks, PororoSV and FlintstonesSV, TemporalStory outperforms the previous state of the art in both story visualization and story continuation tasks.
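The limitation of auto-regressive conditioning described in this abstract can be illustrated with a toy example. The following is a minimal sketch, not the TemporalStory architecture: it implements only a single-head dot-product attention over the frame axis (no spatial attention, no learned projections), and the function name `temporal_attention` and the feature vectors are hypothetical. It shows that with a causal mask the first frame's output cannot react to a change in a later frame, while with full temporal attention it can.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frames, causal):
    """Single-head dot-product attention across the frame axis.

    frames: list of per-frame feature vectors (one vector per story image).
    causal: if True, frame i attends only to frames 0..i (auto-regressive);
            if False, every frame attends to every frame in the story.
    """
    dim = len(frames[0])
    outputs = []
    for i, query in enumerate(frames):
        visible = frames[: i + 1] if causal else frames
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(dim)]
        )
    return outputs


# Two toy "stories" that differ only in their last frame (made-up features).
story = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edited = [[1.0, 0.0], [0.0, 1.0], [5.0, -3.0]]

# Does the FIRST frame's output stay the same when the LAST frame changes?
causal_same = temporal_attention(story, True)[0] == temporal_attention(edited, True)[0]
full_same = temporal_attention(story, False)[0] == temporal_attention(edited, False)[0]
print(causal_same, full_same)  # True False
```

Under the causal mask the first frame sees only itself, so information from subsequent frames never reaches it; removing the mask lets every frame draw on the whole story.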

diffusion model, pororo, temporalstory, (13 more...)

arXiv:2407.09774

Genre: Research Report > Promising Solution (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Masked Generative Story Transformer with Character Guidance and Caption Augmentation

Papadimitriou, Christos, Filandrianos, Giorgos, Lymperaiou, Maria, Stamou, Giorgos

arXiv.org Artificial Intelligence, Mar-13-2024

Story Visualization (SV) is a challenging generative vision task that requires both visual quality and consistency between different frames in generated image sequences. Previous approaches either employ some kind of memory mechanism to maintain context throughout an auto-regressive generation of the image sequence, or model the generation of the characters and their background separately to improve the rendering of characters. In contrast, we embrace a completely parallel transformer-based approach, relying exclusively on Cross-Attention with past and future captions to achieve consistency. Additionally, we propose a Character Guidance technique to focus on the generation of characters in an implicit manner, by forming a combination of text-conditional and character-conditional logits in the logit space. We also employ a caption-augmentation technique, carried out by a Large Language Model (LLM), to enhance the robustness of our approach. The combination of these methods culminates in state-of-the-art (SOTA) results over various metrics on the most prominent SV benchmark (Pororo-SV), attained with constrained resources and with better computational complexity than prior work. The validity of our quantitative results is supported by a human survey.
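The abstract says only that Character Guidance combines text-conditional and character-conditional logits in the logit space; it does not give the formula. The sketch below is therefore an assumption, not the authors' method: it borrows the shape of classifier-free guidance, adding two scaled offsets to an unconditional baseline. The function name `guided_logits`, the unconditional term, the scale parameters, and all numeric values are hypothetical.

```python
def guided_logits(uncond, text_cond, char_cond, text_scale=2.0, char_scale=1.0):
    """Combine per-token logits from three forward passes (assumed form).

    uncond:    logits with no conditioning (assumed baseline, not from the abstract).
    text_cond: logits conditioned on the caption.
    char_cond: logits conditioned on the characters present in the frame.
    Each conditional pass contributes its offset from the baseline,
    scaled by its own guidance weight.
    """
    return [
        u + text_scale * (t - u) + char_scale * (c - u)
        for u, t, c in zip(uncond, text_cond, char_cond)
    ]


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Toy logits over a 3-token image vocabulary (made-up numbers).
uncond = [0.0, 1.0, 2.0]
text_cond = [1.0, 1.0, 0.0]
char_cond = [0.0, 3.0, 2.0]

combined = guided_logits(uncond, text_cond, char_cond)
print(combined)                             # [2.0, 3.0, -2.0]
print(argmax(text_cond), argmax(combined))  # 0 1
```

In this toy case the character-conditional pass shifts the preferred token from index 0 to index 1, which is the kind of implicit steering toward character content that the abstract describes. Setting both scales to zero recovers the unconditional logits.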

caption, eddy, pororo, (17 more...)

arXiv:2403.08502

Country:

North America > United States (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Story Visualization by Online Text Augmentation with Context Memory

Ahn, Daechul, Kim, Daneul, Song, Gwangmo, Kim, Seung Hwan, Lee, Honglak, Kang, Dongyeop, Choi, Jonghyun

arXiv.org Artificial Intelligence, Aug-19-2023

Story visualization (SV) is a challenging text-to-image generation task because it requires not only rendering visual details from the text descriptions but also encoding a long-term context across multiple sentences. While prior efforts mostly focus on generating a semantically relevant image for each sentence, encoding a context spread across the given paragraph to generate contextually convincing images (e.g., with a correct character or with a proper background of the scene) remains a challenge. To this end, we propose a novel memory architecture for the Bi-directional Transformer framework with an online text augmentation that generates multiple pseudo-descriptions as supplementary supervision during training for better generalization to the language variation at inference. In extensive experiments on the two popular SV benchmarks, Pororo-SV and Flintstones-SV, the proposed method significantly outperforms the state of the art on various metrics, including FID, character F1, frame accuracy, BLEU-2/3, and R-precision, with similar or lower computational complexity.
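Two of the metrics named in this abstract, character F1 and frame accuracy, can be sketched in a few lines. This is an illustration under stated assumptions rather than the benchmark's official evaluation code: it assumes each generated frame has already been passed through some character classifier that returns a set of character labels, that frame accuracy means an exact match of that set, and that character F1 is micro-averaged over all frames. The function names and the example labels are hypothetical.

```python
def frame_accuracy(predicted, reference):
    """Fraction of frames whose predicted character set matches exactly."""
    exact = sum(1 for p, r in zip(predicted, reference) if set(p) == set(r))
    return exact / len(reference)


def character_f1(predicted, reference):
    """Micro-averaged F1 over per-frame character labels (assumed averaging)."""
    tp = fp = fn = 0
    for p, r in zip(predicted, reference):
        p, r = set(p), set(r)
        tp += len(p & r)  # characters correctly present
        fp += len(p - r)  # characters that should not appear
        fn += len(r - p)  # characters that are missing
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels for a four-frame story (not taken from the dataset).
reference = [{"pororo", "crong"}, {"eddy"}, {"pororo"}, {"loopy", "petty"}]
predicted = [{"pororo", "crong"}, {"eddy", "pororo"}, {"pororo"}, {"loopy"}]

acc = frame_accuracy(predicted, reference)
f1 = character_f1(predicted, reference)
print(acc, round(f1, 4))  # 0.5 0.8333
```

Frame accuracy is the stricter of the two: one extra or missing character fails the whole frame, whereas character F1 still gives credit for the characters that were rendered correctly.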

artificial intelligence, machine learning, natural language, (17 more...)

arXiv:2308.07575

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Minnesota (0.04)
North America > United States > Michigan (0.04)

Genre:

Instructional Material > Online (0.61)
Instructional Material > Course Syllabus & Notes (0.61)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Improved Visual Story Generation with Adaptive Context Modeling

Feng, Zhangyin, Ren, Yuchen, Yu, Xinmiao, Feng, Xiaocheng, Tang, Duyu, Shi, Shuming, Qin, Bing

arXiv.org Artificial Intelligence, May-26-2023

Diffusion models developed on top of powerful text-to-image generation models like Stable Diffusion achieve remarkable success in visual story generation. However, the best-performing approach treats historically generated results as flattened memory cells, ignoring the fact that not all preceding images contribute equally to the generation of the characters and scenes at the current stage. To address this, we present a simple method that improves the leading system with adaptive context modeling, which is not only incorporated in the encoder but also adopted as additional guidance in the sampling stage to boost the global consistency of the generated story. We evaluate our model on the PororoSV and FlintstonesSV datasets and show that our approach achieves state-of-the-art FID scores in both story visualization and continuation scenarios. We conduct a detailed model analysis and show that our model excels at generating semantically consistent images for stories.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv:2305.16811

Country:

North America > Dominican Republic (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report (0.40)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
