Goto

Collaborating Authors

 character identification


Emotion-Aware Speech Generation with Character-Specific Voices for Comics

arXiv.org Artificial Intelligence

This paper presents an end-to-end pipeline for generating character-specific, emotion-aware speech from comics. The proposed system takes full comic volumes as input and produces speech aligned with each character's dialogue and emotional state. An image processing module performs character detection, text recognition, and emotion intensity recognition. A large language model performs dialogue attribution and emotion analysis by integrating visual information with the evolving plot context. Speech is synthesized through a text-to-speech model with distinct voice profiles tailored to each character and emotion. This work enables automated voiceover generation for comics, offering a step toward interactive and immersive comic reading experience.


StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification

arXiv.org Artificial Intelligence

Existing large vision-language models (LVLMs) are largely limited to processing short, seconds-long videos and struggle with generating coherent descriptions for extended video spanning minutes or more. Long video description introduces new challenges, such as plot-level consistency across descriptions. To address these, we figure out audio-visual character identification, matching character names to each dialogue, as a key factor. We propose StoryTeller, a system for generating dense descriptions of long videos, incorporating both low-level visual concepts and high-level plot information. StoryTeller uses a multimodal large language model that integrates visual, audio, and text modalities to perform audio-visual character identification on minute-long video clips. The results are then fed into a LVLM to enhance consistency of video description. We validate our approach on movie description tasks and introduce MovieStory101, a dataset with dense descriptions for three-minute movie clips. To evaluate long video descriptions, we create MovieQA, a large set of multiple-choice questions for the MovieStory101 test set. We assess descriptions by inputting them into GPT-4 to answer these questions, using accuracy as an automatic evaluation metric. Experiments show that StoryTeller outperforms all open and closed-source baselines on MovieQA, achieving 9.5% higher accuracy than the strongest baseline, Gemini-1.5-pro, and demonstrating a +15.56% advantage in human side-by-side evaluations. Additionally, incorporating audio-visual character identification from StoryTeller improves the performance of all video description models, with Gemini-1.5-pro and GPT-4o showing relative improvement of 5.5% and 13.0%, respectively, in accuracy on MovieQA.


Improving Automatic Quotation Attribution in Literary Novels

arXiv.org Artificial Intelligence

Current models for quotation attribution in literary novels assume varying levels of available information in their training and test data, which poses a challenge for in-the-wild inference. Here, we approach quotation attribution as a set of four interconnected sub-tasks: character identification, coreference resolution, quotation identification, and speaker attribution. We benchmark state-of-the-art models on each of these sub-tasks independently, using a large dataset of annotated coreferences and quotations in literary novels (the Project Dialogism Novel Corpus). We also train and evaluate models for the speaker attribution task in particular, showing that a simple sequential prediction model achieves accuracy scores on par with state-of-the-art models.


Information retrieval in folktales using natural language processing

arXiv.org Artificial Intelligence

Recognising literary characters in various narrative texts is challenging both from the literary and technical perspective. From the literary viewpoint, the meaning of the term "character" leaves space to various interpretations. From the technical perspective, literary texts contain a lot of data about emotions, social life or inner life of the characters, while they are very thin on technical, straightforward messages. To infer the character type from literary texts might pose problems even to the human readers [4]. Interactions between literary characters contain rich social networks.


Toward Automatic Character Identification in Unannotated Narrative Text

AAAI Conferences

We present a case-based approach to character identification in natural language text in the context of our Voz system. Voz first extracts entities from the text, and for each one of them, computes a feature-vector using both linguistic information and external knowledge. We propose a new similarity measure called Continuous Jaccard that exploits those feature-vectors to compute the similarity between a given entity and those in the case-base, and thus determine which entities are characters or not. We evaluate our approach by comparing it with different similarity measures and feature sets. Results show an identification accuracy of up to 93.49%, significantly higher than recent related work.