Toward accessible comics for blind and low vision readers

Rigaud, Christophe, Burie, Jean-Christophe, Petit, Samuel

Jul-11-2024–arXiv.org Artificial Intelligence

This work explores how to fine-tune large language models using prompt engineering techniques with contextual information to generate an accurate text description of the full story, ready to be forwarded to off-the-shelf speech synthesis tools. We propose to use existing computer vision and optical character recognition techniques to build a grounded context from the comic strip image content, such as panels, characters, text, reading order, and the association of bubbles and characters. We then infer character identification and generate a comic book script with context-aware panel descriptions, including each character's appearance, posture, mood, dialogue, etc. We believe that such an enriched content description can easily be used to produce audiobooks and eBooks with various voices for characters and captions, and with sound effects played.

Keywords: comics understanding, large language model, prompt engineering, character identification, comic book script, accessible comics.
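As a rough illustration of the idea of building a grounded context and passing it to a language model, the sketch below turns panel-level detections (reading order, characters, and bubble-to-character associations) into a plain-text prompt. The `Panel` and `Bubble` classes, the `build_prompt` function, and the prompt wording are all hypothetical names chosen for this example; they are not taken from the paper, and the upstream vision and OCR steps are assumed to have already produced the data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bubble:
    """A speech bubble with its OCR text and associated character (hypothetical)."""
    text: str
    speaker: str  # label of the character linked to this bubble


@dataclass
class Panel:
    """One panel of the strip, as assumed output of upstream vision/OCR tools."""
    order: int  # position in the reading order, starting at 1
    characters: List[str] = field(default_factory=list)
    bubbles: List[Bubble] = field(default_factory=list)


def build_prompt(panels: List[Panel]) -> str:
    """Serialize the grounded context into a text prompt, in reading order."""
    lines = [
        "Write a comic book script for blind and low vision readers.",
        "Describe each panel using only the context below.",
        "",
    ]
    for panel in sorted(panels, key=lambda p: p.order):
        lines.append(f"Panel {panel.order}:")
        names = ", ".join(panel.characters) or "none detected"
        lines.append(f"  Characters: {names}")
        for bubble in panel.bubbles:
            lines.append(f'  {bubble.speaker} says: "{bubble.text}"')
    return "\n".join(lines)


# Example input, deliberately given out of reading order.
example_panels = [
    Panel(order=2, characters=["character_1"],
          bubbles=[Bubble(text="Wait for me!", speaker="character_1")]),
    Panel(order=1, characters=["character_1", "character_2"],
          bubbles=[Bubble(text="Hello", speaker="character_2")]),
]
print(build_prompt(example_panels))
```

The resulting string would then be sent to whichever language model is in use; that call is left out here because the abstract does not name a specific model or API.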

Country:
- Asia (0.04)
- North America > United States
  - Massachusetts (0.04)
  - California > San Francisco County
    - San Francisco (0.04)
- Europe
  - France (0.04)
  - Italy > Sardinia
    - Cagliari (0.04)
  - Austria > Upper Austria
    - Linz (0.04)

Genre:
- Research Report (1.00)
- Overview (1.00)

Industry:
- Health & Medicine (0.93)
- Media > Publishing (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Statistical Learning
    - Clustering (0.46)
