Khattat: Enhancing Readability and Concept Representation of Semantic Typography

Hussein, Ahmed, Elsetohy, Alaa, Hadhoud, Sama, Bakr, Tameem, Rohaim, Yasser, AlKhamissi, Badr

Oct-1-2024–arXiv.org Artificial Intelligence

Designing expressive typography that visually conveys a word's meaning while remaining readable is a complex task known as semantic typography. It involves selecting an idea, choosing an appropriate font, and balancing creativity with legibility. We introduce an end-to-end system that automates this process. First, a Large Language Model (LLM) generates imagery ideas for the word, which is especially useful for abstract concepts like "freedom." Then, the pre-trained FontCLIP model automatically selects a suitable font based on its semantic understanding of font attributes. Next, the system identifies the optimal regions of the word for morphing and iteratively transforms them using a pre-trained diffusion model. A key feature is our OCR-based loss function, which enhances readability and enables simultaneous stylization of multiple characters. We compare our method against baselines, demonstrating improved readability and versatility across multiple languages and writing scripts.
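The abstract does not spell out the form of the OCR-based loss, so the following is only a minimal sketch of how such a readability term might look, assuming an OCR model that emits one logit vector per character position. The cross-entropy formulation, the function names (`ocr_readability_loss`, `total_loss`), and the `ocr_weight` parameter are all illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def ocr_readability_loss(char_logits, target_word, alphabet):
    """Mean cross-entropy between per-position OCR logits and the target word.

    Hypothetical helper: `char_logits[i]` is assumed to hold the OCR model's
    scores over `alphabet` for the i-th character of the stylized word.
    """
    if len(char_logits) != len(target_word):
        raise ValueError("need one logit vector per target character")
    total = 0.0
    for logits, ch in zip(char_logits, target_word):
        probs = softmax(logits)
        total += -math.log(probs[alphabet.index(ch)])
    return total / len(target_word)


def total_loss(style_loss, ocr_loss, ocr_weight=1.0):
    """Assumed combination: stylization objective plus weighted readability term."""
    return style_loss + ocr_weight * ocr_loss


# Usage: a confident, correct OCR reading is penalized less than a confused one.
alphabet = "abc"
confident = ocr_readability_loss([[5.0, 0.0, 0.0]], "a", alphabet)
confused = ocr_readability_loss([[0.0, 0.0, 0.0]], "a", alphabet)
print(confident < confused)
```

Because the term is averaged over every character position, a loss of this shape penalizes illegibility anywhere in the word, which is one plausible reason an OCR-based objective could support stylizing several characters at once.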

diffusion model, large language model, machine learning

Country:
- North America > United States
  - New York > New York County > New York City (0.14)
- Europe > Switzerland
  - Vaud > Lausanne (0.04)
- Asia
  - Singapore (0.04)
  - Middle East > UAE (0.04)
  - Japan (0.04)
  - Indonesia > Bali (0.04)
- Africa > Middle East
  - Egypt (0.05)

Genre:
- Research Report (0.85)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)