Compose Your Aesthetics: Empowering Text-to-Image Models with the Principles of Art

Mar-15-2025–arXiv.org Artificial Intelligence

Text-to-Image (T2I) diffusion models (DMs) have been widely adopted because they generate high-fidelity outputs and are accessible to anyone able to put imagination into words. However, DMs are often predisposed to generate unappealing outputs, much like the random internet images they were trained on. Existing approaches to this problem rest on the implicit premise that visual aesthetics is universal, which is limiting. Aesthetics in the T2I context should be about personalization, and we propose the novel task of aesthetics alignment, which seeks to align the T2I generation output with user-specified aesthetics. Inspired by the invaluable perspective that artworks offer on aesthetics, we codify visual aesthetics using the compositional framework artists employ, known as the Principles of Art (PoA). To facilitate this study, we introduce CompArt, a large-scale compositional art dataset built on WikiArt, with PoA analysis annotated by a capable Multimodal LLM. By leveraging the expressive power of LLMs and training a lightweight, transferable adapter, we demonstrate that T2I DMs can effectively offer 10 compositional controls through user-specified PoA conditions. Additionally, we design an evaluation framework to assess the efficacy of our approach.

large language model, machine learning, natural language

Country:
- Oceania > Australia (0.04)
- North America
  - United States
    - Washington > King County
      - Seattle (0.04)
    - Utah > Salt Lake County
      - Salt Lake City (0.04)
    - New York > New York County
      - New York City (0.04)
    - Nevada > Clark County
      - Las Vegas (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - California
      - Santa Clara County > Mountain View (0.04)
      - San Diego County > San Diego (0.04)
      - Los Angeles County
        - Los Angeles (0.14)
        - Long Beach (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.14)
- Europe
  - Spain (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Italy
    - Veneto > Venice (0.04)
    - Calabria > Catanzaro Province
      - Catanzaro (0.04)
  - France > Île-de-France
    - Paris > Paris (0.04)
  - Austria
    - Vienna (0.14)
    - Styria > Graz (0.04)
- Asia
  - Singapore (0.04)
  - Japan (0.04)
  - Middle East > Israel
    - Tel Aviv District > Tel Aviv (0.04)
- Africa > Rwanda
  - Kigali > Kigali (0.04)

Genre:
- Research Report (0.50)

Industry:
- Media (0.67)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Cognitive Science > Problem Solving (0.87)
    - Natural Language
      - Large Language Model (1.00)
      - Text Processing (0.92)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
