The Myth of Culturally Agnostic AI Models
arXiv.org Artificial Intelligence
AI models trained on very large datasets, specifically large language and vision-language models, represent a phenomenon that has far outgrown the notion of "being just a tool". Regardless of how and for what purpose such models are employed in practice, they are in themselves a valuable object of study. "In themselves" here covers not only all the culturally dependent patterns learned from the training data, but also the equally culturally dependent attempts to control, modify or erase culturally dependent patterns in the training data. Through a comparative analysis of outputs from two very popular text-to-image synthesis models, DALL·E 2 [1] and Stable Diffusion [2], this paper weighs the pros and cons of striving towards culturally agnostic vs. culturally specific AI models. The "guiding principle" behind many existing text-to-image generators, implemented in one way or another, is most commonly the ground-breaking vision-language model CLIP and its emerging derivatives (e.g.
Nov-29-2022