 painter


Unlimited Editions: Documenting Human Style in AI Art Generation

Leitch, Alex, Chen, Celia

arXiv.org Artificial Intelligence

As AI art generation becomes increasingly sophisticated, HCI research has focused primarily on questions of detection, authenticity, and automation. This paper argues that such approaches fundamentally misunderstand how artistic value emerges from the concerns that drive human image production. Through examination of historical precedents, we demonstrate that artistic style is not only visual appearance but the resolution of creative struggle, as artists wrestle with influence and technical constraints to develop unique ways of seeing. Current AI systems flatten these human choices into reproducible patterns without preserving their provenance. We propose that HCI's role lies not only in perfecting visual output, but in developing means to document the origins and evolution of artistic style as it appears within generated visual traces. This reframing suggests new technical directions for HCI research in generative AI, focused on automatic documentation of stylistic lineage and creative choice rather than simple reproduction of aesthetic effects.


Test-Time Visual In-Context Tuning

Xie, Jiahao, Tonioni, Alessio, Rauschmayr, Nathalie, Tombari, Federico, Schiele, Bernt

arXiv.org Artificial Intelligence

Visual in-context learning (VICL), as a new paradigm in computer vision, allows the model to rapidly adapt to various tasks with only a handful of prompts and examples. While effective, the existing VICL paradigm exhibits poor generalizability under distribution shifts. In this work, we propose test-time Visual In-Context Tuning (VICT), a method that can adapt VICL models on the fly with a single test sample. Specifically, we flip the role between the task prompts and the test sample and use a cycle consistency loss to reconstruct the original task prompt output. Our key insight is that a model should be aware of a new test distribution if it can successfully recover the original task prompts. Extensive experiments on six representative vision tasks ranging from high-level visual understanding to low-level image processing, with 15 common corruptions, demonstrate that our VICT can improve the generalizability of VICL to unseen new domains. In addition, we show the potential of applying VICT for unseen tasks at test time. Code: https://github.com/Jiahao000/VICT.


PATCH: a deep learning method to assess heterogeneity of artistic practice in historical paintings

Van Horn, Andrew, Smith, Lauryn, Mahmoud, Mahamad, McMaster, Michael, Pinchbeck, Clara, Martin, Ina, Lininger, Andrew, Ingrisano, Anthony, Lowe, Adam, Bayod, Carlos, Bolman, Elizabeth, Singer, Kenneth, Hinczewski, Michael

arXiv.org Artificial Intelligence

The history of art has seen significant shifts in the manner in which artworks are created, making understanding of creative processes a central question in technical art history. In the Renaissance and Early Modern period, paintings were largely produced by master painters directing workshops of apprentices who often contributed to projects. The masters varied significantly in artistic and managerial styles, meaning different combinations of artists and implements might be seen both between masters and within workshops or even individual canvases. Information on how different workshops were managed and the processes by which artworks were created remains elusive. Machine learning methods have potential to unearth new information about artists' creative processes by extending the analysis of brushwork to a microscopic scale. Analysis of workshop paintings, however, presents a challenge in that documentation of the artists and materials involved is sparse, meaning external examples are not available to train networks to recognize their contributions. Here we present a novel machine learning approach we call pairwise assignment training for classifying heterogeneity (PATCH) that is capable of identifying individual artistic practice regimes with no external training data, or "ground truth." The method achieves unsupervised results by supervised means, and outperforms both simple statistical procedures and unsupervised machine learning methods. We apply this method to two historical paintings by the Spanish Renaissance master, El Greco: The Baptism of Christ and Christ on the Cross with Landscape, and our findings regarding the former potentially challenge previous work that has assigned the painting to workshop members. Further, the results of our analyses create a measure of heterogeneity of artistic practice that can be used to characterize artworks across time and space.


Less Cybersickness, Please: Demystifying and Detecting Stereoscopic Visual Inconsistencies in Virtual Reality Apps

Li, Shuqing, Gao, Cuiyun, Zhang, Jianping, Zhang, Yujia, Liu, Yepang, Gu, Jiazhen, Peng, Yun, Lyu, Michael R.

arXiv.org Artificial Intelligence

The quality of Virtual Reality (VR) apps is vital, particularly the rendering quality of the VR Graphical User Interface (GUI). Different from traditional 2D apps, VR apps create a 3D digital scene for users by rendering two distinct 2D images for the user's left and right eyes, respectively. Stereoscopic visual inconsistency (denoted as "SVI") issues, however, undermine the rendering process of the user's brain, leading to user discomfort and even adverse health effects. Such issues commonly exist but remain underexplored. We conduct an empirical analysis on 282 SVI bug reports from 15 VR platforms, summarizing 15 types of manifestations. The empirical analysis reveals that automatically detecting SVI issues is challenging, mainly because: (1) training data is lacking; (2) the manifestations of SVI issues are diverse, complicated, and often application-specific; (3) most accessible VR apps are closed-source commercial software. Existing pattern-based supervised classification approaches may be inapplicable or ineffective in detecting the SVI issues. To counter these challenges, we propose an unsupervised black-box testing framework named StereoID to identify the stereoscopic visual inconsistencies, based only on the rendered GUI states. StereoID generates a synthetic right-eye image based on the actual left-eye image and computes distances between the synthetic right-eye image and the actual right-eye image to detect SVI issues. We propose a depth-aware conditional stereo image translator to power the image generation process, which captures the expected perspective shifts between left-eye and right-eye images. We build a large-scale unlabeled VR stereo screenshot dataset of more than 171K images from 288 real-world VR apps for experiments. After substantial experiments, StereoID demonstrates superior performance for detecting SVI issues in both user reports and wild VR apps.


Ted Chiang Is Wrong About AI Art

The Atlantic - Technology

Artists and writers all over the world have spent the past two years engaged in an existential battle. Generative-AI programs such as ChatGPT and DALL-E are built on work stolen from humans, and machines threaten to replace the artists and writers who made the material in the first place. Their outrage is well warranted--but their arguments don't always make sense or substantively help defend humanity. Over the weekend, the legendary science-fiction writer Ted Chiang stepped into the fray, publishing an essay in The New Yorker arguing, as the headline says, that AI "isn't going to make art." Chiang writes not simply that AI's outputs can be or are frequently lacking value but that AI cannot be used to make art, really ever, leaving no room for the many different ways someone might use the technology.


Transformer Layers as Painters

Sun, Qi, Pickett, Marc, Nain, Aakash Kumar, Jones, Llion

arXiv.org Artificial Intelligence

Despite their nearly universal adoption for large language models, the internal workings of transformers are not well understood. We aim to better understand the impact of removing or reorganizing information throughout the layers of a pretrained transformer. Such an understanding could both yield better usage of existing models and guide architectural improvements that produce new variants. We present a series of empirical studies on frozen models that show that the lower and final layers of pretrained transformers differ from middle layers, but that middle layers have a surprising amount of uniformity. We further show that some classes of problems are robust to skipping layers, running the layers in an order different from how they were trained, or running the layers in parallel. Our observations suggest that even frozen pretrained models may gracefully trade accuracy for latency by skipping layers or running layers in parallel.
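
The layer manipulations named in this abstract (skipping layers, running middle layers in parallel) can be illustrated with a toy residual stack. This is only a sketch under stated assumptions: `make_layer` is a hypothetical stand-in for a transformer block, and averaging the parallel updates is one assumed way to combine them, not necessarily the authors' procedure.

```python
from typing import Callable, List, Sequence, Set

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_layer(scale: float) -> Layer:
    """Hypothetical stand-in for a transformer block: returns the residual update scale * x."""
    def layer(x: Vector) -> Vector:
        return [scale * v for v in x]
    return layer


def run_sequential(layers: Sequence[Layer], x: Vector) -> Vector:
    """Apply each layer in order, adding its update to the residual stream."""
    for layer in layers:
        update = layer(x)
        x = [a + b for a, b in zip(x, update)]
    return x


def run_skipping(layers: Sequence[Layer], x: Vector, skip: Set[int]) -> Vector:
    """Run the stack while leaving out the layers whose indices are in `skip`."""
    kept = [layer for i, layer in enumerate(layers) if i not in skip]
    return run_sequential(kept, x)


def run_parallel_middle(layers: Sequence[Layer], x: Vector, start: int, end: int) -> Vector:
    """Run layers[start:end] on the same input and add the mean of their updates (assumed rule)."""
    x = run_sequential(layers[:start], x)
    updates = [layer(x) for layer in layers[start:end]]
    if updates:  # nothing to combine when the middle slice is empty
        mean_update = [sum(col) / len(updates) for col in zip(*updates)]
        x = [a + b for a, b in zip(x, mean_update)]
    return run_sequential(layers[end:], x)


if __name__ == "__main__":
    stack = [make_layer(s) for s in (1.0, 0.5, 0.5, 1.0)]
    print(run_sequential(stack, [1.0]))
    print(run_skipping(stack, [1.0], {1, 2}))
    print(run_parallel_middle(stack, [1.0], 1, 3))
```

In this toy setting the skipped and parallel variants do less sequential work and produce different outputs from the full stack, which mirrors the accuracy-for-latency trade the abstract describes.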


Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?

Yang, Zhe, Zhang, Yichang, Liu, Tianyu, Yang, Jian, Lin, Junyang, Zhou, Chang, Sui, Zhifang

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated impressive capabilities, but still suffer from inconsistency issues (e.g. LLMs can react differently to disturbances like rephrasing or inconsequential order change). In addition to these inconsistencies, we also observe that LLMs, while capable of solving hard problems, can paradoxically fail at easier ones. To evaluate this hard-to-easy inconsistency, we develop the ConsisEval benchmark, where each entry comprises a pair of questions with a strict order of difficulty. Furthermore, we introduce the concept of consistency score to quantitatively measure this inconsistency and analyze the potential for improvement in consistency by relative consistency score. Based on comprehensive experiments across a variety of existing models, we find: (1) GPT-4 achieves the highest consistency score of 92.2% but is still inconsistent on specific questions due to distraction by redundant information, misinterpretation of questions, etc.; (2) models with stronger capabilities typically exhibit higher consistency, but exceptions also exist; (3) hard data enhances consistency for both fine-tuning and in-context learning. Our data and code will be publicly available on GitHub.
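
The abstract does not define the consistency score, so the following is only a sketch under an assumption: that the score is the fraction of question pairs whose easy question is answered correctly among the pairs whose hard question is answered correctly. The function name and the single-pass boolean inputs are hypothetical simplifications, not the benchmark's actual interface.

```python
from typing import Iterable, Tuple


def consistency_score(results: Iterable[Tuple[bool, bool]]) -> float:
    """Estimate P(easy correct | hard correct) from (easy_correct, hard_correct) pairs.

    Assumed definition for illustration; the abstract itself gives no formula.
    """
    hard_solved = [(easy, hard) for easy, hard in results if hard]
    if not hard_solved:
        raise ValueError("no pair has a correctly answered hard question")
    easy_also_solved = sum(1 for easy, _ in hard_solved if easy)
    return easy_also_solved / len(hard_solved)


if __name__ == "__main__":
    # Each tuple is (easy_correct, hard_correct) for one question pair.
    pairs = [(True, True), (False, True), (True, False), (True, True)]
    print(consistency_score(pairs))
```

Pairs where the hard question is missed do not enter the estimate, because the inconsistency of interest is failing the easy question despite solving the harder one.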


Content-Conditioned Generation of Stylized Free hand Sketches

Liu, Jiajun, Wang, Siyuan, Zhu, Guangming, Zhang, Liang, Li, Ning, Gao, Eryang

arXiv.org Artificial Intelligence

In recent years, the recognition of free-hand sketches has remained a popular task. However, in some special fields such as the military field, free-hand sketches are difficult to sample on a large scale. Common data augmentation and image generation techniques struggle to produce images with various free-hand sketching styles. Therefore, the recognition and segmentation tasks in related fields are limited. In this paper, we propose a novel adversarial generative network that can accurately generate realistic free-hand sketches with various styles. We explore the performance of the model, including using styles randomly sampled from a prior normal distribution to generate images with various free-hand sketching styles, disentangling the painters' styles from known free-hand sketches to generate images with specific styles, and generating images of unknown classes that are not in the training set. We further demonstrate with qualitative and quantitative evaluations our advantages in visual quality, content accuracy, and style imitation on SketchIME.


Finding Concept Representations in Neural Networks with Self-Organizing Maps

d'Aquin, Mathieu

arXiv.org Artificial Intelligence

In sufficiently complex tasks, it is expected that as a side effect of learning to solve a problem, a neural network will learn relevant abstractions of the representation of that problem. This has been confirmed in particular in machine vision where a number of works showed that correlations could be found between the activations of specific units (neurons) in a neural network and the visual concepts (textures, colors, objects) present in the image. Here, we explore the use of self-organizing maps as a way to both visually and computationally inspect how activation vectors of whole layers of neural networks correspond to neural representations of abstract concepts such as 'female person' or 'realist painter'. We experiment with multiple measures applied to those maps to assess the level of representation of a concept in a network's layer. We show that, among the measures tested, the relative entropy of the activation map for a concept compared to the map for the whole data is a suitable candidate and can be used as part of a methodology to identify and locate the neural representation of a concept, visualize it, and understand its importance in solving the prediction task at hand.
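
The relative entropy measure mentioned above can be sketched as a Kullback-Leibler divergence between two hit-count maps. The assumptions here are that each map is flattened to a list of per-cell counts, that both are normalized to probability distributions, and that a small `eps` guards against empty cells in the whole-data map; the function names are hypothetical and the paper's exact formulation may differ.

```python
import math
from typing import List, Sequence


def normalize(counts: Sequence[float]) -> List[float]:
    """Turn per-cell hit counts of a map into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("map has no activations")
    return [c / total for c in counts]


def relative_entropy(concept_counts: Sequence[float],
                     all_counts: Sequence[float],
                     eps: float = 1e-12) -> float:
    """KL divergence D(concept || whole data) over the cells of a flattened map."""
    p = normalize(concept_counts)
    q = normalize(all_counts)
    # Cells the concept never activates contribute nothing to the sum.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    whole_data = [1, 1, 1, 1]   # activations spread evenly over four cells
    concept = [4, 0, 0, 0]      # concept concentrated in a single cell
    print(relative_entropy(concept, whole_data))
    print(relative_entropy(whole_data, whole_data))
```

A concept whose activations concentrate in a few cells diverges strongly from the whole-data map, while a concept spread like the data as a whole scores near zero.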


Assessing the influence of attractor-verb distance on grammatical agreement in humans and language models

Zacharopoulos, Christos-Nikolaos, Desbordes, Théo, Sablé-Meyer, Mathias

arXiv.org Artificial Intelligence

Subject-verb agreement in the presence of an attractor noun located between the main noun and the verb elicits complex behavior: judgments of grammaticality are modulated by the grammatical features of the attractor. For example, in the sentence "The girl near the boys likes climbing", the attractor (boys) disagrees in grammatical number with the verb (likes), creating a locally implausible transition probability. Here, we parametrically modulate the distance between the attractor and the verb while keeping the length of the sentence equal. We evaluate the performance of both humans and two artificial neural network models: both make more mistakes when the attractor is closer to the verb, but neural networks get close to the chance level while humans are mostly able to overcome the attractor interference. Additionally, we report a linear effect of attractor distance on reaction times. We hypothesize that a possible reason for the proximity effect is the calculation of transition probabilities between adjacent words. Nevertheless, classical models of attraction such as the cue-based model might suffice to explain this phenomenon, thus paving the way for new research. Data and analyses available at https://osf.io/d4g6k