social media image
On Large Uni- and Multi-modal Models for Unsupervised Classification of Social Media Images: Nature's Contribution to People as a case study
Khaldi, Rohaifa, Alcaraz-Segura, Domingo, Sánchez-Herrera, Ignacio, Martinez-Lopez, Javier, Navarro, Carlos Javier, Tabik, Siham
Social media images have proven to be a valuable source of information for understanding human interactions with important subjects such as cultural heritage, biodiversity, and nature, among others. The task of grouping such images into a number of semantically meaningful clusters without labels is challenging due to the high diversity and complex nature of the visual content, in addition to their large volume. On the other hand, recent advances in Large Visual Models (LVMs), Large Language Models (LLMs), and Large Visual Language Models (LVLMs) provide an important opportunity to explore new productive and scalable solutions. This work proposes, analyzes, and compares various approaches based on one or more state-of-the-art LVMs, LLMs, and LVLMs for mapping social media images into a number of predefined classes. As a case study, we consider the problem of understanding the interactions between humans and nature, also known as Nature's Contribution to People or Cultural Ecosystem Services (CES). Our experiments show that the highest-performing approaches, with accuracy above 95%, still require the creation of a small labeled dataset. These include the fine-tuned LVM DINOv2 and the LVLM LLaVA-1.5 combined with a fine-tuned LLM. The top fully unsupervised approaches, achieving accuracy above 84%, are the LVLMs, specifically the proprietary GPT-4 model and the public LLaVA-1.5 model. Additionally, the LVM DINOv2, when applied in a 10-shot learning setup, delivered competitive results with an accuracy of 83.99%, closely matching the performance of the LVLM LLaVA-1.5.
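The abstract does not say how the 10-shot setup is built on top of DINOv2. One common way to do few-shot classification over frozen image embeddings is a nearest-prototype classifier, sketched below. This is an illustrative assumption, not the authors' method: the 3-dimensional vectors stand in for real image embeddings, and the class names and function names are hypothetical.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_prototypes(support):
    """Average the normalized embeddings of the labeled examples per class.

    support: list of (embedding, label) pairs, e.g. k examples per class.
    Returns {label: unit-length mean embedding}.
    """
    grouped = defaultdict(list)
    for embedding, label in support:
        grouped[label].append(l2_normalize(embedding))
    prototypes = {}
    for label, embeddings in grouped.items():
        mean = [sum(column) / len(embeddings) for column in zip(*embeddings)]
        prototypes[label] = l2_normalize(mean)
    return prototypes


def predict(embedding, prototypes):
    """Return the label whose prototype has the highest cosine similarity."""
    query = l2_normalize(embedding)

    def similarity(label):
        return sum(q * p for q, p in zip(query, prototypes[label]))

    return max(prototypes, key=similarity)


# Toy stand-ins for image embeddings (hypothetical values and class names).
support = [
    ([0.9, 0.1, 0.0], "landscape"),
    ([0.8, 0.2, 0.1], "landscape"),
    ([0.1, 0.9, 0.0], "wildlife"),
    ([0.0, 0.8, 0.2], "wildlife"),
]
prototypes = class_prototypes(support)
predicted = predict([0.7, 0.3, 0.0], prototypes)
```

In a real pipeline the embeddings would come from the frozen vision model and `support` would hold ten labeled images per class. The classifier itself needs no gradient training.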
Can AI Outperform Human Experts in Creating Social Media Creatives?
Park, Eunkyung, Wong, Raymond K., Kwon, Junbum
Artificial Intelligence has outperformed human experts in functional tasks such as chess and baduk. How about creative tasks? This paper evaluates AI's capability in the creative domain compared to human experts, a comparison on which little research has been conducted so far. We propose a novel Prompt-for-Prompt approach to generate social media creatives via prompt augmentation by Large Language Models. We take the most popular Instagram posts (those with the most likes) in top brands' Instagram accounts to create social media creatives. We give GPT-4 several prompt instructions with text descriptions to generate the most effective prompts for cutting-edge text-to-image generators: Midjourney, DALL-E 3, and Stable Diffusion. LLM-augmented prompts can boost AI's abilities by adding objectives, engagement strategy, lighting, and brand consistency for social media image creation. We conduct an extensive human evaluation experiment and find that AI outperforms human experts and that Midjourney is better than the other text-to-image generators. Surprisingly, contrary to conventional wisdom in the social media industry, prompt instructions including "eye-catching" perform much worse than those including "natural". Regarding the type of creatives, AI improves creatives with animals or products but less so those with real people. Also, AI improves creatives with short text descriptions more than those with long text descriptions, because shorter descriptions leave more room for AI to augment prompts.
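The abstract does not reproduce the actual instructions given to the LLM, so the following is only a sketch of what a prompt-for-prompt template might look like. The function name, parameters, and wording are hypothetical. Only the four augmentation aspects and the "natural" and "eye-catching" style keywords come from the abstract.

```python
# Aspects the abstract says LLM-augmented prompts can add.
AUGMENTATION_ASPECTS = [
    "objectives",
    "engagement strategy",
    "lighting",
    "brand consistency",
]


def build_meta_prompt(description, brand, style="natural", generator="Midjourney"):
    """Compose an instruction asking an LLM to write a text-to-image prompt.

    The wording is illustrative only; it is not the instruction used in the paper.
    """
    lines = [
        f"You are writing a prompt for the text-to-image generator {generator}.",
        f"Brand: {brand}",
        f"Original post description: {description}",
        f"Desired overall style: {style}",
        "Expand the description into one detailed image prompt that covers:",
    ]
    lines += [f"- {aspect}" for aspect in AUGMENTATION_ASPECTS]
    return "\n".join(lines)


# Hypothetical usage: the resulting string would be sent to an LLM, whose
# answer is then passed to the image generator.
meta_prompt = build_meta_prompt(
    description="A dog sitting next to a pair of running shoes",
    brand="ExampleBrand",
)
```

Keeping the style keyword as a parameter makes it easy to compare variants such as "natural" and "eye-catching" under otherwise identical instructions.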
A Picture May Be Worth a Thousand Lives: An Interpretable Artificial Intelligence Strategy for Predictions of Suicide Risk from Social Media Images
Badian, Yael, Ophir, Yaakov, Tikochinski, Refael, Calderon, Nitay, Klomek, Anat Brunstein, Reichart, Roi
Promising research on the use of Artificial Intelligence in suicide prevention has principal gaps, including black-box methodologies, inadequate outcome measures, and scarce research on non-verbal inputs such as social media images (despite their popularity in today's digital era). This study addresses these gaps and combines theory-driven and bottom-up strategies to construct a hybrid and interpretable prediction model of validated suicide risk from images. The lead hypothesis was that images contain valuable information about emotions and interpersonal relationships, two central concepts in suicide-related treatments and theories. The dataset included 177,220 images by 841 Facebook users who completed a gold-standard suicide scale. The images were represented with CLIP, a state-of-the-art algorithm, which was utilized, unconventionally, to extract predefined features that served as inputs to a simple logistic-regression prediction model (in contrast to complex neural networks). The features addressed basic and theory-driven visual elements using everyday language (e.g., bright photo, photo of sad people). The results of the hybrid model (which integrated theory-driven and bottom-up methods) indicated high prediction performance that surpassed common bottom-up algorithms, thus providing a first proof that images (alone) can be leveraged to predict validated suicide risk. Corresponding with the lead hypothesis, at-risk users had images with increased negative emotions and decreased belongingness. The results are discussed in the context of non-verbal warning signs of suicide. Notably, the study illustrates the advantages of hybrid models in such complicated tasks and provides simple and flexible prediction strategies that could be utilized to develop real-life suicide monitoring tools.
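The general idea of predefined language features feeding a logistic regression can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's pipeline: the 2-dimensional vectors stand in for embeddings that a model such as CLIP would produce, the labels are synthetic, and all function names are hypothetical. Each feature is the cosine similarity between an image embedding and the embedding of an everyday-language prompt, so every learned weight maps back to a readable description.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prompt_features(image_emb, prompt_embs):
    """One feature per prompt: similarity of the image to that prompt."""
    return [cosine(image_emb, prompt_emb) for prompt_emb in prompt_embs]


def predict_proba(x, weights, bias):
    """Logistic-regression probability of the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(xi, weights, bias) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += error * xij
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic stand-ins (assumptions): embeddings for the prompts
# "bright photo" and "photo of sad people", four image embeddings,
# and made-up binary labels used only to exercise the code.
prompt_embs = [[1.0, 0.0], [0.0, 1.0]]
image_embs = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.8]]
labels = [0, 0, 1, 1]

X = [prompt_features(emb, prompt_embs) for emb in image_embs]
weights, bias = fit_logistic(X, labels)
preds = [int(predict_proba(x, weights, bias) >= 0.5) for x in X]
```

Because each input dimension corresponds to one named prompt, the sign and size of each weight can be read directly, which is the interpretability advantage of this design over an end-to-end neural network.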
Unsupervised Sentiment Analysis for Social Media Images
Wang, Yilin (Arizona State University) | Wang, Suhang (Arizona State University) | Tang, Jiliang (Arizona State University) | Liu, Huan (Arizona State University) | Li, Baoxin (Arizona State University)
Recently text-based sentiment prediction has been extensively studied, while image-centric sentiment analysis receives much less attention. In this paper, we study the problem of understanding human sentiments from large-scale social media images, considering both visual content and contextual information, such as comments on the images, captions, etc. The challenge of this problem lies in the "semantic gap" between low-level visual features and higher-level image sentiments. Moreover, the lack of proper annotations/labels in the majority of social media images presents another challenge. To address these two challenges, we propose a novel Unsupervised SEntiment Analysis (USEA) framework for social media images. Our approach exploits relations among visual content and relevant contextual information to bridge the "semantic gap" in the prediction of image sentiments. With experiments on two large-scale datasets, we show that the proposed method is effective in addressing the two challenges.
Current methods of sentiment analysis for social media images include low-level visual feature based approaches [Jia et al., 2012; Yang et al., 2014], mid-level visual feature based approaches [Borth et al., 2013; Yuan et al., 2013] and deep learning based approaches [You et al., 2015]. The vast majority of existing methods are supervised, relying on labeled images to train sentiment classifiers. Unfortunately, sentiment labels are in general unavailable for social media images, and it is too labor- and time-intensive to obtain labeled sets large enough for robust training. In order to utilize the vast amount of unlabeled social media images, an unsupervised approach would be much more desirable.