LLaVA Finds Free Lunch: Teaching Human Behavior Improves Content Understanding Abilities Of LLMs
Singh, Somesh, S, Harini I, Singla, Yaman K, Baths, Veeky, Shah, Rajiv Ratn, Chen, Changyou, Krishnamurthy, Balaji
Communication is defined as "who says what to whom with what effect." A message from a communicator generates downstream effects on its receivers, also known as behavior. Receiver behavior, being a downstream effect of the message, carries rich signals about it. Despite carrying these signals, behavior data is typically ignored when training large language models. We show that training LLMs on receiver behavior can improve their content-understanding abilities. Specifically, we show that training LLMs to predict receiver behavior, in the form of likes and comments, improves their performance on a wide variety of downstream content understanding tasks. We demonstrate this performance gain on 40 video and image understanding tasks across 23 benchmark datasets, in both 0-shot and fine-tuning settings, outperforming many supervised baselines. Moreover, since receiver behavior such as likes and comments is collected by default on the internet and requires no human annotation to be useful, the performance improvement from training on this data comes essentially as a free lunch. We release our cleaned receiver-behavior data, comprising the comments and likes of 750k images and videos collected from multiple platforms, along with our instruction-tuning data.
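To make the training setup concrete, below is a minimal sketch of how receiver-behavior signals such as likes and comments could be converted into instruction-tuning examples. This is an illustration only, not the authors' released pipeline: the `Post` fields, the prompt template, and the `to_instruction_example` helper are all hypothetical.

```python
# A minimal, hypothetical sketch (not the paper's released pipeline) of turning
# scraped receiver behavior into instruction-tuning pairs for an LLM.

import json
from dataclasses import dataclass


@dataclass
class Post:
    caption: str          # text accompanying the image/video
    likes: int            # raw like count from the platform
    comments: list[str]   # cleaned top comments


def to_instruction_example(post: Post) -> dict:
    """Format one post as an instruction/response pair for behavior prediction."""
    instruction = (
        "Given the content of this post, predict how receivers will react.\n"
        f"Post: {post.caption}"
    )
    # The model is supervised to reproduce downstream receiver behavior.
    response = json.dumps({
        "predicted_likes": post.likes,
        "representative_comments": post.comments[:3],
    })
    return {"instruction": instruction, "response": response}


if __name__ == "__main__":
    post = Post(
        caption="A golden retriever catching a frisbee at the beach",
        likes=12840,
        comments=["So wholesome!", "What a good boy", "Beach days are the best"],
    )
    print(json.dumps(to_instruction_example(post), indent=2))
```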
Long-Term Ad Memorability: Understanding and Generating Memorable Ads
S, Harini I, Singh, Somesh, Singla, Yaman K, Bhattacharyya, Aanisha, Baths, Veeky, Chen, Changyou, Shah, Rajiv Ratn, Krishnamurthy, Balaji
Marketers spend billions of dollars on advertisements, but to what end? If, at the time of purchase, customers cannot recognize the brand whose ad they saw, the money spent on the ad is essentially wasted. Despite its importance in marketing, there has until now been no study of ad memorability in the ML literature. Most prior studies focus on short-term recall (<5 minutes) for specific content types such as object and action videos. The advertising industry, by contrast, cares about long-term memorability, and ads are almost always highly multimodal, depicting a story through their different modalities. With this motivation, we release the first large-scale memorability dataset, LAMBDA, consisting of 1749 participants and 2205 ads covering 276 brands. Running statistical tests over different participant subpopulations and ad types, we find many interesting insights into what makes an ad memorable. For example, we find that brands whose commercials use fast-moving scenes are more memorable than those with slower scenes (p=8e-10), and that people who use ad-blockers remember fewer ads than those who don't (p=5e-3). Next, to simulate the memorability of marketing materials for a particular audience, we present a novel model, Henry, trained to leverage the real-world knowledge of LLMs together with visual knowledge to predict memorability. We test Henry on all the prominent memorability datasets in the literature (both images and videos) and achieve state-of-the-art performance across all of them. Henry also generalizes strongly, achieving better 0-shot results on unseen datasets. Finally, we propose the task of memorable ad generation and release a large-scale ad dataset, UltraLAMBDA, consisting of 4 million ads with their Henry-assigned memorability scores. We show that aligning Henry to generate memorable content improves memorability scores by more than 25%.
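As an illustration of the kind of subpopulation test described above, here is a minimal sketch comparing ad recall between ad-blocker users and non-users. The data is synthetic and the group effect sizes are assumptions made only for the example; the p-values reported in the abstract come from the LAMBDA study itself.

```python
# An illustrative sketch (not the paper's analysis code) of a subpopulation
# recall comparison, e.g. ad-blocker users vs. non-users.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-participant recall rates (fraction of shown ads remembered).
# The Beta parameters are arbitrary, chosen only so the two groups differ.
adblock_users = rng.beta(a=2.0, b=5.0, size=400)  # assumed lower mean recall
non_users = rng.beta(a=3.0, b=5.0, size=400)      # assumed higher mean recall

# Welch's two-sample t-test: do the subpopulations differ in mean recall?
t_stat, p_value = stats.ttest_ind(adblock_users, non_users, equal_var=False)
print(f"Welch t-test: t={t_stat:.2f}, p={p_value:.2e}")
```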