LLM-based Fusion of Multi-modal Features for Commercial Memorability Prediction

Pramov, Aleksandar

arXiv.org Artificial Intelligence

This paper addresses the prediction of commercial (brand) memorability as part of "Subtask 2: Commercial/Ad Memorability" within the "Memorability: Predicting movie and commercial memorability" task at the MediaEval 2025 workshop competition. We propose a multimodal fusion system with a Gemma-3 LLM backbone that integrates pre-computed visual (ViT) and textual (E5) features via multi-modal projections. The model is adapted using Low-Rank Adaptation (LoRA). A heavily tuned ensemble of gradient-boosted trees serves as a baseline. A key contribution is the use of LLM-generated rationale prompts, grounded in expert-derived aspects of memorability, to guide the fusion model. The results demonstrate that the LLM-based system exhibits greater robustness and generalization performance on the final test set compared to the baseline. The paper's codebase can be found at https://github.com/dsgt-arc/mediaeval-2025-memorability
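The fusion step described in the abstract can be sketched as a pair of learned linear projections that map the pre-computed ViT and E5 embeddings into the LLM's embedding space and prepend them as soft-prompt tokens. This is a minimal NumPy sketch under invented dimensions (768-d visual, 1024-d text, 2048-d LLM hidden size); the actual Gemma-3/LoRA pipeline is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not the paper's values):
# ViT visual features, E5 text features, LLM hidden size.
D_VIS, D_TXT, D_LLM = 768, 1024, 2048

# Learned linear projections (random stand-ins here) that map each
# modality into the LLM embedding space.
W_vis = rng.standard_normal((D_LLM, D_VIS)) * 0.02
W_txt = rng.standard_normal((D_LLM, D_TXT)) * 0.02

def fuse(vis_feat: np.ndarray, txt_feat: np.ndarray) -> np.ndarray:
    """Project each modality into the LLM space and stack as prefix tokens."""
    v = W_vis @ vis_feat          # (D_LLM,)
    t = W_txt @ txt_feat          # (D_LLM,)
    return np.stack([v, t])       # (2, D_LLM): soft prompt prepended to the LLM input

tokens = fuse(rng.standard_normal(D_VIS), rng.standard_normal(D_TXT))
print(tokens.shape)  # (2, 2048)
```

In the full system these projection weights would be trained jointly with the LoRA adapters, so the frozen LLM learns to read the projected features as extra context tokens.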


MindMem: Multimodal for Predicting Advertisement Memorability Using LLMs and Deep Learning

Asgarian, Sepehr, Jetha, Qayam, Jeon, Jouhyun

arXiv.org Artificial Intelligence

In the competitive landscape of advertising, success hinges on effectively navigating and leveraging complex interactions among consumers, advertisers, and advertisement platforms. These multifaceted interactions compel advertisers to optimize strategies for modeling consumer behavior, enhancing brand recall, and tailoring advertisement content. To address these challenges, we present MindMem, a multimodal predictive model for advertisement memorability. By integrating textual, visual, and auditory data, MindMem achieves state-of-the-art performance, with a Spearman's correlation coefficient of 0.631 on the LAMBDA and 0.731 on the Memento10K dataset, consistently surpassing existing methods. Furthermore, our analysis identified key factors influencing advertisement memorability, such as video pacing, scene complexity, and emotional resonance. Expanding on this, we introduced MindMem-ReAd (MindMem-Driven Re-generated Advertisement), which employs Large Language Model-based simulations to optimize advertisement content and placement, resulting in up to a 74.12% improvement in advertisement memorability. Our results highlight the transformative potential of Artificial Intelligence in advertising, offering advertisers a robust tool to drive engagement, enhance competitiveness, and maximize impact in a rapidly evolving market.
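Spearman's correlation coefficient, the metric reported above, is simply the Pearson correlation computed on ranks (with ties receiving their average rank). A small self-contained sketch, independent of any of the models described:

```python
from statistics import mean

def rankdata(xs):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho == Pearson correlation of the rank vectors."""
    rx, ry = rankdata(xs), rankdata(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

print(round(spearman([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]), 3))  # 0.8
```

Because it depends only on rank order, the metric rewards a model for correctly ordering ads by memorability rather than for matching the absolute scores.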


A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability

Tseng, Li-Yang, Lin, Tzu-Ling, Shuai, Hong-Han, Huang, Jen-Wei, Chang, Wen-Whei

arXiv.org Artificial Intelligence

Nowadays, humans are constantly exposed to music, whether through voluntary streaming services or incidental encounters during commercial breaks. Despite the abundance of music, certain pieces remain more memorable and often gain greater popularity. Inspired by this phenomenon, we focus on measuring and predicting music memorability. To achieve this, we collect a new music piece dataset with reliable memorability labels using a novel interactive experimental procedure. We then train baselines to predict and analyze music memorability, leveraging both interpretable features and audio mel-spectrograms as inputs. To the best of our knowledge, we are the first to explore music memorability using data-driven deep learning-based methods. Through a series of experiments and ablation studies, we demonstrate that while there is room for improvement, predicting music memorability with limited data is possible. Certain intrinsic elements, such as higher valence, arousal, and faster tempo, contribute to memorable music. As prediction techniques continue to evolve, real-life applications like music recommendation systems and music style transfer will undoubtedly benefit from this new area of research.


Memorability of Image Regions

Neural Information Processing Systems

While long-term human visual memory can store a remarkable amount of visual information, it tends to degrade over time. Recent works have shown that image memorability is an intrinsic property of an image that can be reliably estimated using state-of-the-art image features and machine learning algorithms. However, the class of features and image information that is forgotten has not been explored yet. In this work, we propose a probabilistic framework that models how and which local regions from an image may be forgotten using a data-driven approach that combines local and global image features. The model automatically discovers memorability maps of individual images without any human annotation. We incorporate multiple image region attributes in our algorithm, leading to improved memorability prediction of images as compared to previous works.


Long-Term Ad Memorability: Understanding and Generating Memorable Ads

S, Harini I, Singh, Somesh, Singla, Yaman K, Bhattacharyya, Aanisha, Baths, Veeky, Chen, Changyou, Shah, Rajiv Ratn, Krishnamurthy, Balaji

arXiv.org Artificial Intelligence

Marketers spend billions of dollars on advertisements, but to what end? At the time of purchase, if customers cannot recognize the brand for which they saw an ad, the money spent on the ad is essentially wasted. Despite its importance in marketing, until now there has been no study on the memorability of ads in the ML literature. Most studies have been conducted on short-term recall (<5 mins) on specific content types like object and action videos. On the other hand, the advertising industry only cares about long-term memorability, and ads are almost always highly multimodal, depicting a story through their different modalities. With this motivation, we release the first large-scale memorability dataset, LAMBDA, consisting of 1749 participants and 2205 ads covering 276 brands. Running statistical tests over different participant subpopulations and ad types, we find many interesting insights into what makes an ad memorable. For example, we find that brands that use commercials with fast-moving scenes are more memorable than those with slower scenes (p=8e-10), and that people who use ad-blockers remember fewer ads than those who don't (p=5e-3). Next, to simulate the memorability of marketing materials for a particular audience, we present a novel model, Henry, trained to leverage the real-world knowledge of LLMs and visual knowledge to predict memorability. We test Henry on all the prominent memorability datasets in the literature (both images and videos) and achieve state-of-the-art performance across all of them. Henry shows strong generalization, achieving better zero-shot results on unseen datasets. Next, we propose the task of memorable ad generation and release a large-scale ad dataset, UltraLAMBDA, consisting of 4 million ads with their Henry-assigned memorability scores. We show that aligning Henry to generate memorable content improves memorability scores by more than 25%.
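The group comparisons behind p-values like the ones quoted above (fast vs. slow scenes, ad-blocker users vs. non-users) can be illustrated with a simple two-sided permutation test on the difference of group means. The scores below are toy numbers, not LAMBDA data, and the paper's own test procedure is not specified in the abstract.

```python
import random
from statistics import mean

random.seed(0)

# Toy memorability scores for two hypothetical groups of ads.
fast = [0.82, 0.75, 0.91, 0.88, 0.79, 0.85]
slow = [0.61, 0.70, 0.66, 0.74, 0.58, 0.69]

def permutation_p(a, b, n=10000):
    """Two-sided permutation test on |mean(a) - mean(b)|."""
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    hits = 0
    for _ in range(n):
        random.shuffle(pooled)
        d = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if d >= observed:
            hits += 1
    return (hits + 1) / (n + 1)   # add-one smoothing avoids p == 0

p = permutation_p(fast, slow)
print(p < 0.05)  # with these toy scores the group difference is significant
```

A permutation test makes no distributional assumptions, which is convenient when memorability scores are bounded and possibly skewed.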


From seeing to remembering: Images with harder-to-reconstruct representations leave stronger memory traces

Lin, Qi, Li, Zifan, Lafferty, John, Yildirim, Ilker

arXiv.org Artificial Intelligence

Much of what we remember is not due to intentional selection, but simply a by-product of perceiving. This raises a foundational question about the architecture of the mind: How does perception interface with and influence memory? Here, inspired by a classic proposal relating perceptual processing to memory durability, the level-of-processing theory, we present a sparse coding model for compressing feature embeddings of images, and show that the reconstruction residuals from this model predict how well images are encoded into memory. In an open memorability dataset of scene images, we show that reconstruction error not only explains memory accuracy but also response latencies during retrieval, subsuming, in the latter case, all of the variance explained by powerful vision-only models. We also confirm a prediction of this account with 'model-driven psychophysics'. This work establishes reconstruction error as a novel signal interfacing perception and memory, possibly through adaptive modulation of perceptual processing.
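The core quantity in this account, the reconstruction residual of a sparse code, can be sketched with a basic ISTA solver: encode an embedding against a dictionary under an L1 penalty, decode, and measure what the code fails to capture. Dictionary, dimensions, and penalty below are invented for illustration; this is not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a 64-d image embedding compressed against an
# overcomplete dictionary of 128 atoms (illustrative sizes).
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)        # unit-norm atoms

def sparse_code(x, lam=0.1, steps=200):
    """ISTA: minimise 0.5*||x - D z||^2 + lam*||z||_1."""
    L = np.linalg.norm(D, 2) ** 2     # Lipschitz constant of the smooth part
    z = np.zeros(D.shape[1])
    for _ in range(steps):
        g = D.T @ (D @ z - x)                                   # gradient step
        z = z - g / L
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return z

def reconstruction_error(x):
    """Residual norm after sparse compression: the hypothesised memory signal."""
    z = sparse_code(x)
    return float(np.linalg.norm(x - D @ z))

x = rng.standard_normal(64)
err = reconstruction_error(x)
print(round(err, 3))
```

Under the paper's hypothesis, images whose embeddings are harder to reconstruct this way (larger residual) leave stronger memory traces.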


Comprehensive Literature Survey on Deep Learning used in Image Memorability Prediction and Modification

Sadana, Ananya, Thakur, Nikita, Poria, Nikita, Anand, Astika, R, Seeja K.

arXiv.org Artificial Intelligence

Every day we are exposed to many images, only a few of which are remembered, while most of them we tend to forget. Though the human cognitive system has an enormous storage capacity [1,2], it may only be able to store some images in as much detail as they were seen. Few images are remembered in great detail, even fewer in minor details, and the remainder are quickly forgotten [3]. Natural scenery photos, for example, are less likely to be remembered than images of animals, vehicles, and people [4]. According to previous research, images are consistently memorable to different viewers [5], and some images have better memorability than others. That work also showed that memorability is an intrinsic and measurable property of an image. When memorability is treated as a measurable property, the question arises of whether an artificial system can successfully predict an image's memorability score. Previous work in the domain of image memorability can be grouped into three categories: understanding the features that affect image memorability, predicting images' memorability scores, and modifying images' memorability.


Overview of The MediaEval 2022 Predicting Video Memorability Task

Sweeney, Lorin, Constantin, Mihai Gabriel, Demarty, Claire-Hélène, Fosco, Camilo, de Herrera, Alba G. Seco, Halder, Sebastian, Healy, Graham, Ionescu, Bogdan, Matran-Fernandez, Ana, Smeaton, Alan F., Sultana, Mushfika

arXiv.org Artificial Intelligence

This paper describes the 5th edition of the Predicting Video Memorability Task as part of MediaEval 2022. This year we have reorganised and simplified the task in order to enable a greater depth of inquiry. Similar to last year, two datasets are provided in order to facilitate generalisation; however, this year we have replaced the TRECVid 2019 Video-to-Text dataset with the VideoMem dataset in order to remedy underlying data quality issues, and to prioritise short-term memorability prediction by elevating the Memento10k dataset as the primary dataset. Additionally, a fully fledged electroencephalography (EEG)-based prediction sub-task is introduced. In this paper, we outline the core facets of the task and its constituent sub-tasks, describing the datasets, evaluation metrics, and requirements for participant submissions.


Experiences from the MediaEval Predicting Media Memorability Task

de Herrera, Alba García Seco, Constantin, Mihai Gabriel, Demarty, Claire-Hélène, Fosco, Camilo, Halder, Sebastian, Healy, Graham, Ionescu, Bogdan, Matran-Fernandez, Ana, Smeaton, Alan F., Sultana, Mushfika, Sweeney, Lorin

arXiv.org Artificial Intelligence

The Predicting Media Memorability task in the MediaEval evaluation campaign has been running annually since 2018 and several different tasks and data sets have been used in this time. This has allowed us to compare the performance of many memorability prediction techniques on the same data and in a reproducible way and to refine and improve on those techniques. The resources created to compute media memorability are now being used by researchers well beyond the actual evaluation campaign. In this paper we present a summary of the task, including the collective lessons we have learned for the research community.


Analysing the Memorability of a Procedural Crime-Drama TV Series, CSI

Cummins, Sean, Sweeney, Lorin, Smeaton, Alan F.

arXiv.org Artificial Intelligence

We investigate the memorability of a 5-season span of the popular crime-drama TV series CSI through the application of a vision transformer fine-tuned on the task of predicting video memorability. By investigating the popular genre of crime-drama TV using a detailed annotated corpus combined with video memorability scores, we show how to extrapolate meaning from the memorability scores generated on video shots. We perform a quantitative analysis to relate video shot memorability to a variety of aspects of the show. The insights we present in this paper illustrate the importance of video memorability in applications that use multimedia in areas such as education, marketing, and indexing, as well as, in the case presented here, TV and film production.