 thumbnail image


Can Impressions of Music be Extracted from Thumbnail Images?

Harada, Takashi, Motomitsu, Takehiro, Hayashi, Katsuhiko, Sakai, Yusuke, Kamigaito, Hidetaka

arXiv.org Artificial Intelligence

In recent years, there has been a notable increase in research on machine learning models for music retrieval and generation systems that can take natural language sentences as input. However, there is a scarcity of large-scale, publicly available datasets consisting of music data and their corresponding natural language descriptions, known as music captions. In particular, non-musical information, such as suitable situations for listening to a track and the emotions elicited upon listening, is crucial for describing music. This type of information is underrepresented in existing music caption datasets because it is difficult to extract directly from music data. To address this issue, we propose a method for generating music caption data that incorporates non-musical aspects inferred from music thumbnail images, and we validated the effectiveness of our approach through human evaluations. Additionally, we created a dataset of approximately 360,000 captions containing non-musical aspects. Leveraging this dataset, we trained a music retrieval model and demonstrated its effectiveness in music retrieval tasks through evaluation.


Assessing News Thumbnail Representativeness: Counterfactual text can enhance the cross-modal matching ability

Yoon, Yejun, Yoon, Seunghyun, Park, Kunwoo

arXiv.org Artificial Intelligence

This paper addresses the critical challenge of assessing the representativeness of news thumbnail images, which often serve as the first visual engagement for readers when an article is disseminated on social media. We focus on whether a news image represents the actors discussed in the news text. To address this challenge, we introduce NewsTT, a manually annotated dataset of 1,000 pairs of news thumbnail images and text. We found that pretrained vision and language models, such as BLIP-2, struggle with this task. Since news subjects frequently involve named entities or proper nouns, the pretrained models may have a limited capability to match news actors' visual and textual appearances. We hypothesize that learning to contrast news text with its counterfactual, in which named entities are replaced, can enhance the cross-modal matching ability of vision and language models. We propose CFT-CLIP, a contrastive learning framework that updates vision and language bi-encoders according to this hypothesis. We found that our simple method can boost performance in assessing news thumbnail representativeness, supporting our assumption. Code and data can be accessed at https://github.com/ssu-humane/news-images-acl24.
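The counterfactual text described in the abstract could be illustrated roughly as follows. This is only a minimal sketch of the general idea of swapping named entities in a sentence; the function name `make_counterfactual`, the replacement mapping, and the example names are hypothetical, and the authors' actual procedure (including how entities are detected and substitutes chosen) may differ.

```python
import re


def make_counterfactual(text, replacements):
    """Return a copy of `text` with each listed entity swapped for a substitute.

    `replacements` maps an entity string to its substitute. Both are
    assumptions supplied by the caller; no entity recognition is done here.
    """
    if not replacements:
        return text
    # Try longer entities first so a full name wins over a shorter overlap.
    entities = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(e) for e in entities) + r")\b"
    )
    # A single pass avoids replacing text that was itself just substituted.
    return pattern.sub(lambda m: replacements[m.group(0)], text)


original = "Alice Kim met Bob Lee at the summit."
counterfactual = make_counterfactual(
    original, {"Alice Kim": "Carol Park", "Bob Lee": "Dan Choi"}
)
print(counterfactual)
```

In a contrastive setup, the original text would be the positive match for the thumbnail and a counterfactual like this one would act as a hard negative.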


Evons: A Dataset for Fake and Real News Virality Analysis and Prediction

Krstovski, Kriste, Ryu, Angela Soomin, Kogut, Bruce

arXiv.org Artificial Intelligence

We present a novel collection of news articles originating from fake and real news media sources for the analysis and prediction of news virality. Unlike existing fake news datasets, which contain either claims or news article headlines and bodies, each article in this collection is accompanied by a Facebook engagement count, which we consider an indicator of the article's virality. In addition, we provide the article description and thumbnail image with which the article was shared on Facebook. These images were automatically annotated with object tags and color attributes. Using cloud-based vision analysis tools, thumbnail images were also analyzed for faces, and detected faces were annotated with facial attributes. We empirically investigate the use of this collection on an example task of article virality prediction.


Artificial Intelligence at Netflix - Two Current Use-Cases

#artificialintelligence

Netflix launched in 1997 as a mail-based DVD rental business. Alongside the growing US DVD market in the late 1990s and early 2000s, Netflix's business grew and the company went public in 2002. Netflix posted its first profit a year later. In 2007, Netflix introduced its streaming service, and in 2013, the company began producing original content. Today, Netflix is one of the world's largest entertainment services with over 200 million paid memberships spanning 190 countries, according to the company's 2020 Annual Report.


How to Achieve Valid Image Size for JSON-LD Markup

#artificialintelligence

In schema markup, 'logo' and 'photo' are categorized as sub-properties of 'image'. Using them correctly gives a business more control over what appears in its Google Knowledge Graph. Every website has a logo (or should have one), and correct logo markup lets webmasters tell Google which image they want used in their Knowledge Graph display. Guidelines say, "You can specify which image Google should use as your organization's logo in search results and the Knowledge Graph. To do this, add schema.org Organization markup to your official website that identifies the location of your preferred logo."
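As a rough illustration of the Organization markup the guideline describes, the following sketch builds a JSON-LD snippet with `url` and `logo` properties. The helper name `organization_logo_jsonld` and the example.com addresses are placeholders chosen for this example, not values from the article.

```python
import json


def organization_logo_jsonld(site_url, logo_url):
    """Build a JSON-LD <script> tag declaring an organization's logo.

    Both arguments are placeholder URLs supplied by the caller.
    """
    markup = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "url": site_url,    # the organization's official website
        "logo": logo_url,   # location of the preferred logo image
    }
    body = json.dumps(markup, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


snippet = organization_logo_jsonld(
    "https://www.example.com",
    "https://www.example.com/images/logo.png",
)
print(snippet)
```

The resulting tag would be placed in the HTML of the official site so that crawlers can read which image is the preferred logo.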