AITopics | Media

Collaborating Authors

Media

The Newspaper That Hired ChatGPT

The Atlantic - TechnologyJun-13-2025, 10:42:00 GMT

For more than 20 years, print media has been a bit of a punching bag for digital-technology companies. Craigslist killed the paid classifieds, free websites led people to think newspapers and magazines were committing robbery when they charged for subscriptions, and the smartphone and social media turned reading full-length articles into a chore. Now generative AI is in the mix--and many publishers, desperate to avoid being left behind once more, are rushing to harness the technology themselves. Several major publications, including The Atlantic, have entered into corporate partnerships with OpenAI and other AI firms. Any number of experiments have ensued--publishers have used the software to help translate work into different languages, draft headlines, and write summaries or even articles.

large language model, machine learning, natural language, (20 more...)

The Atlantic - Technology

Genre: Personal (0.48)

Industry: Media > News (0.99)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.90)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.55)

Add feedback

A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations

Lan, Tian, Zhou, Yang-Hao, Ma, Zi-Ao, Sun, Fanshu, Sun, Rui-Qing, Luo, Junyu, Tu, Rong-Cheng, Huang, Heyan, Xu, Chen, Wu, Zhijing, Mao, Xian-Ling

arXiv.org Artificial IntelligenceJun-13-2025

Recent advances in deep learning have significantly enhanced generative AI capabilities across text, images, and audio. However, automatically evaluating the quality of these generated outputs presents ongoing challenges. Although numerous automatic evaluation methods exist, current research lacks a systematic framework that comprehensively organizes these methods across text, visual, and audio modalities. To address this issue, we present a comprehensive review and a unified taxonomy of automatic evaluation methods for generated content across all three modalities; We identify five fundamental paradigms that characterize existing evaluation approaches across these domains. Our analysis begins by examining evaluation methods for text generation, where techniques are most mature. We then extend this framework to image and audio generation, demonstrating its broad applicability. Finally, we discuss promising directions for future research in cross-modal evaluation methodologies.

generation negative sampling random sampling, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2506.10019

Country:

Europe (1.00)
Asia > China (0.68)
Asia > Middle East (0.67)
(2 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.67)
Information Technology > Security & Privacy (0.67)
Education > Assessment & Standards > Student Performance (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(6 more...)

Add feedback

GottBERT: a pure German Language Model

Scheible, Raphael, Thomczyk, Fabian, Tippmann, Patric, Jaravine, Victor, Boeker, Martin

arXiv.org Artificial IntelligenceJun-13-2025

Lately, pre-trained language models advanced the field of natural language processing (NLP). The introduction of Bidirectional Encoders for Transformers (BERT) and its optimized version RoBERTa have had significant impact and increased the relevance of pre-trained models. First, research in this field mainly started on English data followed by models trained with multilingual text corpora. However, current research shows that multilingual models are inferior to monolingual models. Currently, no German single language RoBERTa model is yet published, which we introduce in this work (GottBERT). The German portion of the OSCAR data set was used as text corpus. In an evaluation we compare its performance on the two Named Entity Recognition (NER) tasks Conll 2003 and GermEval 2014 as well as on the text classification tasks GermEval 2018 (fine and coarse) and GNAD with existing German single language BERT models and two multilingual ones. GottBERT was pre-trained related to the original RoBERTa model using fairseq. All downstream tasks were trained using hyperparameter presets taken from the benchmark of German BERT. The experiments were setup utilizing FARM. Performance was measured by the $F_{1}$ score. GottBERT was successfully pre-trained on a 256 core TPU pod using the RoBERTa BASE architecture. Even without extensive hyper-parameter optimization, in all NER and one text classification task, GottBERT already outperformed all other tested German and multilingual models. In order to support the German NLP field, we publish GottBERT under the AGPLv3 license.

computational linguistic, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2024.emnlp-main.1183

2012.0211

Country:

Europe (1.00)
Asia > Japan (0.47)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Media (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Graph-MLLM: Harnessing Multimodal Large Language Models for Multimodal Graph Learning

Liu, Jiajin, Fan, Dongzhe, Shen, Jiacheng, Ji, Chuanhao, Zha, Daochen, Tan, Qiaoyu

arXiv.org Artificial IntelligenceJun-13-2025

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in representing and understanding diverse modalities. However, they typically focus on modality alignment in a pairwise manner while overlooking structural relationships across data points. Integrating multimodality with structured graph information (i.e., multimodal graphs, MMGs) is essential for real-world applications such as social networks, healthcare, and recommendation systems. Existing MMG learning methods fall into three paradigms based on how they leverage MLLMs: Encoder, Aligner, and Predictor. MLLM-as-Encoder focuses on enhancing graph neural networks (GNNs) via multimodal feature fusion; MLLM-as-Aligner aligns multimodal attributes in language or hidden space to enable LLM-based graph reasoning; MLLM-as-Predictor treats MLLMs as standalone reasoners with in-context learning or fine-tuning. Despite their advances, the MMG field lacks a unified benchmark to fairly evaluate across these approaches, making it unclear what progress has been made. To bridge this gap, we present Graph-MLLM, a comprehensive benchmark for multimodal graph learning by systematically evaluating these three paradigms across six datasets with different domains. Through extensive experiments, we observe that jointly considering the visual and textual attributes of the nodes benefits graph learning, even when using pre-trained text-to-image alignment models (e.g., CLIP) as encoders. We also find that converting visual attributes into textual descriptions further improves performance compared to directly using visual inputs. Moreover, we observe that fine-tuning MLLMs on specific MMGs can achieve state-of-the-art results in most scenarios, even without explicit graph structure information. We hope that our open-sourced library will facilitate rapid, equitable evaluation and inspire further innovative research in this field.

information, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2506.10282

Country: North America > United States (0.46)

Genre:

Research Report > Promising Solution (0.46)
Research Report > New Finding (0.46)

Industry:

Media (0.70)
Information Technology > Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Classifying Unreliable Narrators with Large Language Models

Brei, Anneliese, Henry, Katharine, Sharma, Abhisheik, Srivastava, Shashank, Chaturvedi, Snigdha

arXiv.org Artificial IntelligenceJun-13-2025

Often when we interact with a first-person account of events, we consider whether or not the narrator, the primary speaker of the text, is reliable. In this paper, we propose using computational methods to identify unreliable narrators, i.e. those who unintentionally misrepresent information. Borrowing literary theory from narratology to define different types of unreliable narrators based on a variety of textual phenomena, we present TUNa, a human-annotated dataset of narratives from multiple domains, including blog posts, subreddit posts, hotel reviews, and works of literature. We define classification tasks for intra-narrational, inter-narrational, and inter-textual unreliabilities and analyze the performance of popular open-weight and proprietary LLMs for each. We propose learning from literature to perform unreliable narrator classification on real-world text data. To this end, we experiment with few-shot, fine-tuning, and curriculum learning settings. Our results show that this task is very challenging, and there is potential for using LLMs to identify unreliable narrators. We release our expert-annotated dataset and code and invite future research in this area.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.10231

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Education (1.00)
Information Technology (0.93)
Media > News (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Fine-Grained control over Music Generation with Activation Steering

Panda, Dipanshu, Joe, Jayden Koshy, R, Harshith M, Narashiman, Swathi, Mathur, Pranay, Veerakumar, Anish, Krishna, Aniruddh, A, Keerthiharan

arXiv.org Artificial IntelligenceJun-13-2025

--We present a method for fine-grained control over music generation through inference-time interventions on an autoregressive generative music transformer called MusicGen. Our approach enables timbre transfer, style transfer, and genre fusion by steering the residual stream using weights of linear probes trained on it, or by steering the attention layer activations in a similar manner . We observe that modelling this as a regression task provides improved performance, hypothesizing that the mean-squared-error better preserve meaningful directional information in the activation space. Combined with the global conditioning offered by text prompts in MusicGen, our method provides both global and local control over music generation. Audio samples illustrating our method are available at our demo page.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.10225

Genre: Research Report (0.50)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Add feedback

Cybernetic Marionette: Channeling Collective Agency Through a Wearable Robot in a Live Dancer-Robot Duet

Sathya, Anup, Li, Jiasheng, Yan, Zeyu, Fang, Adriane, Kules, Bill, Martin, Jonathan David, Peng, Huaishu

arXiv.org Artificial IntelligenceJun-13-2025

We describe DANCE^2, an interactive dance performance in which audience members channel their collective agency into a dancer-robot duet by voting on the behavior of a wearable robot affixed to the dancer's body. At key moments during the performance, the audience is invited to either continue the choreography or override it, shaping the unfolding interaction through real-time collective input. While post-performance surveys revealed that participants felt their choices meaningfully influenced the performance, voting data across four public performances exhibited strikingly consistent patterns. This tension between what audience members do, what they feel, and what actually changes highlights a complex interplay between agentive behavior, the experience of agency, and power. We reflect on how choreography, interaction design, and the structure of the performance mediate this relationship, offering a live analogy for algorithmically curated digital systems where agency is felt, but not exercised.

artificial intelligence, audience, human computer interaction, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3715336.3735828

2506.10079

Country:

Europe (0.67)
North America > United States > California (0.46)
Asia > Japan > Honshū > Kantō (0.28)
North America > United States > Maryland > Prince George's County > College Park (0.15)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.68)
Media (0.68)
Information Technology (0.46)
(2 more...)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

VINCIE: Unlocking In-context Image Editing from Video

Qu, Leigang, Cheng, Feng, Yang, Ziyan, Zhao, Qi, Lin, Shanchuan, Shi, Yichun, Li, Yicong, Wang, Wenjie, Chua, Tat-Seng, Jiang, Lu

arXiv.org Artificial IntelligenceJun-13-2025

In-context image editing aims to modify images based on a contextual sequence comprising text and previously generated images. Existing methods typically depend on task-specific pipelines and expert models (e.g., segmentation and inpainting) to curate training data. In this work, we explore whether an in-context image editing model can be learned directly from videos. We introduce a scalable approach to annotate videos as interleaved multimodal sequences. To effectively learn from this data, we design a block-causal diffusion transformer trained on three proxy tasks: next-image prediction, current segmentation prediction, and next-segmentation prediction. Additionally, we propose a novel multi-turn image editing benchmark to advance research in this area. Extensive experiments demonstrate that our model exhibits strong in-context image editing capabilities and achieves state-of-the-art results on two multi-turn image editing benchmarks. Despite being trained exclusively on videos, our model also shows promising abilities in multi-concept composition, story generation, and chain-of-editing applications.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.10941

Genre: Research Report > New Finding (0.93)

Industry: Media > Photography (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

The Role of Generative AI in Facilitating Social Interactions: A Scoping Review

Arets, T. T. J. E., Perugia, G., Houben, M., IJsselsteijn, W. A.

arXiv.org Artificial IntelligenceJun-13-2025

Reduced social connectedness increasingly poses a threat to mental health, life expectancy, and general well-being. Generative AI (GAI) technologies, such as large language models (LLMs) and image generation tools, are increasingly integrated into applications aimed at enhancing human social experiences. Despite their growing presence, little is known about how these technologies influence social interactions. This scoping review investigates how GAI-based applications are currently designed to facilitate social interaction, what forms of social engagement they target, and which design and evaluation methodologies designers use to create and evaluate them. Through an analysis of 30 studies published since 2020, we identify key trends in application domains including storytelling, socio-emotional skills training, reminiscence, collaborative learning, music making, and general conversation. We highlight the role of participatory and co-design approaches in fostering both effective technology use and social engagement, while also examining socio-ethical concerns such as cultural bias and accessibility. This review underscores the potential of GAI to support dynamic and personalized interactions, but calls for greater attention to equitable design practices and inclusive evaluation strategies.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.10927

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre:

Research Report > New Finding (0.87)
Research Report > Experimental Study (0.67)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

Contrastive Matrix Completion with Denoising and Augmented Graph Views for Robust Recommendation

Nemati, Narges, Chehreghani, Mostafa Haghir

arXiv.org Artificial IntelligenceJun-13-2025

Matrix completion is a widely adopted framework in recommender systems, as predicting the missing entries in the user-item rating matrix enables a comprehensive understanding of user preferences. However, current graph neural network (GNN)-based approaches are highly sensitive to noisy or irrelevant edges--due to their inherent message-passing mechanisms--and are prone to overfitting, which limits their generalizability. To overcome these challenges, we propose a novel method called Matrix Completion using Contrastive Learning (MCCL). Our approach begins by extracting local neighborhood subgraphs for each interaction and subsequently generates two distinct graph representations. The first representation emphasizes denoising by integrating GNN layers with an attention mechanism, while the second is obtained via a graph variational autoencoder that aligns the feature distribution with a standard prior. A mutual learning loss function is employed during training to gradually harmonize these representations, enabling the model to capture common patterns and significantly enhance its generalizability. Extensive experiments on several real-world datasets demonstrate that our approach not only improves the numerical accuracy of the predicted scores--achieving up to a 0.8% improvement in RMSE--but also produces superior rankings with improvements of up to 36% in ranking metrics.

artificial intelligence, loss function, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.10658

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.93)

Genre: Research Report > Promising Solution (0.34)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.67)
Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback