Media
User Negotiations of Authenticity, Ownership, and Governance on AI-Generated Video Platforms: Evidence from Sora
Shen, Bohui, Bhatta, Shrikar, Ireebanije, Alex, Liu, Zexuan, Choudhry, Abhinav, Gumusel, Ece, Zhou, Kyrie Zhixuan
As AI-generated video platforms rapidly advance, ethical challenges such as copyright infringement emerge. This study examines how users make sense of AI-generated videos on OpenAI's Sora by conducting a qualitative content analysis of user comments. Through a thematic analysis, we identified four dynamics that characterize how users negotiate authenticity, authorship, and platform governance on Sora. First, users acted as critical evaluators of realism, assessing micro-details such as lighting, shadows, fluid motion, and physics to judge whether AI-generated scenes could plausibly exist. Second, users increasingly shifted from passive viewers to active creators, expressing curiosity about prompts, techniques, and creative processes. Text prompts were perceived as intellectual property, generating concerns about plagiarism and remixing norms. Third, users reported blurred boundaries between real and synthetic media, worried about misinformation, and even questioned the authenticity of other commenters, suspecting bot-generated engagement. Fourth, users contested platform governance: some perceived moderation as inconsistent or opaque, while others shared tactics for evading prompt censorship through misspellings, alternative phrasing, emojis, or other languages. Despite this, many users also enforced ethical norms by discouraging the misuse of real people's images or disrespectful content. Together, these patterns highlighted how AI-mediated platforms complicate notions of reality, creativity, and rule-making in emerging digital ecosystems. Based on the findings, we discuss governance challenges in Sora and how user negotiations inform future platform governance.
Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction
Choudhary, Yash, Rao, Preeti, Bhattacharyya, Pushpak
Accurately predicting music popularity is a critical challenge in the music industry, offering benefits to artists, producers, and streaming platforms. Prior research has largely focused on audio features, social metadata, or model architectures. This work addresses the under-explored role of lyrics in predicting popularity. We present an automated pipeline that uses LLM to extract high-dimensional lyric embeddings, capturing semantic, syntactic, and sequential information. These features are integrated into HitMusicLyricNet, a multimodal architecture that combines audio, lyrics, and social metadata for popularity score prediction in the range 0-100. Our method outperforms existing baselines on the SpotGenTrack dataset, which contains over 100,000 tracks, achieving 9% and 20% improvements in MAE and MSE, respectively. Ablation confirms that gains arise from our LLM-driven lyrics feature pipeline (LyricsAENet), underscoring the value of dense lyric representations.
ArtistMus: A Globally Diverse, Artist-Centric Benchmark for Retrieval-Augmented Music Question Answering
Kwon, Daeyong, Doh, SeungHeon, Nam, Juhan
Recent advances in large language models (LLMs) have transformed open-domain question answering, yet their effectiveness in music-related reasoning remains limited due to sparse music knowledge in pretraining data. While music information retrieval and computational musicology have explored structured and multimodal understanding, few resources support factual and contextual music question answering (MQA) grounded in artist metadata or historical context. We introduce MusWikiDB, a vector database of 3.2M passages from 144K music-related Wikipedia pages, and ArtistMus, a benchmark of 1,000 questions on 500 diverse artists with metadata such as genre, debut year, and topic. These resources enable systematic evaluation of retrieval-augmented generation (RAG) for MQA. Experiments show that RAG markedly improves factual accuracy; open-source models gain up to +56.8 percentage points (for example, Qwen3 8B improves from 35.0 to 91.8), approaching proprietary model performance. RAG-style fine-tuning further boosts both factual recall and contextual reasoning, improving results on both in-domain and out-of-domain benchmarks. MusWikiDB also yields approximately 6 percentage points higher accuracy and 40% faster retrieval than a general-purpose Wikipedia corpus. We release MusWikiDB and ArtistMus to advance research in music information retrieval and domain-specific question answering, establishing a foundation for retrieval-augmented reasoning in culturally rich domains such as music.
Exposing Pink Slime Journalism: Linguistic Signatures and Robust Detection Against LLM-Generated Threats
Shahriar, Sadat, Ayoobi, Navid, Mukherjee, Arjun, Musharrat, Mostafa, Vamsi, Sai Vishnu
The local news landscape, a vital source of reliable information for 28 million Americans, faces a growing threat from Pink Slime Journalism, a low-quality, auto-generated articles that mimic legitimate local reporting. Detecting these deceptive articles requires a fine-grained analysis of their linguistic, stylistic, and lexical characteristics. In this work, we conduct a comprehensive study to uncover the distinguishing patterns of Pink Slime content and propose detection strategies based on these insights. Beyond traditional generation methods, we highlight a new adversarial vector: modifications through large language models (LLMs). Our findings reveal that even consumer-accessible LLMs can significantly undermine existing detection systems, reducing their performance by up to 40% in F1-score. To counter this threat, we introduce a robust learning framework specifically designed to resist LLM-based adversarial attacks and adapt to the evolving landscape of automated pink slime journalism, and showed and improvement by up to 27%.
Two-Stage Camera Calibration Method for Multi-Camera Systems Using Scene Geometry
Calibration of multi-camera systems is a key task for accurate object tracking. However, it remains a challenging problem in real-world conditions, where traditional methods are not applicable due to the lack of accurate floor plans, physical access to place calibration patterns, or synchronized video streams. This paper presents a novel two-stage calibration method that overcomes these limitations. In the first stage, partial calibration of individual cameras is performed based on an operator's annotation of natural geometric primitives (parallel, perpendicular, and vertical lines, or line segments of equal length). This allows estimating key parameters (roll, pitch, focal length) and projecting the camera's Effective Field of View (EFOV) onto the horizontal plane in a base 3D coordinate system. In the second stage, precise system calibration is achieved through interactive manipulation of the projected EFOV polygons. The operator adjusts their position, scale, and rotation to align them with the floor plan or, in its absence, using virtual calibration elements projected onto all cameras in the system. This determines the remaining extrinsic parameters (camera position and yaw). Calibration requires only a static image from each camera, eliminating the need for physical access or synchronized video. The method is implemented as a practical web service. Comparative analysis and demonstration videos confirm the method's applicability, accuracy, and flexibility, enabling the deployment of precise multi-camera tracking systems in scenarios previously considered infeasible.
RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering
Zhang, Rongyang, Huang, Yuqing, Lu, Chengqiang, Wang, Qimeng, Gao, Yan, Wu, Yi, Hu, Yao, Xu, Yin, Wang, Wei, Wang, Hao, Chen, Enhong
In real-world scenarios, providing user queries with visually enhanced responses can considerably benefit understanding and memory, underscoring the great value of interleaved image-text generation. Despite recent progress, like the visual autoregressive model that unifies text and image processing in a single transformer architecture, generating high-quality interleaved content remains challenging. Moreover, evaluations of these interleaved sequences largely remain underexplored, with existing benchmarks often limited by unimodal metrics that inadequately assess the intricacies of combined image-text outputs. To address these issues, we present RAG-IGBench, a thorough benchmark designed specifically to evaluate the task of Interleaved Generation based on Retrieval-Augmented Generation (RAG-IG) in open-domain question answering. RAG-IG integrates multimodal large language models (MLLMs) with retrieval mechanisms, enabling the models to access external image-text information for generating coherent multimodal content. Distinct from previous datasets, RAG-IGBench draws on the latest publicly available content from social platforms and introduces innovative evaluation metrics that measure the quality of text and images, as well as their consistency. Through extensive experiments with state-of-the-art MLLMs (both open-source and proprietary) on RAG-IGBench, we provide an in-depth analysis examining the capabilities and limitations of these models. Additionally, we validate our evaluation metrics by demonstrating their high correlation with human assessments. Models fine-tuned on RAG-IGBench's training set exhibit improved performance across multiple benchmarks, confirming both the quality and practical utility of our dataset. Our benchmark is available at https://github.com/USTC-StarTeam/RAG-IGBench.
"Understanding the Science," by Camille Bordas
"Everyone thinks they're on this big now," Debbie said, refilling her glass. "I've had it with the journey. I've had it with you people." "I don't think I'm on a journey," Burt said. Life's too short to find out who we really are." It was the first time the six of them had got together for dinner in more than a year (since Maria's diagnosis), and after such a long time (and in celebration of Maria's remission) they'd expected to have more interesting things to tell one another, deeper things, but they were entering dessert territory now, a cake was on the table, and only superficial topics had been broached: Ervin's promotion, Jane and Burt's move to the suburbs, Katherine's recent purchase of a metabolism-tracking device--a pen-shaped item and the cause of Debbie's rant. "How much can you know about yourself, exactly?" she said. "The therapy, the vision quests, the birth charts--do we really need the data on metabolic flexibility, too?" Jane, in Katherine's defense, said that, the more you knew about yourself, the more useful you could be to society. Knowing whether Kat is in fat-or carb-burning mode doesn't help anyone." As a result of Katherine declining cake five minutes earlier, no one had touched it. No one, Debbie included, really wanted to. They'd all overeaten already, drunk too much, made private plans to atone for it the next day. The cake presented a challenge, it sat there taunting them, and Debbie knew this, that you couldn't serve cake to a group of fortysomethings without causing ripples, but what else could she have done? She got it, no one wanted to put on weight, but this was a gorgeous princess cake, just gorgeous, she'd had to drive all the way to Andersonville to get it from that Swedish bakery everyone talked about. Staring at it now, though, she wondered if the cake didn't look a little bit like a tit, the smooth half sphere, the small pink marzipan flower nippling the top of it--and, oh, God, did think it looked like a tit?
Japan aims to raise public AI use to 80%
Prime Minister Sanae Takaichi delivers a policy speech during a Lower House plenary session on Oct. 24. The government has drawn up a draft of its basic program on the development and use of artificial intelligence, aiming to increase the AI utilization rate among the public initially to 50% and eventually to 80%. The draft emphasizes the need to boost AI usage with a view to developing Japan's own AI technologies. It also seeks a policy to attract about ¥1 trillion ($6.4 billion) in private-sector investment to enhance research and development activities. The government aims to adopt the basic program at a Cabinet meeting by the end of the year.