Media
SEADialogues: A Multilingual Culturally Grounded Multi-turn Dialogue Dataset on Southeast Asian Languages
Kautsar, Muhammad Dehan Al, Candra, Aswin, Hakim, Muhammad Alif Al, Kahfi, Maxalmina Satria, Koto, Fajri, Aji, Alham Fikri, Limkonchotiwat, Peerat, Chuangsuwanich, Ekapol, Winata, Genta Indra
Although numerous datasets have been developed to support dialogue systems, most existing chit-chat datasets overlook the cultural nuances inherent in natural human conversations. To address this gap, we introduce SEADialogues, a culturally grounded dialogue dataset centered on Southeast Asia, a region with over 700 million people and immense cultural diversity. Our dataset features dialogues in eight languages from six Southeast Asian countries, many of which are low-resource despite having sizable speaker populations. To enhance cultural relevance and personalization, each dialogue includes persona attributes and two culturally grounded topics that reflect everyday life in the respective communities. Furthermore, we release a multi-turn dialogue dataset to advance research on culturally aware and human-centric large language models, including conversational dialogue agents.
Balancing Privacy and Efficiency: Music Information Retrieval via Additive Homomorphic Encryption
Wang, William Zerong, Zhao, Dongfang
In the era of generative AI, ensuring the privacy of music data presents unique challenges: unlike static artworks such as images, music data is inherently temporal and multimodal, and it is sampled, transformed, and remixed at an unprecedented scale. These characteristics make its core vector embeddings, i.e., the numerical representations of the music, highly susceptible to being learned, misused, or even stolen by models without accessing the original audio files. Traditional methods like copyright licensing and digital watermarking offer limited protection for these abstract mathematical representations, thus necessitating a stronger, e.g., cryptographic, approach to safeguarding the embeddings themselves. Standard encryption schemes, such as AES, render data unintelligible for computation, making similarity search over the encrypted embeddings impossible. While Fully Homomorphic Encryption (FHE) provides a plausible solution by allowing arbitrary computations on ciphertexts, its substantial performance overhead remains impractical for large-scale vector similarity searches. Given this trade-off, we propose a more practical approach using Additive Homomorphic Encryption (AHE) for vector similarity search. The primary contributions of this paper are threefold: we analyze threat models unique to music information retrieval systems; we provide a theoretical analysis and propose an efficient AHE-based solution through inner products of music embeddings to deliver privacy-preserving similarity search; and finally, we demonstrate the efficiency and practicality of the proposed approach through empirical evaluation and comparison to FHE schemes on real-world MP3 files.
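To illustrate how an additively homomorphic scheme can support inner-product similarity, here is a minimal sketch. The abstract does not name a specific AHE scheme, so the use of Paillier is an assumption, as are the function names (`encrypt`, `decrypt`, `encrypted_dot`, `quantize`), the toy key size, and the integer quantization of embeddings. It is not the authors' implementation and is not secure at this key size.

```python
import math
import secrets

# Toy Paillier key built from two known Mersenne primes.
# Assumption: illustration only; real deployments need a far larger modulus.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1


def encrypt(m: int) -> int:
    """Encrypt an integer (negatives are encoded modulo N)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # (1 + N)^m mod N^2 equals 1 + m*N mod N^2.
    return (1 + (m % N) * N) % N2 * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Decrypt and map the result back to a signed integer."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def encrypted_dot(enc_x: list[int], y: list[int]) -> int:
    """Return a ciphertext of sum(x_i * y_i) from encrypted x and plaintext y.

    Multiplying ciphertexts adds their plaintexts; raising a ciphertext
    to a plaintext power multiplies its plaintext by that power.
    """
    acc = 1  # trivial ciphertext of 0
    for c, w in zip(enc_x, y):
        acc = acc * pow(c, w % N, N2) % N2
    return acc


def quantize(vec: list[float], scale: int = 100) -> list[int]:
    """Hypothetical fixed-point encoding of a float embedding."""
    return [round(v * scale) for v in vec]


if __name__ == "__main__":
    stored = [encrypt(v) for v in quantize([0.03, -0.01, 0.04, 0.02])]
    query = quantize([0.02, 0.05, -0.01, 0.0])
    print(decrypt(encrypted_dot(stored, query)))
```

In this sketch the party holding the ciphertexts computes the inner product without the secret key, and only the key holder can decrypt the score. Because both vectors are quantized with the same `scale`, the decrypted score equals the true inner product multiplied by `scale` squared.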
Model-Agnostic Sentiment Distribution Stability Analysis for Robust LLM-Generated Texts Detection
Li, Siyuan, Lin, Xi, Li, Guangyan, Liu, Zehao, Wulianghai, Aodu, Ding, Li, Wu, Jun, Li, Jianhua
The rapid advancement of large language models (LLMs) has resulted in increasingly sophisticated AI-generated content, posing significant challenges in distinguishing LLM-generated text from human-written language. Existing detection methods, primarily based on lexical heuristics or fine-tuned classifiers, often suffer from limited generalizability and are vulnerable to paraphrasing, adversarial perturbations, and cross-domain shifts. In this work, we propose SentiDetect, a model-agnostic framework for detecting LLM-generated text by analyzing the divergence in sentiment distribution stability. Our method is motivated by the empirical observation that LLM outputs tend to exhibit emotionally consistent patterns, whereas human-written texts display greater emotional variability. To capture this phenomenon, we define two complementary metrics: sentiment distribution consistency and sentiment distribution preservation, which quantify stability under sentiment-altering and semantic-preserving transformations. We evaluate SentiDetect on five diverse datasets and a range of advanced LLMs, including Gemini-1.5-Pro, Claude-3, GPT-4-0613, and LLaMa-3.3. Experimental results demonstrate its superiority over state-of-the-art baselines, with over 16% and 11% F1 score improvements on Gemini-1.5-Pro and GPT-4-0613, respectively. Moreover, SentiDetect shows greater robustness to paraphrasing, adversarial attacks, and text length variations, outperforming existing detectors in challenging scenarios.
Highlight All the Phrases: Enhancing LLM Transparency through Visual Factuality Indicators
Do, Hyo Jin, Ostrand, Rachel, Geyer, Werner, Murugesan, Keerthiram, Wei, Dennis, Weisz, Justin
Large language models (LLMs) are susceptible to generating inaccurate or false information, often referred to as "hallucinations" or "confabulations." While several technical advancements have been made to detect hallucinated content by assessing the factuality of the model's responses, there is still limited research on how to effectively communicate this information to users. To address this gap, we conducted two scenario-based experiments with a total of 208 participants to systematically compare the effects of various design strategies for communicating factuality scores by assessing participants' ratings of trust, ease in validating response accuracy, and preference. Our findings reveal that participants preferred and trusted a design in which all phrases within a response were color-coded based on factuality scores. Participants also found it easier to validate accuracy of the response in this style compared to a baseline with no style applied. Our study offers practical design guidelines for LLM application developers and designers, aimed at calibrating user trust, aligning with user preferences, and enhancing users' ability to scrutinize LLM outputs.
Story Ribbons: Reimagining Storyline Visualizations with Large Language Models
Yeh, Catherine, Menon, Tara, Arya, Robin Singh, He, Helen, Weigel, Moira, Viégas, Fernanda, Wattenberg, Martin
Analyzing literature involves tracking interactions between characters, locations, and themes. Visualization has the potential to facilitate the mapping and analysis of these complex relationships, but capturing structured information from unstructured story data remains a challenge. As large language models (LLMs) continue to advance, we see an opportunity to use their text processing and analysis capabilities to augment and reimagine existing storyline visualization techniques. Toward this goal, we introduce an LLM-driven data parsing pipeline that automatically extracts relevant narrative information from novels and scripts. We then apply this pipeline to create Story Ribbons, an interactive visualization system that helps novice and expert literary analysts explore detailed character and theme trajectories at multiple narrative levels. Through pipeline evaluations and user studies with Story Ribbons on 36 literary works, we demonstrate the potential of LLMs to streamline narrative visualization creation and reveal new insights about familiar stories. We also describe current limitations of AI-based systems, and interaction motifs designed to address these issues.
Large Language Models for Oral History Understanding with Text Classification and Sentiment Analysis
Cherukuri, Komala Subramanyam, Moses, Pranav Abishai, Sakata, Aisa, Chen, Jiangping, Chen, Haihua
Oral histories are vital records of lived experience, particularly within communities affected by systemic injustice and historical erasure. Effective and efficient analysis of oral history archives can promote access to and understanding of these histories. However, large-scale analysis of these archives remains limited due to their unstructured format, emotional complexity, and high annotation costs. This paper presents a scalable framework to automate semantic and sentiment annotation for the Japanese American Incarceration Oral History (JAIOH) collection. Using LLMs, we construct a high-quality dataset, evaluate multiple models, and test prompt engineering strategies in historically sensitive contexts. Our multiphase approach combines expert annotation, prompt design, and LLM evaluation with ChatGPT, Llama, and Qwen. We labeled 558 sentences from 15 narrators for sentiment and semantic classification, then evaluated zero-shot, few-shot, and RAG strategies. For semantic classification, ChatGPT achieved the highest F1 score (88.71%), followed by Llama (84.99%) and Qwen (83.72%). For sentiment analysis, Llama slightly outperformed Qwen (82.66%) and ChatGPT (82.29%), with all models showing comparable results. The best prompt configurations were used to annotate 92,191 sentences from 1,002 interviews in the JAIOH collection. Our findings show that LLMs can effectively perform semantic and sentiment annotation across large oral history collections when guided by well-designed prompts. This study provides a reusable annotation pipeline and practical guidance for applying LLMs in culturally sensitive archival analysis. By bridging archival ethics with scalable NLP techniques, this work lays the groundwork for responsible use of artificial intelligence in digital humanities and preservation of collective memory. GitHub: https://github.com/kc6699c/LLM4OralHistoryAnalysis.
Semi-automated Fact-checking in Portuguese: Corpora Enrichment using Retrieval with Claim extraction
Gomes, Juliana Resplande Sant'anna, Filho, Arlindo Rodrigues Galvão
The accelerated dissemination of disinformation often outpaces the capacity for manual fact-checking, highlighting the urgent need for Semi-Automated Fact-Checking (SAFC) systems. Within the Portuguese language context, there is a noted scarcity of publicly available datasets (corpora) that integrate external evidence, an essential component for developing robust AFC systems, as many existing resources focus solely on classification based on intrinsic text features. This dissertation addresses this gap by developing, applying, and analyzing a methodology to enrich Portuguese news corpora (Fake.Br, COVID19.BR, MuMiN-PT) with external evidence. The approach simulates a user's verification process, employing Large Language Models (LLMs, specifically Gemini 1.5 Flash) to extract the main claim from texts and search engine APIs (Google Search API, Google FactCheck Claims Search API) to retrieve relevant external documents (evidence). Additionally, a data validation and pre-processing framework, including near-duplicate detection, is introduced to enhance the quality of the base corpora. The main results demonstrate the methodology's viability, providing enriched corpora and analyses that confirm the utility of claim extraction, the influence of original data characteristics on the process, and the positive impact of enrichment on the performance of classification models (Bertimbau and Gemini 1.5 Flash), especially with fine-tuning. This work contributes valuable resources and insights for advancing SAFC in Portuguese.
Recommendation with Generative Models
Deldjoo, Yashar, He, Zhankui, McAuley, Julian, Korikov, Anton, Sanner, Scott, Ramisa, Arnau, Vidal, Rene, Sathiamoorthy, Maheswaran, Kasrizadeh, Atoosa, Milano, Silvia, Ricci, Francesco
Generative models are a class of AI models capable of creating new instances of data by learning and sampling from their statistical distributions. In recent years, these models have gained prominence in machine learning due to the development of approaches such as generative adversarial networks (GANs), variational autoencoders (VAEs), and transformer-based architectures such as GPT. These models have applications across various domains, such as image generation, text synthesis, and music composition. In recommender systems, generative models, referred to as Gen-RecSys, improve the accuracy and diversity of recommendations by generating structured outputs, text-based interactions, and multimedia content. By leveraging these capabilities, Gen-RecSys can produce more personalized, engaging, and dynamic user experiences, expanding the role of AI in eCommerce, media, and beyond. Our book goes beyond existing literature by offering a comprehensive understanding of generative models and their applications, with a special focus on deep generative models (DGMs) and their classification. We introduce a taxonomy that categorizes DGMs into three types: ID-driven models, large language models (LLMs), and multimodal models. Each category addresses unique technical and architectural advancements within its respective research area. This taxonomy allows researchers to easily navigate developments in Gen-RecSys across domains such as conversational AI and multimodal content generation. Additionally, we examine the impact and potential risks of generative models, emphasizing the importance of robust evaluation frameworks.
In 'Alien: Earth', the Future Is a Corporate Hellscape
Seventeen years ago, Noah Hawley became a father during the Great Recession. If you look at everything he's written since having children--including the TV series Fargo and Legion--Hawley says it all revolves around the same question every parent faces: "How are we supposed to raise these people in the world that we're living in?" Hawley's new series, Alien: Earth, which premieres August 12 on Hulu and FX, explores this question even more directly than his previous work. Set two years before the original Alien in 2120, it imagines a future where the race for immortality has led to three competing technologies: synths (AI minds in synthetic bodies), cyborgs (humans with cybernetic enhancements), and hybrids (human minds downloaded into synthetic bodies). When a deep space research vessel, the USCSS Maginot, crashes into Earth carrying five captured alien species, a megacorporation called Prodigy sends six hybrids to investigate. The first-ever hybrid, Wendy, played by Sydney Chandler, was a terminally ill child before she was selected for the immortality experiment, just like the rest of Prodigy's hybrids, all six of whom wake up in super-strong, super-fast, synthetic adult bodies that will never age.
Cat soap operas and babies trapped in space: the 'AI slop' taking over YouTube
Babies trapped in space, zombie football stars and cat soap operas: welcome to YouTube in the era of AI video. Nearly one in 10 of the fastest growing YouTube channels globally are showing AI-generated content only, as breakthroughs in the technology spur a flood of artificial content. Guardian analysis of data from the analytics firm Playboard shows that out of the top 100 fastest growing channels in July this year, nine were showing purely AI-generated content. The offerings include channels featuring bizarre narratives such as a baby crawling into a pre-launch space rocket, an undead Cristiano Ronaldo and melodramas featuring humanised cats. AI video generation has surged amid the release of powerful tools such as Google's Veo 3 and Elon Musk's Grok Imagine.