AITopics

The NIME conference traditionally focuses on interfaces for music and musical expression. In this paper we reverse this tradition to ask: can interfaces developed for music be successfully appropriated for non-musical applications? To help answer this question we designed and developed a new device, which uses interface metaphors borrowed from analogue synthesisers and audio mixing to physically control the intangible aspects of a Large Language Model. We compared two versions of the device, with and without the audio-inspired augmentations, with a group of artists who used each version over a one-week period. Our results show that the audio-like controls afforded more immediate, direct and embodied control over the LLM, allowing users to experiment and play more creatively with the device than with its non-mixer counterpart. Our project demonstrates how cross-sensory metaphors can support creative thinking and embodied practice when designing new technological interfaces.

artificial intelligence, large language model, natural language, (16 more...)

2504.13944

Country:

Oceania > Australia (0.70)
Europe > United Kingdom > England (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (0.93)
Media > Music (0.68)
Information Technology (0.68)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Mozualization: Crafting Music and Visual Representation with Multimodal AI

Xu, Wanfang, Zhao, Lixiang, Song, Haiwen, Song, Xinheng, Lu, Zhaolin, Liu, Yu, Chen, Min, Lim, Eng Gee, Yu, Lingyun

In this work, we introduce Mozualization, a music generation and editing tool that creates multi-style embedded music by integrating diverse inputs, such as keywords, images, and sound clips (e.g., segments from various pieces of music or even a playful cat's meow). Our work is inspired by the ways people express their emotions -- writing mood-descriptive poems or articles, creating drawings with warm or cool tones, or listening to sad or uplifting music. Building on this concept, we developed a tool that transforms these emotional expressions into a cohesive and expressive song, allowing users to seamlessly incorporate their unique preferences and inspirations. To evaluate the tool and, more importantly, gather insights for its improvement, we conducted a user study involving nine music enthusiasts. The study assessed user experience, engagement, and the impact of interacting with and listening to the generated music.

artificial intelligence, machine learning, natural language, (16 more...)

2504.13891

Country:

Asia (0.74)
North America > United States > Louisiana (0.14)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Data Science (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

DoYouTrustAI: A Tool to Teach Students About AI Misinformation and Prompt Engineering

Driscoll, Phillip, Kumar, Priyanka

AI, especially Large Language Models (LLMs) like ChatGPT, has rapidly developed and gained widespread adoption in the past five years, shifting user preference away from traditional search engines. However, the generative nature of LLMs raises concerns about presenting misinformation as fact. To address this, we developed a web-based application that helps K-12 students enhance critical thinking by identifying misleading information in LLM responses about major historical figures. In this paper, we describe the implementation and design details of the DoYouTrustAI tool, which can be used to provide an interactive lesson that teaches students about the dangers of misinformation and how believable generative AI can make it seem. The DoYouTrustAI tool utilizes prompt engineering to present the user with AI-generated summaries about the life of a historical figure. These summaries can be either accurate accounts of that person's life or intentionally misleading alterations of their history. The user is tasked with determining the validity of the statement without external resources. Our research questions for this work were: (RQ1) How can we design a tool that teaches students about the dangers of misleading information and how misinformation can present itself in LLM responses? (RQ2) Can we present prompt engineering as a topic that is easily understandable for students? Our findings highlight the need to correct misleading information before users retain it. Our tool lets users select familiar individuals for testing to reduce random guessing and presents misinformation alongside known facts to maintain believability. It also provides pre-configured prompt instructions to show how different prompts affect AI responses. Together, these features create a controlled environment where users learn the importance of verifying AI responses and understanding prompt engineering.

information, large language model, machine learning, (18 more...)

2504.13859

Country: North America > United States > Texas (0.28)

Genre: Research Report > New Finding (0.86)

Industry:

Media > News (1.00)
Education > Educational Setting > K-12 Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

Benchmarking Multi-National Value Alignment for Large Language Models

Shi, Weijie, Ju, Chengyi, Liu, Chengzhong, Ji, Jiaming, Zhang, Jipeng, Zhang, Ruiyuan, Zhu, Jia, Xu, Jiajie, Yang, Yaodong, Han, Sirui, Guo, Yike

Do Large Language Models (LLMs) hold positions that conflict with your country's values? Occasionally they do! However, existing works primarily focus on ethical reviews, failing to capture the diversity of national values, which encompass broader policy, legal, and moral considerations. Furthermore, current benchmarks that rely on spectrum tests using manually designed questionnaires are not easily scalable. To address these limitations, we introduce NaVAB, a comprehensive benchmark to evaluate the alignment of LLMs with the values of five major nations: China, the United States, the United Kingdom, France, and Germany. NaVAB implements a national value extraction pipeline to efficiently construct value assessment datasets. Specifically, we propose a modeling procedure with instruction tagging to process raw data sources, a screening process to filter value-related topics and a generation process with a Conflict Reduction mechanism to filter non-conflicting values. We conduct extensive experiments on various LLMs across countries, and the results provide insights into assisting in the identification of misaligned scenarios. Moreover, we demonstrate that NaVAB can be combined with alignment techniques to effectively reduce value concerns by aligning LLMs' values with the target country.

large language model, machine learning, natural language, (17 more...)

2504.12911

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre: Research Report > New Finding (0.68)

Industry:

Law (0.88)
Media > News (0.47)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Characterizing Knowledge Manipulation in a Russian Wikipedia Fork

Trokhymovych, Mykola, Kosovan, Oleksandr, Forrester, Nathan, Aragón, Pablo, Saez-Trumper, Diego, Baeza-Yates, Ricardo

Wikipedia is powered by MediaWiki, free and open-source software that is also the infrastructure for many other wiki-based online encyclopedias. These include the recently launched website Ruwiki, which has copied and modified the original Russian Wikipedia content to conform to Russian law. To identify practices and narratives that could be associated with different forms of knowledge manipulation, this article presents an in-depth analysis of this Russian Wikipedia fork. We propose a methodology to characterize the main changes with respect to the original version. The foundation of this study is a comprehensive comparative analysis of more than 1.9M articles from Russian Wikipedia and its fork. Using meta-information and geographical, temporal, categorical, and textual features, we explore the changes made by Ruwiki editors. Furthermore, we present a classification of the main topics of knowledge manipulation in this fork, including a numerical estimation of their scope. This research not only sheds light on significant changes within Ruwiki, but also provides a methodology that could be applied to analyze other Wikipedia forks and similar collaborative projects.

artificial intelligence, large language model, natural language, (17 more...)

2504.10663

Country:

Asia > Russia (1.00)
Europe > Ukraine > Luhansk Oblast (0.46)
Europe > Ukraine > Donetsk Oblast (0.29)
North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Media (1.00)
Law (1.00)
Government > Military (0.94)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Emotion Alignment: Discovering the Gap Between Social Media and Real-World Sentiments in Persian Tweets and Images

Elahimanesh, Sina, Mohammadkhani, Mohammadali, Kasaei, Shohreh

In contemporary society, widespread social media usage is evident in people's daily lives. Nevertheless, disparities in emotional expression between the real world and online platforms can arise. We comprehensively analyzed the Persian community on X to explore this phenomenon. An innovative pipeline was designed to measure the similarity between emotions in the real world and those on social media. Accordingly, recent tweets and images of participants were gathered and analyzed using Transformers-based text and image sentiment analysis modules. Each participant's friends also provided insights into their real-world emotions. A distance criterion was used to compare real-world feelings with virtual experiences. Our study encompassed N=105 participants, 393 friends who contributed their perspectives, over 8,300 collected tweets, and 2,000 media images. Results indicated a 28.67% similarity between images and real-world emotions, while tweets exhibited a 75.88% alignment with real-world feelings. Additionally, statistical significance testing confirmed the observed disparities in sentiment proportions.

large language model, machine learning, natural language, (20 more...)

2504.10662

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)
Media (0.67)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

Luo, Yuxuan, Rong, Zhengkun, Wang, Lizhen, Zhang, Longhao, Hu, Tianshu, Zhu, Yongming

While recent image-based human animation methods achieve realistic body and facial motion synthesis, critical gaps remain in fine-grained holistic controllability, multi-scale adaptability, and long-term temporal coherence, which leads to their lower expressiveness and robustness. We propose a diffusion transformer (DiT) based framework, DreamActor-M1, with hybrid guidance to overcome these limitations. For motion guidance, our hybrid control signals that integrate implicit facial representations, 3D head spheres, and 3D body skeletons achieve robust control of facial expressions and body movements, while producing expressive and identity-preserving animations. For scale adaptation, to handle various body poses and image scales ranging from portraits to full-body views, we employ a progressive training strategy using data with varying resolutions and scales. For appearance guidance, we integrate motion patterns from sequential frames with complementary visual references, ensuring long-term temporal coherence for unseen regions during complex movements. Experiments demonstrate that our method outperforms the state-of-the-art works, delivering expressive results for portraits, upper-body, and full-body generation with robust long-term consistency. Project Page: https://grisoon.github.io/DreamActor-M1/.

animation, artificial intelligence, machine learning, (14 more...)

2504.01724

Country: Asia (0.14)

Genre: Research Report (0.64)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Graphics > Animation (0.99)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives

Li, Shuyu, Ji, Shulei, Wang, Zihao, Wu, Songruoyao, Yu, Jiaxing, Zhang, Kejun

Multi-modal music generation, using multiple modalities like text, images, and video alongside musical scores and audio as guidance, is an emerging research area with broad applications. This paper reviews this field, categorizing music generation systems from the perspective of modalities. The review covers modality representation, multi-modal data alignment, and their utilization to guide music generation. Current datasets and evaluation methods are also discussed. Key challenges in this area include effective multi-modal integration, large-scale comprehensive datasets, and systematic evaluation methods. Finally, an outlook on future research directions is provided, focusing on creativity, efficiency, multi-modal alignment, and evaluation.

data mining, large language model, machine learning, (25 more...)

2504.00837

Country:

Europe (0.67)
North America > United States > Minnesota (0.27)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Engadget, Apr-21-2025, 20:17:19 GMT

Using generative AI will 'neither help nor harm the chances of achieving' Oscar nominations

The Academy of Motion Picture Arts and Sciences has decided that its official stance towards AI use in films is to take no stance at all, according to a statement the organization shared outlining changes to voting for the 98th Oscars. The issue of award-nominated films using AI was first raised in 2024 when the productions behind Best Picture nominees The Brutalist and Emilia Pérez admitted to using the tech to alter performances. "With regard to Generative Artificial Intelligence and other digital tools used in the making of the film, the tools neither help nor harm the chances of achieving a nomination," AMPAS writes. "The Academy and each branch will judge the achievement, taking into account the degree to which a human was at the heart of the creative authorship when choosing which movie to award." While the organization at least reaffirms that human involvement is its primary concern, it also doesn't seem to believe that using AI -- potentially trained on the ill-gotten work of its membership -- is an existential problem.

generative ai, machine learning, natural language, (6 more...)

Engadget

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.76)

The New Yorker, Apr-21-2025, 10:00:00 GMT

Subtitling Your Life

A little over thirty years ago, when he was in his mid-forties, my friend David Howorth lost all hearing in his left ear, a calamity known as single-sided deafness. "It happened literally overnight," he said. "My doctor told me, 'We really don't understand why.' " At the time, he was working as a litigator in the Portland, Oregon, office of a large law firm. His hearing loss had no impact on his job--"In a courtroom, you can get along fine with one ear"--but other parts of his life were upended. The brain pinpoints sound sources in part by analyzing minute differences between left-ear and right-ear arrival times, the same process that helps bats and owls find prey they can't see.

cochlear implant, hearing aids, howorth, (14 more...)

The New Yorker

Country:

North America > United States > Oregon > Multnomah County > Portland (0.24)
North America > United States > New York (0.04)
North America > United States > Connecticut > Hartford County > West Hartford (0.04)

Industry:

Leisure & Entertainment (1.00)
Media (0.94)
Health & Medicine > Therapeutic Area > Otolaryngology (0.50)

Technology: Information Technology > Artificial Intelligence > Speech (0.47)