AITopics

Wang, George Xi, Deng, Jingying, Ali, Safinah

Evaluating the Impact of AI-Powered Audiovisual Personalization on Learner Emotion, Focus, and Learning Outcomes

Independent learners often struggle with sustaining focus and emotional regulation in unstructured or distracting settings. Although some rely on ambient aids such as music, ASMR, or visual backgrounds to support concentration, these tools are rarely integrated into cohesive, learner-centered systems. Moreover, existing educational technologies focus primarily on content adaptation and feedback, overlooking the emotional and sensory context in which learning takes place. Large language models have demonstrated powerful multimodal capabilities including the ability to generate and adapt text, audio, and visual content. Educational research has yet to fully explore their potential in creating personalized audiovisual learning environments. To address this gap, we introduce an AI-powered system that uses LLMs to generate personalized multisensory study environments. Users select or generate customized visual themes (e.g., abstract vs. realistic, static vs. animated) and auditory elements (e.g., white noise, ambient ASMR, familiar vs. novel sounds) to create immersive settings aimed at reducing distraction and enhancing emotional stability. Our primary research question investigates how combinations of personalized audiovisual elements affect learner cognitive load and engagement. Using a mixed-methods design that incorporates biometric measures and performance outcomes, this study evaluates the effectiveness of LLM-driven sensory personalization. The findings aim to advance emotionally responsive educational technologies and extend the application of multimodal LLMs into the sensory dimension of self-directed learning.

large language model, machine learning, natural language, (15 more...)

2505.03033

Country: North America > United States > New York > New York County > New York City (0.05)

Genre: Research Report > Experimental Study (0.34)

Industry:

Education > Educational Technology (0.87)
Media > Music (0.69)
Information Technology > Security & Privacy (0.67)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.82)

Torres, Bernardo, Peeters, Geoffroy, Richard, Gael

The Inverse Drum Machine: Source Separation Through Joint Transcription and Analysis-by-Synthesis

arXiv.org Machine Learning May-7-2025

We present the Inverse Drum Machine (IDM), a novel approach to Drum Source Separation that leverages an analysis-by-synthesis framework combined with deep learning. Unlike recent supervised methods that require isolated stem recordings, our approach operates on drum mixtures with only transcription annotations. IDM integrates Automatic Drum Transcription and One-shot drum Sample Synthesis, jointly optimizing these tasks in an end-to-end manner. By convolving synthesized one-shot samples with estimated onsets, akin to a drum machine, we reconstruct the individual drum stems and train a Deep Neural Network on the reconstruction of the mixture. Experiments on the StemGMD dataset demonstrate that IDM achieves separation quality comparable to state-of-the-art supervised methods that require isolated stems data, while significantly outperforming matrix decomposition baselines.
In Western popular music, the rhythmic foundation typically relies on percussion instruments from a standard drum kit comprising kick drum, snare drum, and hi-hat, while additional elements such as cymbals, tom-toms, and auxiliary percussions provide timbral complexity and rhythmic variation. Music producers and engineers often need to adjust individual drum instruments separately for remixing, rebalancing, effects processing, or creating educational materials [1], [2]. Ideally, music production would utilize isolated recordings of each drum instrument (known as "stems"), allowing for precise control during mixing. However, these instruments are usually played simultaneously and by the same performer, resulting in recordings in which all elements are mixed into a single audio stream. Obtaining these separated stems during recording requires multiple microphones (leading to microphone bleeding) or asking musicians to play in unnatural conditions [3]. The need for tools that can extract individual drum stems from already mixed recordings has led to growing interest in Drum Source Separation (DSS).
These solutions, however, are proprietary and still have limitations in separation quality and flexibility. DSS is challenging due to the acoustic properties of percussion sounds.
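The abstract above describes rebuilding each drum stem by convolving a synthesized one-shot sample with estimated onsets, much as a drum machine triggers samples. A minimal sketch of that single step in plain Python might look like the following. The function names `render_stem` and `mix`, and all toy onset and sample values, are illustrative assumptions and do not come from the paper; the neural network that IDM uses to estimate onsets and samples is not shown.

```python
def render_stem(onsets, one_shot):
    """Convolve an onset activation sequence with a one-shot sample.

    onsets:   per-frame gains, 0.0 where the drum is not hit (hypothetical input)
    one_shot: short sample that is "triggered" at every nonzero onset
    """
    out = [0.0] * (len(onsets) + len(one_shot) - 1)
    for t, gain in enumerate(onsets):
        if gain == 0.0:
            continue
        for k, s in enumerate(one_shot):
            out[t + k] += gain * s
    return out


def mix(stems):
    """Sum stems of possibly different lengths into a single mixture."""
    length = max(len(s) for s in stems)
    return [sum(s[i] for s in stems if i < len(s)) for i in range(length)]


# Toy data (assumed values, for illustration only).
kick_onsets = [1.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
kick_sample = [1.0, 0.5, 0.25]
snare_onsets = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
snare_sample = [0.8, 0.4]

kick_stem = render_stem(kick_onsets, kick_sample)
snare_stem = render_stem(snare_onsets, snare_sample)
mixture = mix([kick_stem, snare_stem])
```

In a training setup along the lines the abstract describes, the reconstructed `mixture` would be compared against the real drum mixture, so that isolated stems are never needed as targets.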

artificial intelligence, instrument, machine learning, (16 more...)

arXiv.org Machine Learning

2505.03337

Country:

Europe > Austria > Vienna (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > South Korea > Seoul > Seoul (0.04)
(17 more...)

Genre:

Research Report > Promising Solution (0.48)
Overview > Innovation (0.34)

Industry:

Leisure & Entertainment (0.86)
Media > Music (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Marey, Lilian, Laclau, Charlotte, Sguerra, Bruno, Viard, Tiphaine, Moussallam, Manuel

Modeling Musical Genre Trajectories through Pathlet Learning

The increasing availability of user data on music streaming platforms opens up new possibilities for analyzing music consumption. However, understanding the evolution of user preferences remains a complex challenge, particularly as their musical tastes change over time. This paper uses the dictionary learning paradigm to model user trajectories across different musical genres. We define a new framework that captures recurring patterns in genre trajectories, called pathlets, enabling the creation of comprehensible trajectory embeddings. We show that pathlet learning reveals relevant listening patterns that can be analyzed both qualitatively and quantitatively. This work improves our understanding of users' interactions with music and opens up avenues of research into user behavior and fostering diversity in recommender systems. A dataset of 2000 user histories tagged by genre over 17 months, supplied by Deezer (a leading music streaming company), is also released with the code.

data mining, machine learning, trajectory, (18 more...)

doi: 10.1145/3708319.3733695

2505.0348

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Data Science > Data Mining (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Zhang, Jincheng, Fazekas, György, Saitis, Charalampos

Mamba-Diffusion Model with Learnable Wavelet for Controllable Symbolic Music Generation

Jincheng Zhang (jincheng.zhang@qmul.ac.uk), György Fazekas (george.fazekas@qmul.ac.uk), Charalampos Saitis (c.saitis@qmul.ac.uk); Centre for Digital Music, Queen Mary University of London, London, UK

Abstract: The recent surge in the popularity of diffusion models for image synthesis has attracted new attention to their potential for generation tasks in other domains. However, their applications to symbolic music generation remain largely under-explored because symbolic music is typically represented as sequences of discrete events and standard diffusion models are not well-suited for discrete data. We represent symbolic music as image-like pianorolls, facilitating the use of diffusion models for the generation of symbolic music. Moreover, this study introduces a novel diffusion model that incorporates our proposed Transformer-Mamba block and learnable wavelet transform. Classifier-free guidance is utilised to generate symbolic music with target chords. Our evaluation shows that our method achieves compelling results in terms of music quality and controllability, outperforming the strong baseline in pianoroll generation.

Index Terms: symbolic music generation, deep learning, diffusion models, wavelet transform, Mamba
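The abstract above mentions classifier-free guidance for steering generation toward target chords. As a rough illustration, one common way to combine a conditional and an unconditional noise prediction at sampling time is sketched below. The function name `cfg_combine`, the scale convention, and the toy vectors are assumptions; the paper's exact formulation is not given in this excerpt, and some works use a `(1 + w)` weighting instead.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Blend unconditional and conditional noise predictions.

    guidance_scale = 0.0 returns the unconditional prediction,
    guidance_scale = 1.0 returns the conditional prediction,
    values above 1.0 extrapolate further toward the condition.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy predictions (assumed values, standing in for model outputs).
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]

guided = cfg_combine(eps_uncond, eps_cond, 2.0)
```

Here the condition would be the target chord, and the unconditional branch would come from the same model run with the chord input dropped.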

artificial intelligence, diffusion model, machine learning, (18 more...)

2505.03314

Country: Europe > United Kingdom > England > Greater London > London (0.65)

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Tari, Henry, Sereiva, Nojus, Kaushal, Rishabh, Bertaglia, Thales, Iamnitchi, Adriana

Towards High-Fidelity Synthetic Multi-platform Social Media Datasets via Large Language Models

Social media datasets are essential for research on a variety of topics, such as disinformation, influence operations, hate speech detection, or influencer marketing practices. However, access to social media datasets is often constrained due to costs and platform restrictions. Acquiring datasets that span multiple platforms, which is crucial for understanding the digital ecosystem, is particularly challenging. This paper explores the potential of large language models to create lexically and semantically relevant social media datasets across multiple platforms, aiming to match the quality of real data. We propose multi-platform topic-based prompting and employ various language models to generate synthetic data from two real datasets, each consisting of posts from three different social media platforms. We assess the lexical and semantic properties of the synthetic data and compare them with those of the real data. Our empirical findings show that using large language models to generate synthetic multi-platform social media data is promising, that different language models perform differently in terms of fidelity, and that a post-processing approach might be needed for generating high-fidelity synthetic datasets for research. In addition to the empirical evaluation of three state-of-the-art large language models, our contributions include new fidelity metrics specific to multi-platform social media datasets.

large language model, machine learning, natural language, (20 more...)

2505.02858

Country:

Europe (1.00)
North America > United States (0.47)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Services (1.00)
Health & Medicine (1.00)
Government > Voting & Elections (0.94)
Media > News (0.91)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MIT Technology Review May-6-2025, 12:10:00 GMT

The Download: a longevity influencer's new religion, and humanoid robots' shortcomings

Bryan Johnson is on a mission to not die. The 47-year-old multimillionaire has already applied his slogan "Don't Die" to events, merchandise, and a Netflix documentary. Now he's founding a Don't Die religion. Johnson, who famously spends millions of dollars on scans, tests, supplements, and a lifestyle routine designed to slow or reverse the aging process, has enjoyed extensive media coverage, and a huge social media following. For many people, he has become the face of the longevity field.

artificial intelligence, humanoid robot, longevity influencer, (6 more...)

MIT Technology Review

Country: North America > United States > California > Alameda County > Berkeley (0.07)

Industry: Media (0.84)

Technology: Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.43)

Daily Mail - Science & tech May-6-2025, 10:51:12 GMT

Robot DOG makes an appearance at the Met Gala - dressed in a tuxedo and adorned with a 1,000-carat diamond leash

At New York's Met Gala, guests are known for attention-grabbing outfits, from Katy Perry's human chandelier dress to Kim Kardashian's all-black body suit. But one attendant in particular has stolen the limelight this year – and he's not even human. Indian-American entrepreneur Mona Patel rocked up to the annual event on Monday night with an adorable robotic dachshund in tow. Vector the robo-dog, developed by scientists at MIT, has a 1,000-carat diamond-studded leash and his own cute little specially-fitted tuxedo. Powered by AI and equipped with sensors, Vector has customised movement patterns and 'just the right amount of sass', Vogue India reports.

artificial intelligence, met gala, patel, (14 more...)

Daily Mail - Science & tech

Country:

North America > United States > New York (0.27)
North America > United States > Massachusetts (0.06)
North America > United States > Texas > Dallas County > Dallas (0.05)
Asia > India > Gujarat > Vadodara (0.05)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.71)
Health & Medicine > Diagnostic Medicine > Imaging (0.30)

Technology: Information Technology > Artificial Intelligence > Robots (0.76)

arXiv.org Artificial Intelligence May-6-2025

Predicting Movie Hits Before They Happen with LLMs

Agah, Shaghayegh, Kim, Yejin, Sharma, Neeraj, Nankani, Mayur, Foley, Kevin, Huang, H. Howie, Hamidian, Sardar

Addressing the cold-start issue in content recommendation remains a critical ongoing challenge. In this work, we focus on tackling the cold-start problem for movies on a large entertainment platform. Our primary goal is to forecast the popularity of cold-start movies using Large Language Models (LLMs) leveraging movie metadata. This method could be integrated into retrieval systems within the personalization pipeline or could be adopted as a tool for editorial teams to ensure fair promotion of potentially overlooked movies that may be missed by traditional or algorithmic solutions. Our study validates the effectiveness of this approach compared to established baselines and those we developed.

large language model, machine learning, natural language, (18 more...)

2505.02693

Country: North America > United States > New York (0.23)

Genre: Research Report (0.83)

Industry:

Leisure & Entertainment (0.94)
Media > Film (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

arXiv.org Artificial Intelligence May-6-2025

SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing

Li, Ming, Gu, Xin, Chen, Fan, Xing, Xiaoying, Wen, Longyin, Chen, Chen, Zhu, Sijie

Due to the challenges of manually collecting accurate editing data, existing datasets are typically constructed using various automated methods, leading to noisy supervision signals caused by the mismatch between editing instructions and original-edited image pairs. Recent efforts attempt to improve editing models through generating higher-quality edited images, pre-training on recognition tasks, or introducing vision-language models (VLMs) but fail to resolve this fundamental issue. In this paper, we offer a novel solution by constructing more effective editing instructions for given image pairs. This includes rectifying the editing instructions to better align with the original-edited image pairs and using contrastive editing instructions to further enhance their effectiveness. Specifically, we find that editing models exhibit specific generation attributes at different inference steps, independent of the text. Based on these prior attributes, we define a unified guide for VLMs to rectify editing instructions. However, there are some challenging editing scenarios that cannot be resolved solely with rectified instructions. To this end, we further construct contrastive supervision signals with positive and negative instructions and introduce them into the model training using triplet loss, thereby further facilitating supervision effectiveness. Our method does not require the VLM modules or pre-training tasks used in previous work, offering a more direct and efficient way to provide better supervision signals, and providing a novel, simple, and effective solution for instruction-based image editing. Results on multiple benchmarks demonstrate that our method significantly outperforms existing approaches. Compared with previous SOTA SmartEdit, we achieve 9.19% improvements on the Real-Edit benchmark with 30x less training data and 13x smaller model size.
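The abstract above refers to training with a triplet loss over positive and negative instructions. For readers unfamiliar with the term, a generic margin-based triplet loss can be sketched as follows. This is the standard textbook form, not necessarily the paper's exact objective; the names `euclidean` and `triplet_loss`, the margin value, and the toy vectors are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Penalize the anchor being closer to the negative than to the positive.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy vectors (assumed values, standing in for model outputs).
anchor = [0.0, 0.0]
far_point = [3.0, 4.0]   # distance 5 from the anchor
near_point = [0.0, 1.0]  # distance 1 from the anchor

violating = triplet_loss(anchor, positive=far_point, negative=near_point)
satisfied = triplet_loss(anchor, positive=near_point, negative=far_point)
```

In the setting the abstract describes, the positive would correspond to the rectified instruction and the negative to a contrastive instruction that does not match the edited image.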

large language model, machine learning, natural language, (18 more...)

2505.0237

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry: Media > Photography (0.65)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)