music
BNMusic: Blending Environmental Noises into Personalized Music
While being disturbed by environmental noises, the acoustic masking technique is a conventional way to reduce the annoyance in audio engineering that seeks to cover up the noises with other dominant yet less intrusive sounds. However, misalignment between the dominant sound and the noise--such as mismatched downbeats--often requires an excessive volume increase to achieve effective masking. Motivated by recent advances in cross-modal generation, in this work, we introduce an alternative method to acoustic masking, aiming to reduce the noticeability of environmental noises by blending them into personalized music generated based on user-provided text prompts. Following the paradigm of music generation using mel-spectrogram representations, we propose a Blending Noises into Personalized Music (BNMusic) framework with two key stages.
MMAR: AChallenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
We introduce MMAR, a new benchmark designed to evaluate the deep reasoning capabilities of Audio-Language Models (ALMs) across massive multi-disciplinary tasks. MMAR comprises 1,000 meticulously curated audio-question-answer triplets, collected from real-world internet videos and refined through iterative error corrections and quality checks to ensure high quality. Unlike existing benchmarks that are limited to specific domains of sound, music, or speech, MMAR extends them to a broad spectrum of real-world audio scenarios, including mixedmodality combinations of sound, music, and speech. Each question in MMAR is hierarchically categorized across four reasoning layers: Signal, Perception, Semantic, and Cultural, with additional sub-categories within each layer to reflect task diversity and complexity. To further foster research in this area, we annotate every question with a Chain-of-Thought (CoT) rationale to promote future advancements in audio reasoning.
Qobuz Is the Anti-Spotify Music Streamer You've Been Waiting For
Qobuz Is the Anti-Spotify Music Streamer You've Been Waiting For With its music focus, no-AI content policy, and larger artist royalties, the hi-res streaming service is scooping up all sorts of switchers. When Dan Mackta, Qobuz's New York-based managing director, was looking for musicians to endorse the music streaming service after its US launch in 2019, he tapped up a friend--the manager of the Flaming Lips. It was mid-pandemic levels of tricky. "I flew to Oklahoma to shoot with Wayne Coyne," Mackta says. "He shows up wearing one of those helmets, with the ventilation system to protect you, a metallic puffer jacket and big silver moon boots."
OmniBench: Towards The Future of Universal Omni-Language Models
Recent advancements in multimodal large language models (MLLMs) have aimed to integrate and interpret data across diverse modalities. However, the capacity of these models to concurrently process and reason about multiple modalities remains underexplored, partly due to the lack of comprehensive modality-wise benchmarks. We introduce OmniBench, a novel benchmark designed to rigorously evaluate models' ability to recognize, interpret, and reason across visual, acoustic, and textual inputs simultaneously. We define language models capable of such tri-modal processing as the omni-language models (OLMs). OmniBench is distinguished by high-quality human annotations, ensuring that accurate responses require integrated understanding and reasoning across all three modalities.
Investigation by The Atlantic reveals many millions of songs used for AI music training
Taylor Swift, Bad Bunny and many, many more artists have had their work fed into AI models. We're always glad to see more publications and groups digging deeper into artificial intelligence and its impact. Today, has published four searchable databases of music that has been used to train AI models. The scope is pretty staggering, with 12 million tracks in one database, 9 million in another, and the two final ones each containing about 100,000 songs. The full results and payout from that suit are still pending, though the initial settlement was for $1.5 billion.
TCL A65K Soundbar Review: Small Size, Big Sound
Don't be fooled by the compact size of this soundbar. It's a solid option for smaller TVs or spaces without having to sacrifice sound quality. Acoustic music sounds loud and distinct. Some music sounds washed out and muddy. Living in a small space has some challenges, but poor cinematic sound doesn't need to be one of them.
Unifying Symbolic Music Arrangement: Track-Aware Reconstruction and Structured Tokenization
We present a unified framework for automatic multitrack music arrangement that enables a single pre-trained symbolic music model to handle diverse arrangement scenarios, including reinterpretation, simplification, and additive generation. At its core is a segment-level reconstruction objective operating on token-level disentangled content and style, allowing for flexible any-to-any instrumentation transformations at inference time. To support track-wise modeling, we introduce REMI-z, a structured tokenization scheme for multitrack symbolic music that enhances modeling efficiency and effectiveness for both arrangement tasks and unconditional generation. Our method outperforms task-specific state-of-the-art models on representative tasks in different arrangement scenarios--band arrangement, piano reduction, and drum arrangement, in both objective metrics and perceptual evaluations. Taken together, our framework demonstrates strong generality and suggests broader applicability in symbolic music-to-music transformation.1
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
We introduce MMAR, a new benchmark designed to evaluate the deep reasoning capabilities of Audio-Language Models (ALMs) across massive multi-disciplinary tasks. MMAR comprises 1,000 meticulously curated audio-question-answer triplets, collected from real-world internet videos and refined through iterative error corrections and quality checks to ensure high quality. Unlike existing benchmarks that are limited to specific domains of sound, music, or speech, MMAR extends them to a broad spectrum of real-world audio scenarios, including mixed-modality combinations of sound, music, and speech. Each question in MMAR is hierarchically categorized across four reasoning layers: Signal, Perception, Semantic, and Cultural, with additional sub-categories within each layer to reflect task diversity and complexity. To further foster research in this area, we annotate every question with a Chain-of-Thought (CoT) rationale to promote future advancements in audio reasoning.