AITopics | Ju, Yaolong

Plotting

Ju, Yaolong

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CrossMuSim: A Cross-Modal Framework for Music Similarity Retrieval with LLM-Powered Text Description Sourcing and Mining

Tsoi, Tristan, Deng, Jiajun, Ju, Yaolong, Weck, Benno, Kirchhoff, Holger, Lui, Simon

arXiv.org Artificial IntelligenceMar-29-2025

--Music similarity retrieval is fundamental for managing and exploring relevant content from large collections in streaming platforms. This paper presents a novel cross-modal contrastive learning framework that leverages the open-ended nature of text descriptions to guide music similarity modeling, addressing the limitations of traditional uni-modal approaches in capturing complex musical relationships. T o overcome the scarcity of high-quality text-music paired data, this paper introduces a dual-source data acquisition approach combining online scraping and LLM-based prompting, where carefully designed prompts leverage LLMs' comprehensive music knowledge to generate contextually rich descriptions. Extensive experiments demonstrate that the proposed framework achieves significant performance improvements over existing benchmarks through objective metrics, subjective evaluations, and real-world A/B testing on the Huawei Music streaming platform. Music similarity retrieval plays an important role in many music information retrieval (MIR) tasks, such as music recommendation [1], personalized playlist generation [2] and background music replacement in video editing [3], [4]. As digital music collections rapidly expand within streaming platforms, accurately identifying similarities between musical pieces has become critical for managing and exploring relevant content from such large collections efficiently.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2503.23128

Country: Europe (0.46)

Genre: Research Report (0.50)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation

Lin, Zhiwei, Chen, Jun, Tang, Boshi, Sha, Binzhu, Yang, Jing, Ju, Yaolong, Fan, Fan, Kang, Shiyin, Wu, Zhiyong, Meng, Helen

arXiv.org Artificial IntelligenceJan-15-2024

Variational Autoencoders (VAEs) constitute a crucial component of neural symbolic music generation, among which some works have yielded outstanding results and attracted considerable attention. Nevertheless, previous VAEs still encounter issues with overly long feature sequences and generated results lack contextual coherence, thus the challenge of modeling long multi-track symbolic music still remains unaddressed. To this end, we propose Multi-view MidiVAE, as one of the pioneers in VAE methods that effectively model and generate long multi-track symbolic music. The Multi-view MidiVAE utilizes the two-dimensional (2-D) representation, OctupleMIDI, to capture relationships among notes while reducing the feature sequences length. Moreover, we focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy to integrate both Track- and Bar-view MidiVAE features. Objective and subjective experimental results on the CocoChorales dataset demonstrate that, compared to the baseline, Multi-view MidiVAE exhibits significant improvements in terms of modeling long multi-track symbolic music.

artificial intelligence, machine learning, midivae, (17 more...)

arXiv.org Artificial Intelligence

2401.07532

Country: Asia > China (0.30)

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback