AITopics | Mao, Zhuoyuan

Collaborating Authors

Mao, Zhuoyuan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Cross-Modal Learning for Music-to-Music-Video Description Generation

Mao, Zhuoyuan, Zhao, Mengjie, Wu, Qiyu, Zhong, Zhi, Liao, Wei-Hsiang, Wakaki, Hiromi, Mitsufuji, Yuki

arXiv.org Artificial IntelligenceMar-14-2025

Music-to-music-video generation is a challenging task due to the intrinsic differences between the music and video modalities. The advent of powerful text-to-video diffusion models has opened a promising pathway for music-video (MV) generation by first addressing the music-to-MV description task and subsequently leveraging these models for video generation. In this study, we focus on the MV description generation task and propose a comprehensive pipeline encompassing training data construction and multimodal model fine-tuning. We fine-tune existing pre-trained multimodal models on our newly constructed music-to-MV description dataset based on the Music4All dataset, which integrates both musical and visual information. Our experimental results demonstrate that music representations can be effectively mapped to textual domains, enabling the generation of meaningful MV description directly from music inputs. We also identify key components in the dataset construction pipeline that critically impact the quality of MV description and highlight specific musical attributes that warrant greater focus for improved MV description generation.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.1119

Country:

North America > United States (0.28)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.87)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)

Add feedback

DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning

Mao, Zhuoyuan, Zhao, Mengjie, Wu, Qiyu, Wakaki, Hiromi, Mitsufuji, Yuki

arXiv.org Artificial IntelligenceFeb-18-2025

Recent advancements in music large language models (LLMs) have significantly improved music understanding tasks, which involve the model's ability to analyze and interpret various musical elements. These improvements primarily focused on integrating both music and text inputs. However, the potential of incorporating additional modalities such as images, videos and textual music features to enhance music understanding remains unexplored. To bridge this gap, we propose DeepResonance, a multimodal music understanding LLM fine-tuned via multi-way instruction tuning with multi-way aligned music, text, image, and video data. To this end, we construct Music4way-MI2T, Music4way-MV2T, and Music4way-Any2T, three 4-way training and evaluation datasets designed to enable DeepResonance to integrate both visual and textual music feature content. We also introduce multi-sampled ImageBind embeddings and a pre-alignment Transformer to enhance modality fusion prior to input into text LLMs, tailoring DeepResonance for multi-way instruction tuning. Our model achieves state-of-the-art performances across six music understanding tasks, highlighting the benefits of the auxiliary modalities and the structural superiority of DeepResonance. We plan to open-source the models and the newly constructed datasets.

artificial intelligence, large language model, music-centric multi-way instruction tuning, (2 more...)

arXiv.org Artificial Intelligence

2502.12623

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Add feedback

OpenMU: Your Swiss Army Knife for Music Understanding

Zhao, Mengjie, Zhong, Zhi, Mao, Zhuoyuan, Yang, Shiqi, Liao, Wei-Hsiang, Takahashi, Shusuke, Wakaki, Hiromi, Mitsufuji, Yuki

arXiv.org Artificial IntelligenceNov-27-2024

We present OpenMU-Bench, a large-scale benchmark suite for addressing the data scarcity issue in training multimodal language models to understand music. To construct OpenMU-Bench, we leveraged existing datasets and bootstrapped new annotations. OpenMU-Bench also broadens the scope of music understanding by including lyrics understanding and music tool usage. Using OpenMU-Bench, we trained our music understanding model, OpenMU, with extensive ablations, demonstrating that OpenMU outperforms baseline models such as MU-Llama. Both OpenMU and OpenMU-Bench are open-sourced to facilitate future research in music understanding and to enhance creative music production efficiency.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2410.15573

Country:

Europe (0.67)
North America > United States > Pennsylvania (0.14)
North America > United States > Michigan (0.14)
North America > Mexico > Mexico City (0.14)

Genre: Research Report (0.50)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)

Add feedback

EMS: Efficient and Effective Massively Multilingual Sentence Embedding Learning

Mao, Zhuoyuan, Chu, Chenhui, Kurohashi, Sadao

arXiv.org Artificial IntelligenceMay-30-2024

Massively multilingual sentence representation models, e.g., LASER, SBERT-distill, and LaBSE, help significantly improve cross-lingual downstream tasks. However, the use of a large amount of data or inefficient model architectures results in heavy computation to train a new model according to our preferred languages and domains. To resolve this issue, we introduce efficient and effective massively multilingual sentence embedding (EMS), using cross-lingual token-level reconstruction (XTR) and sentence-level contrastive learning as training objectives. Compared with related studies, the proposed model can be efficiently trained using significantly fewer parallel sentences and GPU computation resources. Empirical results showed that the proposed model significantly yields better or comparable results with regard to cross-lingual sentence retrieval, zero-shot cross-lingual genre classification, and sentiment classification. Ablative analyses demonstrated the efficiency and effectiveness of each component of the proposed model. We release the codes for model training and the EMS pre-trained sentence embedding model, which supports 62 languages ( https://github.com/Mao-KU/EMS ).

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TASLP.2024.3402064

2205.15744

Country: Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Add feedback

Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages

Mao, Zhuoyuan, Yu, Yen

arXiv.org Artificial IntelligenceJan-11-2024

This article introduces contrastive alignment instructions (AlignInstruct) to address two challenges in machine translation (MT) on large language models (LLMs). One is the expansion of supported languages to previously unseen ones. The second relates to the lack of data in low-resource languages. Model fine-tuning through MT instructions (MTInstruct) is a straightforward approach to the first challenge. However, MTInstruct is limited by weak cross-lingual signals inherent in the second challenge. AlignInstruct emphasizes cross-lingual supervision via a cross-lingual discriminator built using statistical word alignments. Our results based on fine-tuning the BLOOMZ models (1b1, 3b, and 7b1) in up to 24 unseen languages showed that: (1) LLMs can effectively translate unseen languages using MTInstruct; (2) AlignInstruct led to consistent improvements in translation quality across 48 translation directions involving English; (3) Discriminator-based instructions outperformed their generative counterparts as cross-lingual instructions; (4) AlignInstruct improved performance in 30 zero-shot directions.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2401.05811

Country:

Europe (1.00)
Asia > Middle East > UAE (0.14)
North America > Canada > Quebec (0.14)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with Knowledge Distillation

Mao, Zhuoyuan, Nakagawa, Tetsuji

arXiv.org Artificial IntelligenceDec-26-2023

Large-scale language-agnostic sentence embedding models such as LaBSE (Feng et al., 2022) obtain state-of-the-art performance for parallel sentence alignment. However, these large-scale models can suffer from inference speed and computation overhead. This study systematically explores learning language-agnostic sentence embeddings with lightweight models. We demonstrate that a thin-deep encoder can construct robust low-dimensional sentence embeddings for 109 languages. With our proposed distillation methods, we achieve further improvements by incorporating knowledge from a teacher model. Empirical results on Tatoeba, United Nations, and BUCC show the effectiveness of our lightweight models. We release our lightweight language-agnostic sentence embedding models LEALLA on TensorFlow Hub.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2302.08387

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Add feedback

GPT-RE: In-context Learning for Relation Extraction using Large Language Models

Wan, Zhen, Cheng, Fei, Mao, Zhuoyuan, Liu, Qianying, Song, Haiyue, Li, Jiwei, Kurohashi, Sadao

arXiv.org Artificial IntelligenceDec-8-2023

In spite of the potential for ground-breaking achievements offered by large language models (LLMs) (e.g., GPT-3), they still lag significantly behind fully-supervised baselines (e.g., fine-tuned BERT) in relation extraction (RE). This is due to the two major shortcomings of LLMs in RE: (1) low relevance regarding entity and relation in retrieved demonstrations for in-context learning; and (2) the strong inclination to wrongly classify NULL examples into other pre-defined labels. In this paper, we propose GPT-RE to bridge the gap between LLMs and fully-supervised baselines. GPT-RE successfully addresses the aforementioned issues by (1) incorporating task-specific entity representations in demonstration retrieval; and (2) enriching the demonstrations with gold label-induced reasoning logic. We evaluate GPT-RE on four widely-used RE datasets, and observe that GPT-RE achieves improvements over not only existing GPT-3 baselines, but also fully-supervised baselines. Specifically, GPT-RE achieves SOTA performances on the Semeval and SciERC datasets, and competitive performances on the TACRED and ACE05 datasets.

demonstration, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2305.02105

Country:

Europe (1.00)
Asia (0.94)
North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Add feedback

Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation

Mao, Zhuoyuan, Song, Haiyue, Dabre, Raj, Chu, Chenhui, Kurohashi, Sadao

arXiv.org Artificial IntelligenceMay-17-2023

The language-independency of encoded representations within multilingual neural machine translation (MNMT) models is crucial for their generalization ability on zero-shot translation. Neural interlingua representations have been shown as an effective method for achieving this. However, fixed-length neural interlingua representations introduced in previous work can limit its flexibility and representation ability. In this study, we introduce a novel method to enhance neural interlingua representations by making their length variable, thereby overcoming the constraint of fixed-length neural interlingua representations. Our empirical results on zero-shot translation on OPUS, IWSLT, and Europarl datasets demonstrate stable model convergence and superior zero-shot translation results compared to fixed-length neural interlingua representations. However, our analysis reveals the suboptimal efficacy of our approach in translating from certain source languages, wherein we pinpoint the defective model component in our proposed method.

artificial intelligence, natural language, translation, (16 more...)

arXiv.org Artificial Intelligence

2305.1019

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Minnesota (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation

Mao, Zhuoyuan, Dabre, Raj, Liu, Qianying, Song, Haiyue, Chu, Chenhui, Kurohashi, Sadao

arXiv.org Artificial IntelligenceMay-16-2023

This paper studies the impact of layer normalization (LayerNorm) on zero-shot translation (ZST). Recent efforts for ZST often utilize the Transformer architecture as the backbone, with LayerNorm at the input of layers (PreNorm) set as the default. However, Xu et al. (2019) has revealed that PreNorm carries the risk of overfitting the training data. Based on this, we hypothesize that PreNorm may overfit supervised directions and thus have low generalizability for ZST. Through experiments on OPUS, IWSLT, and Europarl datasets for 54 ZST directions, we demonstrate that the original Transformer setting of LayerNorm after residual connections (PostNorm) consistently outperforms PreNorm by up to 12.3 BLEU points. We then study the performance disparities by analyzing the differences in off-target rates and structural variations between PreNorm and PostNorm. This study highlights the need for careful consideration of the LayerNorm setting for ZST.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2305.09312

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Relation Extraction with Weighted Contrastive Pre-training on Distant Supervision

Wan, Zhen, Cheng, Fei, Liu, Qianying, Mao, Zhuoyuan, Song, Haiyue, Kurohashi, Sadao

arXiv.org Artificial IntelligenceFeb-10-2023

Contrastive pre-training on distant supervision has shown remarkable effectiveness in improving supervised relation extraction tasks. However, the existing methods ignore the intrinsic noise of distant supervision during the pre-training stage. In this paper, we propose a weighted contrastive learning method by leveraging the supervised data to estimate the reliability of pre-training instances and explicitly reduce the effect of noise. Experimental results on three supervised datasets demonstrate the advantages of our proposed weighted contrastive learning approach compared to two state-of-the-art non-weighted baselines.Our code and models are available at: https://github.com/YukinoWan/WCL

artificial intelligence, inductive learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2205.0877

Country:

Europe (0.94)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)

Add feedback