Length is a Curse and a Blessing for Document-level Semantics
Xiao, Chenghao, Li, Yizhi, Hudson, G Thomas, Lin, Chenghua, Moubayed, Noura Al
In recent years, contrastive learning (CL) has been extensively utilized to recover sentence- and document-level encoding capability from pre-trained language models. In this work, we question the length generalizability of CL-based models, i.e., their vulnerability to length-induced semantic shift. We verify not only that length vulnerability is a significant yet overlooked research gap, but also that unsupervised CL methods can be devised solely from the semantic signal provided by document length. We first derive the theoretical foundations underlying length attacks, showing that elongating a document intensifies the high intra-document similarity already brought about by CL. Moreover, we find that the isotropy promised by CL is highly dependent on the length range of the text exposed during training. Inspired by these findings, we introduce a simple yet universal document representation learning framework, LA(SER)$^{3}$: length-agnostic self-reference for semantically robust sentence representation learning, which achieves state-of-the-art unsupervised performance on the standard information retrieval benchmark.
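To make the length attack concrete, here is a minimal sketch of the vulnerability described above: embedding the same content at two lengths and checking how the similarity shifts. The sentence-transformers package and the all-MiniLM-L6-v2 checkpoint are illustrative assumptions; this probes the phenomenon and is not the LA(SER)$^{3}$ method itself.

```python
# A hedged illustration of a length attack: the same content, naively elongated
# by repetition, would keep its similarity profile if the encoder were truly
# length-agnostic. Model choice is an assumption, not the paper's setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

short_doc = "The committee approved the new budget proposal."
elongated_doc = " ".join([short_doc] * 16)  # naive elongation by repetition
unrelated_doc = "The cat chased a butterfly across the garden."

emb = model.encode([short_doc, elongated_doc, unrelated_doc], convert_to_tensor=True)
print("same content, short vs elongated:", util.cos_sim(emb[0], emb[1]).item())
print("elongated vs unrelated:          ", util.cos_sim(emb[1], emb[2]).item())
```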
Audio Contrastive based Fine-tuning
Wang, Yang, Liang, Qibin, Xiao, Chenghao, Li, Yizhi, Moubayed, Noura Al, Lin, Chenghua
Audio classification plays a crucial role in speech and sound processing tasks, with a wide range of applications. Striking the right balance between fitting the model to the training data (avoiding overfitting) and enabling it to generalise well to new domains remains a challenge. Leveraging the transferability of contrastive learning, we introduce Audio Contrastive-based Fine-tuning (AudioConFit), an efficient approach characterised by robust generalisability. Empirical experiments on a variety of audio classification tasks demonstrate the effectiveness and robustness of our approach, which achieves state-of-the-art results in various settings.
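The abstract does not spell out the contrastive objective, so the following is a minimal sketch of the kind of loss such contrastive fine-tuning could build on: an NT-Xent loss over two augmented views of each audio clip. The loss formulation, batch shapes, and temperature are assumptions rather than AudioConFit's actual recipe.

```python
# Sketch of an NT-Xent contrastive loss over paired views of audio embeddings;
# an assumed stand-in for the contrastive fine-tuning objective, not the
# paper's exact formulation.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """NT-Xent loss over a batch of paired views (two augmentations per clip)."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)  # (2B, D), unit-norm embeddings
    sim = z @ z.t() / tau                        # temperature-scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))            # exclude self-similarity
    B = z1.size(0)
    # Each view's positive is its counterpart in the other half of the batch.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)

# Example: embeddings from two augmented views of the same audio batch.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent(z1, z2).item())
```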
On the Effectiveness of Speech Self-supervised Learning for Music
Ma, Yinghao, Yuan, Ruibin, Li, Yizhi, Zhang, Ge, Chen, Xingran, Yin, Hanzhi, Lin, Chenghua, Benetos, Emmanouil, Ragni, Anton, Gyenge, Norbert, Liu, Ruibo, Xia, Gus, Dannenberg, Roger, Guo, Yike, Fu, Jie
Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) remains largely unexplored. Previous SSL models pre-trained on music recordings have been mostly closed-source, while recent speech models such as wav2vec2.0 have shown promise for music modelling. Nevertheless, research exploring the effectiveness of applying speech SSL models to music recordings has been limited. We explore the music adaptation of SSL with two distinctive speech-related models, data2vec1.0 and HuBERT, which we refer to as music2vec and musicHuBERT, respectively. We train 12 SSL models with 95M parameters under various pre-training configurations and systematically evaluate their performance on 13 different MIR tasks. Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech. However, we identify limitations of such existing speech-oriented designs, especially in modelling polyphonic information. Based on the experimental results, we also give empirical suggestions for designing future musical SSL strategies and paradigms.
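For readers unfamiliar with how such evaluations are typically run, below is a sketch of a common MIR probing protocol consistent with the setup described above: freeze the SSL encoder, mean-pool its frame-level features, and train a lightweight classifier per task. The HuBERT checkpoint and the logistic-regression probe are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a linear-probe evaluation for a frozen speech SSL model on a MIR
# task; checkpoint and probe are assumptions for illustration.
import torch
from transformers import AutoModel
from sklearn.linear_model import LogisticRegression

encoder = AutoModel.from_pretrained("facebook/hubert-base-ls960").eval()

def embed(waveforms: torch.Tensor) -> torch.Tensor:
    """Mean-pool the last hidden states of the frozen SSL encoder."""
    with torch.no_grad():
        hidden = encoder(input_values=waveforms).last_hidden_state  # (B, T, D)
    return hidden.mean(dim=1)                                       # (B, D)

# Dummy 16 kHz one-second clips and labels stand in for a real MIR dataset.
X = embed(torch.randn(8, 16000)).numpy()
y = [0, 1] * 4
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.score(X, y))
```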
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Li, Yizhi, Yuan, Ruibin, Zhang, Ge, Ma, Yinghao, Chen, Xingran, Yin, Hanzhi, Lin, Chenghua, Ragni, Anton, Benetos, Emmanouil, Gyenge, Norbert, Dannenberg, Roger, Liu, Ruibo, Chen, Wenhu, Xia, Gus, Shi, Yemin, Huang, Wenhao, Guo, Yike, Fu, Jie
Self-supervised learning (SSL) has recently emerged as a promising paradigm for training generalisable models on large-scale data in the fields of vision, text, and speech. Although SSL has been proven effective in speech and audio, its application to music audio has yet to be thoroughly explored. This is primarily due to the distinctive challenges associated with modelling musical knowledge, particularly the tonal and pitched characteristics of music. To address this research gap, we propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels for masked language modelling (MLM)-style acoustic pre-training. In our exploration, we identified a superior combination of teacher models that outperforms conventional speech and audio approaches: an acoustic teacher based on a Residual Vector Quantization Variational AutoEncoder (RVQ-VAE) and a musical teacher based on the Constant-Q Transform (CQT). These teachers effectively guide our student model, a BERT-style transformer encoder, to better model music audio. In addition, we introduce an in-batch noise mixture augmentation to enhance representation robustness. Furthermore, we explore a wide range of settings to overcome the instability of acoustic language model pre-training, which allows our designed paradigm to scale from 95M to 330M parameters. Experimental results indicate that our model generalises and performs well on 14 music understanding tasks, attaining state-of-the-art (SOTA) overall scores. The code and models are online: https://github.com/yizhilll/MERT.
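As a concrete illustration of the musical teacher described above, the sketch below computes CQT frame targets from a waveform. The sample rate, hop length, and use of librosa are assumptions; MERT's actual teacher pipeline may differ.

```python
# Sketch of CQT-based teacher targets: log-frequency bins aligned with musical
# pitch, one target vector per frame. Parameters are illustrative assumptions.
import numpy as np
import librosa

sr = 24000
y = np.random.randn(sr).astype(np.float32)  # stand-in for a 1 s music clip

cqt = np.abs(librosa.cqt(y, sr=sr, hop_length=512, n_bins=84, bins_per_octave=12))
targets = np.log1p(cqt).T  # (frames, 84) frame-level targets for the student

print(targets.shape)
```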
Interactive Natural Language Processing
Wang, Zekun, Zhang, Ge, Yang, Kexin, Shi, Ning, Zhou, Wangchunshu, Hao, Shaochun, Xiong, Guangzheng, Li, Yizhi, Sim, Mong Yuan, Chen, Xiuying, Zhu, Qingqing, Yang, Zhenzhu, Nik, Adam, Liu, Qi, Lin, Chenghua, Wang, Shi, Liu, Ruibo, Chen, Wenhu, Xu, Ke, Liu, Dayiheng, Guo, Yike, Fu, Jie
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP, aimed at addressing limitations in existing frameworks while aligning with the ultimate goals of artificial intelligence. This paradigm considers language models as agents capable of observing, acting, and receiving feedback iteratively from external entities. Specifically, language models in this context can: (1) interact with humans for better understanding and addressing user needs, personalizing responses, aligning with human values, and improving the overall user experience; (2) interact with knowledge bases for enriching language representations with factual knowledge, enhancing the contextual relevance of responses, and dynamically leveraging external information to generate more accurate and informed responses; (3) interact with models and tools for effectively decomposing and addressing complex tasks, leveraging specialized expertise for specific subtasks, and fostering the simulation of social behaviors; and (4) interact with environments for learning grounded representations of language, and effectively tackling embodied tasks such as reasoning, planning, and decision-making in response to environmental observations. This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept. We then provide a systematic classification of iNLP, dissecting its various components, including interactive objects, interaction interfaces, and interaction methods. We proceed to delve into the evaluation methodologies used in the field, explore its diverse applications, scrutinize its ethical and safety issues, and discuss prospective research directions. This survey serves as an entry point for researchers who are interested in this rapidly evolving area and offers a broad view of the current landscape and future trajectory of iNLP.
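The observe-act-feedback cycle at the heart of this definition can be made concrete with a toy loop; the agent, environment, and stopping rule below are entirely hypothetical and stand in for any of the four interaction types surveyed.

```python
# A minimal, hypothetical sketch of the iNLP interaction loop: the model acts
# on an observation and an external entity (human, KB, tool, or environment)
# returns feedback.
from typing import Callable

def interactive_loop(agent: Callable[[str], str],
                     environment: Callable[[str], str],
                     observation: str,
                     max_turns: int = 4) -> str:
    """Alternate between the model acting and an external entity responding."""
    for _ in range(max_turns):
        action = agent(observation)        # model acts on what it observed
        if action == "DONE":
            break
        observation = environment(action)  # external feedback becomes the next observation
    return observation

# Toy run: an 'environment' that echoes, an 'agent' that stops after one act.
print(interactive_loop(lambda obs: "DONE" if "echo" in obs else "query",
                       lambda act: f"echo:{act}", "start"))
```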
Chinese Open Instruction Generalist: A Preliminary Release
Zhang, Ge, Shi, Yemin, Liu, Ruibo, Yuan, Ruibin, Li, Yizhi, Dong, Siwei, Shu, Yu, Li, Zhaoqun, Wang, Zekun, Lin, Chenghua, Huang, Wenhao, Fu, Jie
Pre-trained large-scale language models (LLMs) have shown revolutionary performance in many downstream tasks (Guo et al., 2023; Wei et al., 2021). One crucial ability of LLMs is called instruction following. That is, models can complete the tasks described by instructions given as input. This ability is based on a specialized training stage called instruction tuning. Compared to unlabeled data used for pre-training, the data for instruction tuning is typically more goal-oriented, and it should explicitly demonstrate how a response follows its corresponding instruction with a given input. There are many instruction tuning datasets in English. For example, the FLAN collection (Longpre et al., 2023) contains 15M examples covering 1836 tasks, and OPT-IML (Iyer et al., 2022b) claims to have 18M examples for more than 2000 tasks (although it is still not publicly available). In contrast, existing data resources for Chinese instruction tuning are either small in scale or have questionable quality. For example, Ziang Leng and Li (2023) directly translate English instruction tuning data into Chinese, but do not consider mitigating translation errors or potential cultural gaps, e.g.
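For illustration, an instruction tuning record of the kind described above pairs an instruction and an optional input with a response that explicitly follows them; the field names below follow a common instruction/input/output convention and are an assumption, not COIG's exact schema.

```python
# An illustrative instruction-tuning record; field names are a common
# convention assumed here, not necessarily the dataset's schema.
example = {
    "instruction": "Translate the sentence below into English.",
    "input": "今天天气很好。",
    "output": "The weather is nice today.",
}
print(example["instruction"], "->", example["output"])
```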
CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation
Zhang, Ge, Li, Yizhi, Wu, Yaoyao, Zhang, Linyuan, Lin, Chenghua, Geng, Jiayi, Wang, Shi, Fu, Jie
As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, prevalent data-driven techniques such as large-scale language models suffer from data inadequacy and biased corpora, especially for languages with insufficient resources such as Chinese. To this end, we propose CORGI-PM, a Chinese cOrpus foR Gender bIas Probing and Mitigation, which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which require models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To the best of our knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.
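As a sketch of how the detection challenge could be framed, the snippet below casts sentence-level bias detection as binary classification with a Chinese pre-trained LM; the checkpoint and label set are illustrative assumptions, not the paper's baselines.

```python
# Sketch of the detection subtask as sentence-level binary classification;
# checkpoint and labels are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-chinese")
clf = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2)  # assumed labels: biased / unbiased

# "This job is not suitable for girls." -- a stereotyped example sentence.
batch = tok(["这份工作不适合女生。"], return_tensors="pt", padding=True)
with torch.no_grad():
    logits = clf(**batch).logits
print(logits.softmax(dim=-1))  # head is untrained: scores are meaningless until fine-tuned
```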
MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning
Li, Yizhi, Yuan, Ruibin, Zhang, Ge, Ma, Yinghao, Lin, Chenghua, Chen, Xingran, Ragni, Anton, Yin, Hanzhi, Hu, Zhijie, He, Haoyu, Benetos, Emmanouil, Gyenge, Norbert, Liu, Ruibo, Fu, Jie
The deep learning community has witnessed exponentially growing interest in self-supervised learning (SSL). However, how to build a framework for learning useful representations of raw music waveforms in a self-supervised manner remains unexplored. In this work, we design Music2Vec, a framework exploring different SSL algorithmic components and tricks for music audio recordings. Our model achieves results comparable to the state-of-the-art (SOTA) music SSL model Jukebox, despite being significantly smaller, with less than 2% of the latter's parameters. The model will be released on Hugging Face: https://huggingface.co/m-a-p/music2vec-v1
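A minimal sketch of extracting representations from the released checkpoint follows; that it loads through the generic Hugging Face AutoModel path, and the 16 kHz input rate, are assumptions based on its data2vec backbone.

```python
# Sketch of feature extraction from the released checkpoint; the loading path
# and input sample rate are assumptions, not documented guarantees.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("m-a-p/music2vec-v1").eval()
waveform = torch.randn(1, 16000)  # stand-in for one second of 16 kHz audio

with torch.no_grad():
    features = model(input_values=waveform).last_hidden_state  # (1, frames, dim)
print(features.shape)
```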
HERB: Measuring Hierarchical Regional Bias in Pre-trained Language Models
Li, Yizhi, Zhang, Ge, Yang, Bohao, Lin, Chenghua, Wang, Shi, Ragni, Anton, Fu, Jie
Fairness has become a trending topic in natural language processing (NLP), addressing biases targeting certain social groups such as genders and religions. However, regional bias in language models (LMs), a long-standing global discrimination problem, remains unexplored. This paper bridges the gap by analysing the regional bias learned by pre-trained language models that are broadly used in NLP tasks. In addition to verifying the existence of regional bias in LMs, we find that the biases on regional groups can be strongly influenced by the geographical clustering of the groups. We accordingly propose a HiErarchical Regional Bias evaluation method (HERB) that utilises information from sub-region clusters to quantify the bias in pre-trained LMs. Experiments show that our hierarchical metric can effectively evaluate regional bias with respect to comprehensive topics and measure the potential regional bias that can propagate to downstream tasks. Our code is available at https://github.com/Bernard-Yang/HERB.
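To give a feel for hierarchical aggregation over sub-region clusters, here is an illustrative (not the paper's exact) scoring function: a region's score combines the spread across its sub-clusters with the bias carried recursively inside them.

```python
# Illustrative hierarchical bias aggregation over nested region clusters;
# the combination rule is an assumption, not HERB's published formula.
from statistics import pstdev, mean

def hierarchical_bias(node) -> float:
    """Leaves carry per-region bias scores; internal nodes are cluster dicts."""
    if isinstance(node, (int, float)):
        return float(node)
    child_scores = [hierarchical_bias(child) for child in node.values()]
    # Spread across sub-clusters plus the average bias carried inside them.
    return pstdev(child_scores) + mean(child_scores)

regions = {"north": {"a": 0.1, "b": 0.3}, "south": {"c": 0.7, "d": 0.9}}
print(round(hierarchical_bias(regions), 3))
```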