Yang, Muyun
Mitigating the Bias of Large Language Model Evaluation
Zhou, Hongli, Huang, Hui, Long, Yunfei, Xu, Bing, Zhu, Conghui, Cao, Hailong, Yang, Muyun, Zhao, Tiejun
Recently, there has been a trend of evaluating Large Language Model (LLM) output quality in the manner of LLM-as-a-Judge, namely leveraging another LLM to evaluate the output quality. However, existing judges have been shown to be biased: they favor answers that present better superficial quality (such as verbosity or fluency) while ignoring instruction-following ability. In this work, we present a systematic study of the bias of LLM-as-a-Judge. Specifically, for closed-source judge models, we apply calibration to mitigate the influence of superficial quality, at both the probability level and the prompt level. For open-source judge models, we propose to mitigate the bias by contrastive training, with curated negative samples that deviate from the instruction but present better superficial quality. We apply our methods on a bias evaluation benchmark, and experimental results show that our methods mitigate the bias by a large margin while maintaining satisfactory evaluation accuracy.
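Probability-level calibration can be caricatured as follows (a toy sketch, not the paper's exact method): subtract the preference the judge expresses when the instruction is withheld, so that preference driven purely by superficial quality is discounted. The `toy_judge` below is a hypothetical stand-in for a verbosity-biased judge.

```python
def calibrated_preference(judge, instruction, ans_a, ans_b):
    # Calibration sketch: the judge's preference with the instruction hidden
    # reflects superficial quality only (length, fluency), so subtracting it
    # leaves the instruction-dependent part of the score.
    full = judge(instruction, ans_a, ans_b)
    blind = judge("", ans_a, ans_b)
    return full - blind  # > 0: A preferred for instruction-following reasons


def toy_judge(instruction, ans_a, ans_b):
    # Hypothetical judge with a built-in verbosity bias plus a crude
    # instruction-following signal (keyword overlap with answer A).
    score = 0.1 * (len(ans_a) - len(ans_b))
    if instruction:
        score += sum(w in ans_a.lower() for w in instruction.lower().split())
    return score
```

With this toy judge, a short instruction-following answer loses to a long rambling one on the raw score, but wins after calibration, since the length term cancels.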
DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms
Chen, Andong, Lou, Lianzhang, Chen, Kehai, Bai, Xuefeng, Xiang, Yang, Yang, Muyun, Zhao, Tiejun, Zhang, Min
Recently, large language models (LLMs) enhanced by self-reflection have achieved promising performance on machine translation. The key idea is to guide LLMs to generate translations with human-like feedback. However, existing self-reflection methods lack effective feedback information, which limits translation performance. To address this, we introduce DUAL-REFLECT, a framework that leverages the dual learning of translation tasks to provide effective feedback, thereby enhancing the models' self-reflective abilities and improving translation performance. Applying this method across various translation tasks has proven its effectiveness in improving translation accuracy and resolving ambiguities, especially in translation tasks with low-resource language pairs.
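The dual-learning idea can be illustrated in miniature: back-translate the draft and measure how well the source is reconstructed, using the reconstruction score as feedback for a further reflection round. This is a hedged sketch with hypothetical `translate`/`back_translate` callables; token overlap stands in for a real semantic comparison.

```python
def dual_feedback(translate, back_translate, source):
    # Translate, then back-translate; agreement between the source and its
    # reconstruction serves as a feedback signal for self-reflection.
    draft = translate(source)
    reconstruction = back_translate(draft)
    src, rec = set(source.split()), set(reconstruction.split())
    feedback = len(src & rec) / max(len(src), 1)  # 1.0 = fully reconstructed
    return draft, feedback
```

A low feedback score signals that the draft lost source information, which a reflective LLM could then be prompted to repair.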
On the Limitations of Fine-tuned Judge Models for LLM Evaluation
Huang, Hui, Qu, Yingqi, Zhou, Hongli, Liu, Jing, Yang, Muyun, Xu, Bing, Zhao, Tiejun
Recently, there has been a growing trend of utilizing Large Language Models (LLMs) to evaluate the quality of other LLMs. Many studies have employed proprietary closed-source models, especially GPT-4, as the evaluator, while other works have fine-tuned judge models based on open-source LLMs. Although the fine-tuned judge models are claimed to achieve evaluation capability comparable to GPT-4, in this work we conduct an empirical study of such judge models. Our findings indicate that although fine-tuned judge models achieve high performance on in-domain test sets, even surpassing GPT-4, they underperform GPT-4 across several dimensions, including generalizability, fairness, aspect-specific evaluation, and scalability. We also reveal that a fine-tuned judge model inherently operates as a task-specific classifier, which imposes these limitations. Finally, we propose an effective indicator to measure the reliability of fine-tuned judges, with the aim of maximizing their utility in LLM evaluation.
Self-Evaluation of Large Language Model based on Glass-box Features
Huang, Hui, Qu, Yingqi, Liu, Jing, Yang, Muyun, Zhao, Tiejun
The proliferation of open-source Large Language Models (LLMs) underscores the pressing need for evaluation methods. Existing works primarily rely on external evaluators, focusing on training and prompting strategies. However, a crucial aspect, model-aware glass-box features, is overlooked. In this study, we explore the utility of glass-box features in the scenario of self-evaluation, namely applying an LLM to evaluate its own output. We investigate various glass-box feature groups and discover that the softmax distribution serves as a reliable indicator of output quality. Furthermore, we propose two strategies to enhance the evaluation by incorporating features derived from references. Experimental results on public benchmarks validate the feasibility of self-evaluation of LLMs using glass-box features.
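A minimal sketch of the softmax-based glass-box idea (an assumed formulation, not necessarily the paper's exact feature): average the log-probability the model assigned to its own generated tokens, so that a more peaked output distribution yields a higher self-evaluation score.

```python
import math

def sequence_confidence(step_distributions, chosen_ids):
    # step_distributions: one softmax distribution (token id -> prob) per
    # decoding step; chosen_ids: the token actually emitted at each step.
    logps = [math.log(dist[tok])
             for dist, tok in zip(step_distributions, chosen_ids)]
    return sum(logps) / len(logps)  # mean log-prob; higher = more confident
```

In a real setting the distributions would come from the model's own decoding pass, which is what makes the feature "glass-box": no external evaluator is needed.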
Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models
Liang, Xinnian, Zhou, Zefan, Huang, Hui, Wu, Shuangzhi, Xiao, Tong, Yang, Muyun, Li, Zhoujun, Bian, Chao
Pretrained language models (PLMs) have shown marvelous improvements across various NLP tasks. Most Chinese PLMs simply treat an input text as a sequence of characters and completely ignore word information. Although Whole Word Masking can alleviate this, the semantics of words are still not well represented. In this paper, we revisit the segmentation granularity of Chinese PLMs. We propose a mixed-granularity Chinese BERT (MigBERT) that considers both characters and words. To achieve this, we design objective functions for learning both character- and word-level representations. We conduct extensive experiments on various Chinese NLP tasks to evaluate existing PLMs as well as the proposed MigBERT. Experimental results show that MigBERT achieves new SOTA performance on all these tasks. Further analysis demonstrates that words are semantically richer than characters. More interestingly, we show that MigBERT also works with Japanese. Our code and model are released at https://github.com/xnliang98/MigBERT.
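Character and word views of the same input can be produced as below, a toy sketch using greedy forward maximum matching over a hypothetical word vocabulary (MigBERT's actual segmentation and training objectives are more involved).

```python
def mixed_granularity(text, vocab, max_len=4):
    # Character view: one token per character.
    chars = list(text)
    # Word view: greedy forward maximum matching against the vocabulary;
    # unmatched spans fall back to single characters.
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            if length == 1 or text[i:i + length] in vocab:
                words.append(text[i:i + length])
                i += length
                break
    return chars, words
```

A mixed-granularity model would then learn representations over both token streams, letting word-level semantics complement the character sequence.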
Translation Prediction with Source Dependency-Based Context Representation
Chen, Kehai (Harbin Institute of Technology) | Zhao, Tiejun (Harbin Institute of Technology) | Yang, Muyun (Harbin Institute of Technology) | Liu, Lemao (National Institute of Information and Communications Technology)
Learning context representations is a promising way to improve translation results, particularly through neural networks. Previous efforts process the context words sequentially and neglect their internal syntactic structure. In this paper, we propose a novel neural network based on a bi-convolutional architecture to represent the source dependency-based context for translation prediction. The proposed model not only encodes long-distance dependencies but also captures functional similarities for better translation prediction (e.g., translation of ambiguous words and word forms). Evaluated on a large-scale Chinese-English translation task, the proposed approach achieves a significant improvement (of up to +1.9 BLEU points) over the baseline system, and meanwhile outperforms a number of context-enhanced comparison systems.