AITopics | Wan, Xiaojun

Collaborating Authors

Wan, Xiaojun

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CFunModel: A "Funny" Language Model Capable of Chinese Humor Generation and Processing

Yu, Zhenghan, Hu, Xinyu, Wan, Xiaojun

arXiv.org Artificial IntelligenceMar-26-2025

Humor plays a significant role in daily language communication. With the rapid development of large language models (LLMs), natural language processing has made significant strides in understanding and generating various genres of texts. However, most LLMs exhibit poor performance in generating and processing Chinese humor. In this study, we introduce a comprehensive Chinese humor-related dataset, the Chinese Fun Set (CFunSet). This dataset aggregates existing Chinese humor datasets and includes over 20,000 jokes collected from Tieba-JokeBar, a Chinese online platform known for joke sharing. The resulting corpus comprises more than 160,000 entries. Leveraging CFunSet, we developed the Chinese Fun Model (CFunModel), the first large language model designed to handle various Chinese humor-related tasks including Crosstalk Response Selection, Humor Recognition, Joke Generation, etc. Experimental results demonstrate that CFunModel outperforms popular large language models in these tasks. Our CFunSet is available at https://huggingface.co/datasets/ZhenghanYU/CFunSet and CFunModel is available at https://huggingface.co/ZhenghanYU/CFunModel. A demostration video of our work is available at https://youtu.be/MOsISOJ66Ms.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.20417

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.86)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Exploring the Multilingual NLG Evaluation Abilities of LLM-Based Evaluators

Chang, Jiayi, Gao, Mingqi, Hu, Xinyu, Wan, Xiaojun

arXiv.org Artificial IntelligenceMar-6-2025

Previous research has shown that LLMs have potential in multilingual NLG evaluation tasks. However, existing research has not fully explored the differences in the evaluation capabilities of LLMs across different languages. To this end, this study provides a comprehensive analysis of the multilingual evaluation performance of 10 recent LLMs, spanning high-resource and low-resource languages through correlation analysis, perturbation attacks, and fine-tuning. We found that 1) excluding the reference answer from the prompt and using large-parameter LLM-based evaluators leads to better performance across various languages; 2) most LLM-based evaluators show a higher correlation with human judgments in high-resource languages than in low-resource languages; 3) in the languages where they are most sensitive to such attacks, they also tend to exhibit the highest correlation with human judgments; and 4) fine-tuning with data from a particular language yields a broadly consistent enhancement in the model's evaluation performance across diverse languages. Our findings highlight the imbalance in LLMs'evaluation capabilities across different languages and suggest that low-resource language scenarios deserve more attention.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.0436

Country: Asia (0.48)

Genre:

Research Report > Experimental Study (0.48)
Research Report > New Finding (0.34)

Industry: Government > Regional Government (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Exploring and Evaluating Multimodal Knowledge Reasoning Consistency of Multimodal Large Language Models

Jia, Boyu, Zhang, Junzhe, Zhang, Huixuan, Wan, Xiaojun

arXiv.org Artificial IntelligenceMar-3-2025

In recent years, multimodal large language models (MLLMs) have achieved significant breakthroughs, enhancing understanding across text and vision. However, current MLLMs still face challenges in effectively integrating knowledge across these modalities during multimodal knowledge reasoning, leading to inconsistencies in reasoning outcomes. To systematically explore this issue, we propose four evaluation tasks and construct a new dataset. We conduct a series of experiments on this dataset to analyze and compare the extent of consistency degradation in multimodal knowledge reasoning within MLLMs. Based on the experimental results, we identify factors contributing to the observed degradation in consistency. Our research provides new insights into the challenges of multimodal knowledge reasoning and offers valuable guidance for future efforts aimed at improving MLLMs.

consistency, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2503.04801

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports > Basketball (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Who Writes What: Unveiling the Impact of Author Roles on AI-generated Text Detection

Li, Jiatao, Wan, Xiaojun

arXiv.org Artificial IntelligenceFeb-18-2025

The rise of Large Language Models (LLMs) necessitates accurate AI-generated text detection. However, current approaches largely overlook the influence of author characteristics. We investigate how sociolinguistic attributes-gender, CEFR proficiency, academic field, and language environment-impact state-of-the-art AI text detectors. Using the ICNALE corpus of human-authored texts and parallel AI-generated texts from diverse LLMs, we conduct a rigorous evaluation employing multi-factor ANOVA and weighted least squares (WLS). Our results reveal significant biases: CEFR proficiency and language environment consistently affected detector accuracy, while gender and academic field showed detector-dependent effects. These findings highlight the crucial need for socially aware AI text detection to avoid unfairly penalizing specific demographic groups. We offer novel empirical evidence, a robust statistical framework, and actionable insights for developing more equitable and reliable detection systems in real-world, out-of-domain contexts. This work paves the way for future research on bias mitigation, inclusive evaluation benchmarks, and socially responsible LLM detectors.

large language model, machine learning, qwen2, (18 more...)

arXiv.org Artificial Intelligence

2502.12611

Country:

Asia (1.00)
North America > United States (0.45)
Europe > United Kingdom (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Aspect-Guided Multi-Level Perturbation Analysis of Large Language Models in Automated Peer Review

Li, Jiatao, Li, Yanheng, Hu, Xinyu, Gao, Mingqi, Wan, Xiaojun

arXiv.org Artificial IntelligenceFeb-17-2025

We propose an aspect-guided, multi-level perturbation framework to evaluate the robustness of Large Language Models (LLMs) in automated peer review. Our framework explores perturbations in three key components of the peer review process-papers, reviews, and rebuttals-across several quality aspects, including contribution, soundness, presentation, tone, and completeness. By applying targeted perturbations and examining their effects on both LLM-as-Reviewer and LLM-as-Meta-Reviewer, we investigate how aspect-based manipulations, such as omitting methodological details from papers or altering reviewer conclusions, can introduce significant biases in the review process. We identify several potential vulnerabilities: review conclusions that recommend a strong reject may significantly influence meta-reviews, negative or misleading reviews may be wrongly interpreted as thorough, and incomplete or hostile rebuttals can unexpectedly lead to higher acceptance rates. Statistical tests show that these biases persist under various Chain-of-Thought prompting strategies, highlighting the lack of robust critical evaluation in current LLMs. Our framework offers a practical methodology for diagnosing these vulnerabilities, thereby contributing to the development of more reliable and robust automated reviewing systems.

artificial intelligence, aspect-guided multi-level perturbation analysis, large language model, (2 more...)

arXiv.org Artificial Intelligence

2502.1251

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better Interpretability

Hu, Xinyu, Gao, Mingqi, Lin, Li, Yu, Zhenghan, Wan, Xiaojun

arXiv.org Artificial IntelligenceFeb-17-2025

In NLG meta-evaluation, evaluation metrics are typically assessed based on their consistency with humans. However, we identify some limitations in traditional NLG meta-evaluation approaches, such as issues in handling human ratings and ambiguous selections of correlation measures, which undermine the effectiveness of meta-evaluation. In this work, we propose a dual-perspective NLG meta-evaluation framework that focuses on different evaluation capabilities, thereby providing better interpretability. In addition, we introduce a method of automatically constructing the corresponding benchmarks without requiring new human annotations. Furthermore, we conduct experiments with 16 representative LLMs as the evaluators based on our proposed framework, comprehensively analyzing their evaluation performance from different perspectives.

benchmark, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2502.12052

Country:

Asia (0.68)
Europe > Austria (0.28)
North America > Mexico (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference

Gao, Mingqi, Liu, Yixin, Hu, Xinyu, Wan, Xiaojun, Bragg, Jonathan, Cohan, Arman

arXiv.org Artificial IntelligenceDec-31-2024

Evaluating and ranking the capabilities of different LLMs is crucial for understanding their performance and alignment with human preferences. Due to the high cost and time-consuming nature of human evaluations, an automatic LLM bencher (i.e., an automatic evaluation framework that aims to rank LLMs based on their alignment with human preferences) is indispensable. An automatic LLM bencher consists of four components: the input set (e.g., a user instruction), the evaluation model (e.g., an LLM), the evaluation type (e.g., pairwise comparison), and the aggregation method (e.g., the ELO rating system). However, previous work has not thoroughly explored how to select these components or how their different combinations influence the results. In this work, through controlled experiments, we provide a series of recommendations on how to choose each component to better automate the evaluation of LLMs. Furthermore, we discovered that when evaluating LLMs with similar performance, the performance of the automatic LLM bencher declines sharply, underscoring the limitations of current benchers and calling for future work. Lastly, we found that the evaluation models' performance at the instance level (e.g., the accuracy of selecting the best output) does not always align with their effectiveness when used as a component of a bencher, highlighting the importance of dedicated system-level evaluation of benchers.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.0056

Country:

North America > United States (0.46)
Europe > Austria > Vienna (0.14)
Europe > Middle East > Malta (0.14)

Genre: Research Report > Experimental Study (0.68)

Industry:

Energy (0.67)
Leisure & Entertainment > Games (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DSGram: Dynamic Weighting Sub-Metrics for Grammatical Error Correction in the Era of Large Language Models

Xie, Jinxiang, Li, Yilin, Yin, Xunjian, Wan, Xiaojun

arXiv.org Artificial IntelligenceDec-17-2024

Evaluating the performance of Grammatical Error Correction (GEC) models has become increasingly challenging, as large language model (LLM)-based GEC systems often produce corrections that diverge from provided gold references. This discrepancy undermines the reliability of traditional reference-based evaluation metrics. In this study, we propose a novel evaluation framework for GEC models, DSGram, integrating Semantic Coherence, Edit Level, and Fluency, and utilizing a dynamic weighting mechanism. Our framework employs the Analytic Hierarchy Process (AHP) in conjunction with large language models to ascertain the relative importance of various evaluation criteria. Additionally, we develop a dataset incorporating human annotations and LLM-simulated sentences to validate our algorithms and fine-tune more cost-effective models. Experimental results indicate that our proposed approach enhances the effectiveness of GEC model evaluations.

computational linguistic, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2412.12832

Country:

North America > United States (0.46)
Asia > China (0.28)
North America > Canada (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models

Li, Jiatao, Hu, Xinyu, Yin, Xunjian, Wan, Xiaojun

arXiv.org Artificial IntelligenceDec-14-2024

The integration of documents generated by LLMs themselves (Self-Docs) alongside retrieved documents has emerged as a promising strategy for retrieval-augmented generation systems. However, previous research primarily focuses on optimizing the use of Self-Docs, with their inherent properties remaining underexplored. To bridge this gap, we first investigate the overall effectiveness of Self-Docs, identifying key factors that shape their contribution to RAG performance (RQ1). Building on these insights, we develop a taxonomy grounded in Systemic Functional Linguistics to compare the influence of various Self-Docs categories (RQ2) and explore strategies for combining them with external sources (RQ3). Our findings reveal which types of Self-Docs are most beneficial and offer practical guidelines for leveraging them to achieve significant improvements in knowledge-intensive question answering tasks.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2410.13192

Country:

Europe (1.00)
Asia > Russia (1.00)
North America > United States > Kentucky (0.16)

Genre: Research Report > New Finding (0.87)

Industry:

Transportation > Air (1.00)
Media > Film (1.00)
Leisure & Entertainment (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

$B^4$: A Black-Box Scrubbing Attack on LLM Watermarks

Huang, Baizhou, Pu, Xiao, Wan, Xiaojun

arXiv.org Artificial IntelligenceNov-6-2024

Watermarking has emerged as a prominent technique for LLM-generated content detection by embedding imperceptible patterns. Despite supreme performance, its robustness against adversarial attacks remains underexplored. Previous work typically considers a grey-box attack setting, where the specific type of watermark is already known. Some even necessitates knowledge about hyperparameters of the watermarking method. Such prerequisites are unattainable in real-world scenarios. Targeting at a more realistic black-box threat model with fewer assumptions, we here propose $B^4$, a black-box scrubbing attack on watermarks. Specifically, we formulate the watermark scrubbing attack as a constrained optimization problem by capturing its objectives with two distributions, a Watermark Distribution and a Fidelity Distribution. This optimization problem can be approximately solved using two proxy distributions. Experimental results across 12 different settings demonstrate the superior performance of $B^4$ compared with other baselines.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.01222

Country:

North America > United States > Texas (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (0.93)
Transportation > Air (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback