AITopics | Mao, Xian-Ling

Collaborating Authors

Mao, Xian-Ling

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event Detection

Lu, Yi-Fan, Mao, Xian-Ling, Lan, Tian, Zhang, Tong, Zhu, Yu-Shi, Huang, Heyan

arXiv.org Artificial IntelligenceMar-5-2025

Automatic evaluation for Open Domain Event Detection (ODED) is a highly challenging task, because ODED is characterized by a vast diversity of un-constrained output labels from various domains. Nearly all existing evaluation methods for ODED usually first construct evaluation benchmarks with limited labels and domain coverage, and then evaluate ODED methods using metrics based on token-level label matching rules. However, this kind of evaluation framework faces two issues: (1) The limited evaluation benchmarks lack representatives of the real world, making it difficult to accurately reflect the performance of various ODED methods in real-world scenarios; (2) Evaluation metrics based on token-level matching rules fail to capture semantic similarity between predictions and golden labels. To address these two problems above, we propose a scalable and reliable Semantic-level Evaluation framework for Open domain Event detection (SEOE) by constructing a more representative evaluation benchmark and introducing a semantic evaluation metric. Specifically, our proposed framework first constructs a scalable evaluation benchmark that currently includes 564 event types covering 7 major domains, with a cost-effective supplementary annotation strategy to ensure the benchmark's representativeness. The strategy also allows for the supplement of new event types and domains in the future. Then, the proposed SEOE leverages large language models (LLMs) as automatic evaluation agents to compute a semantic F1-score, incorporating fine-grained definitions of semantically similar labels to enhance the reliability of the evaluation. Extensive experiments validate the representatives of the benchmark and the reliability of the semantic evaluation metric. Existing ODED methods are thoroughly evaluated, and the error patterns of predictions are analyzed, revealing several insightful findings.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2503.03303

Country:

Europe (1.00)
Asia > Middle East > UAE (0.14)
North America > United States > Virginia (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Distribution-Consistency-Guided Multi-modal Hashing

Liu, Jin-Yu, Mao, Xian-Ling, Che, Tian-Yi, Tu, Rong-Cheng

arXiv.org Artificial IntelligenceDec-19-2024

Multi-modal hashing methods have gained popularity due to their fast speed and low storage requirements. Among them, the supervised methods demonstrate better performance by utilizing labels as supervisory signals compared with unsupervised methods. Currently, for almost all supervised multi-modal hashing methods, there is a hidden assumption that training sets have no noisy labels. However, labels are often annotated incorrectly due to manual labeling in real-world scenarios, which will greatly harm the retrieval performance. To address this issue, we first discover a significant distribution consistency pattern through experiments, i.e., the 1-0 distribution of the presence or absence of each category in the label is consistent with the high-low distribution of similarity scores of the hash codes relative to category centers. Then, inspired by this pattern, we propose a novel D istribution-Consistency-G uided Multi-modal Hashing ( DCGMH), which aims to filter and reconstruct noisy labels to enhance retrieval performance. Specifically, the proposed method first randomly initializes several category centers, each representing the region's centroid of its respective category, which are used to compute the high-low distribution of similarity scores; Noisy and clean labels are then separately filtered out via the discovered distribution consistency pattern to mitigate the impact of noisy labels; Subsequently, a correction strategy, which is indirectly designed via the distribution consistency pattern, is applied to the filtered noisy labels, correcting high-confidence ones while treating low-confidence ones as unlabeled for unsupervised learning, thereby further enhancing the model's performance. Extensive experiments on three widely used datasets demonstrate the superiority of the proposed method compared to state-of-the-art baselines in multi-modal retrieval tasks.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.11216

Country:

Asia (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Add feedback

Multi-modal Retrieval Augmented Multi-modal Generation: A Benchmark, Evaluate Metrics and Strong Baselines

Ma, Zi-Ao, Lan, Tian, Tu, Rong-Cheng, Hu, Yong, Huang, Heyan, Mao, Xian-Ling

arXiv.org Artificial IntelligenceNov-25-2024

This paper investigates an intriguing task of Multi-modal Retrieval Augmented Multi-modal Generation (M$^2$RAG). This task requires foundation models to browse multi-modal web pages, with mixed text and images, and generate multi-modal responses for solving user queries, which exhibits better information density and readability. Given the early researching stage of M$^2$RAG task, there is a lack of systematic studies and analysis. To fill this gap, we construct a benchmark for M$^2$RAG task, equipped with a suite of text-modal metrics and multi-modal metrics to analyze the capabilities of existing foundation models. Besides, we also propose several effective methods for foundation models to accomplish this task, based on the comprehensive evaluation results on our benchmark. Extensive experimental results reveal several intriguing phenomena worth further research.

large language model, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2411.16365

Country:

Asia > China (0.14)
Asia > Thailand (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.69)
Government (0.67)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark

Tu, Rong-Cheng, Ma, Zi-Ao, Lan, Tian, Zhao, Yuehao, Huang, Heyan, Mao, Xian-Ling

arXiv.org Artificial IntelligenceNov-23-2024

Driven by the remarkable progress in diffusion models, text-to-image generation has made significant strides, creating a pressing demand for automatic quality evaluation of generated images. Current state-of-the-art automatic evaluation methods heavily rely on Multi-modal Large Language Models (MLLMs), particularly powerful commercial models like GPT-4o. While these models are highly effective, their substantial costs limit scalability in large-scale evaluations. Adopting open-source MLLMs is an alternative; however, their performance falls short due to significant limitations in processing multi-modal data compared to commercial MLLMs. To tackle these problems, we first propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset, where the complex evaluation task is decoupled into simpler sub-tasks, effectively reducing the learning complexity. Based on this dataset, we design innovative training strategies to effectively distill GPT-4o's evaluation capabilities into a 7B open-source MLLM, MiniCPM-V-2.6. Furthermore, to reliably and comprehensively assess prior works and our proposed model, we manually annotate a meta-evaluation benchmark that includes chain-of-thought explanations alongside quality scores for generated images. Experimental results demonstrate that our distilled open-source MLLM significantly outperforms the current state-of-the-art GPT-4o-base baseline, VIEScore, with over 4.6\% improvement in Spearman and Kendall correlations with human judgments.

explanation, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2411.15488

Country: Europe > Switzerland (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Training Language Models to Critique With Multi-agent Feedback

Lan, Tian, Zhang, Wenwei, Lyu, Chengqi, Li, Shuaibin, Xu, Chen, Huang, Heyan, Lin, Dahua, Mao, Xian-Ling, Chen, Kai

arXiv.org Artificial IntelligenceOct-20-2024

Critique ability, a meta-cognitive capability of humans, presents significant challenges for LLMs to improve. Recent works primarily rely on supervised fine-tuning (SFT) using critiques generated by a single LLM like GPT-4. However, these model-generated critiques often exhibit flaws due to the inherent complexity of the critique. Consequently, fine-tuning LLMs on such flawed critiques typically limits the model's performance and propagates these flaws into the learned model. To overcome these challenges, this paper proposes a novel data generation pipeline, named MultiCritique, that improves the critique ability of LLMs by utilizing multi-agent feedback in both the SFT and reinforcement learning (RL) stages. First, our data generation pipeline aggregates high-quality critiques from multiple agents instead of a single model, with crucial information as input for simplifying the critique. Furthermore, our pipeline improves the preference accuracy of critique quality through multi-agent feedback, facilitating the effectiveness of RL in improving the critique ability of LLMs. Based on our proposed MultiCritique data generation pipeline, we construct the MultiCritiqueDataset for the SFT and RL fine-tuning stages. Extensive experimental results on two benchmarks demonstrate: 1) the superior quality of our constructed SFT dataset compared to existing critique datasets; 2) additional improvements to the critique ability of LLMs brought by the RL stage. Notably, our fine-tuned 7B model significantly surpasses other advanced 7B-13B open-source models, approaching the performance of advanced 70B LLMs and GPT-4. Codes, datasets and model weights will be publicly available.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.15287

Country: Asia (0.93)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models

Lu, Yi-Fan, Mao, Xian-Ling, Lan, Tian, Xu, Chen, Huang, Heyan

arXiv.org Artificial IntelligenceOct-12-2024

Event extraction has gained extensive research attention due to its broad range of applications. However, the current mainstream evaluation method for event extraction relies on token-level exact match, which misjudges numerous semantic-level correct cases. This reliance leads to a significant discrepancy between the evaluated performance of models under exact match criteria and their real performance. To address this problem, we propose RAEE, an automatic evaluation framework that accurately assesses event extraction results at semantic-level instead of token-level. Specifically, RAEE leverages Large Language Models (LLMs) as automatic evaluation agents, incorporating chain-of-thought prompting and an adaptive mechanism to achieve interpretable and adaptive evaluations for precision and recall of triggers and arguments. Extensive experimental results demonstrate that: (1) RAEE achieves a very high correlation with the human average; (2) after reassessing 14 models, including advanced LLMs, on 10 datasets, there is a significant performance gap between exact match and RAEE. The exact match evaluation significantly underestimates the performance of existing event extraction models, particularly underestimating the capabilities of LLMs; (3) fine-grained analysis under RAEE evaluation reveals insightful phenomena worth further exploration. The evaluation toolkit of our proposed RAEE will be publicly released.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.09418

Country:

Asia (0.68)
North America > United States > New York (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine (0.68)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue

Fan, Shixuan, Wei, Wei, Li, Wendi, Mao, Xian-Ling, Xie, Wenfeng, Chen, Dangyang

arXiv.org Artificial IntelligenceJun-4-2024

The core of the dialogue system is to generate relevant, informative, and human-like responses based on extensive dialogue history. Recently, dialogue generation domain has seen mainstream adoption of large language models (LLMs), due to its powerful capability in generating utterances. However, there is a natural deficiency for such models, that is, inherent position bias, which may lead them to pay more attention to the nearby utterances instead of causally relevant ones, resulting in generating irrelevant and generic responses in long-term dialogue. To alleviate such problem, in this paper, we propose a novel method, named Causal Perception long-term Dialogue framework (CPD), which employs perturbation-based causal variable discovery method to extract casually relevant utterances from the dialogue history and enhances model causal perception during fine-tuning. Specifically, a local-position awareness method is proposed in CPD for inter-sentence position correlation elimination, which helps models extract causally relevant utterances based on perturbations. Then, a casual-perception fine-tuning strategy is also proposed, to enhance the capability of discovering the causal invariant factors, by differently perturbing causally relevant and non-casually relevant ones for response generation. Experimental results on two datasets prove that our proposed method can effectively alleviate the position bias for multiple LLMs and achieve significant progress compared with existing baselines.

large language model, machine learning, utterance, (19 more...)

arXiv.org Artificial Intelligence

2406.02002

Country: North America > United States > California (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Mix-Initiative Response Generation with Dynamic Prefix Tuning

Nie, Yuxiang, Huang, Heyan, Mao, Xian-Ling, Liao, Lizi

arXiv.org Artificial IntelligenceMar-27-2024

Mixed initiative serves as one of the key factors in controlling conversation directions. For a speaker, responding passively or leading proactively would result in rather different responses. However, most dialogue systems focus on training a holistic response generation model without any distinction among different initiatives. It leads to the cross-contamination problem, where the model confuses different initiatives and generates inappropriate responses. Moreover, obtaining plenty of human annotations for initiative labels can be expensive. To address this issue, we propose a general mix-Initiative Dynamic Prefix Tuning framework (IDPT) to decouple different initiatives from the generation model, which learns initiative-aware prefixes in both supervised and unsupervised settings. Specifically, IDPT decouples initiative factors into different prefix parameters and uses the attention mechanism to adjust the selection of initiatives in guiding generation dynamically. The prefix parameters can be tuned towards accurate initiative prediction as well as mix-initiative response generation. Extensive experiments on two public dialogue datasets show that the proposed IDPT outperforms previous baselines on both automatic metrics and human evaluations. It also manages to generate appropriate responses with manipulated initiatives.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2403.17636

Country:

North America > United States > California (0.14)
North America > Canada (0.14)
Asia > China (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training

Zhuo, Le, Chi, Zewen, Xu, Minghao, Huang, Heyan, Zheng, Heqi, He, Conghui, Mao, Xian-Ling, Zhang, Wentao

arXiv.org Artificial IntelligenceFeb-27-2024

We propose ProtLLM, a versatile cross-modal large language model (LLM) for both protein-centric and protein-language tasks. ProtLLM features a unique dynamic protein mounting mechanism, enabling it to handle complex inputs where the natural language text is interspersed with an arbitrary number of proteins. Besides, we propose the protein-as-word language modeling approach to train ProtLLM. By developing a specialized protein vocabulary, we equip the model with the capability to predict not just natural language but also proteins from a vast pool of candidates. Additionally, we construct a large-scale interleaved protein-text dataset, named InterPT, for pre-training. This dataset comprehensively encompasses both (1) structured data sources like protein annotations and (2) unstructured data sources like biological research papers, thereby endowing ProtLLM with crucial knowledge for understanding proteins. We evaluate ProtLLM on classic supervised protein-centric tasks and explore its novel protein-language applications. Experimental results demonstrate that ProtLLM not only achieves superior performance against protein-specialized baselines on protein-centric tasks but also induces zero-shot and in-context learning capabilities on protein-language tasks.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2403.0792

Country: North America (0.28)

Genre: Research Report (0.84)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

An Empirical Study on the Language Modal in Visual Question Answering

Peng, Daowan, Wei, Wei, Mao, Xian-Ling, Fu, Yuanyuan, Chen, Dangyang

arXiv.org Artificial IntelligenceSep-4-2023

Generalization beyond in-domain experience to out-of-distribution data is of paramount significance in the AI domain. Of late, state-of-the-art Visual Question Answering (VQA) models have shown impressive performance on in-domain data, partially due to the language priors bias which, however, hinders the generalization ability in practice. This paper attempts to provide new insights into the influence of language modality on VQA performance from an empirical study perspective. To achieve this, we conducted a series of experiments on six models. The results of these experiments revealed that, 1) apart from prior bias caused by question types, there is a notable influence of postfix-related bias in inducing biases, and 2) training VQA models with word-sequence-related variant questions demonstrated improved performance on the out-of-distribution benchmark, and the LXMERT even achieved a 10-point gain without adopting any debiasing methods. We delved into the underlying reasons behind these experimental results and put forward some simple proposals to reduce the models' dependency on language priors. The experimental results demonstrated the effectiveness of our proposed method in improving performance on the out-of-distribution benchmark, VQA-CPv2. We hope this study can inspire novel insights for future research on designing bias-reduction approaches.

machine learning, natural language, question answering, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.24963/ijcai.2023/457

2305.10143

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.63)

Add feedback