AITopics | Wang, Sirui

Wang, Sirui

Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition

Huang, Jingwang, Zhong, Jiang, Lei, Qin, Gao, Jinpeng, Yang, Yuming, Wang, Sirui, Li, Peiguang, Wei, Kaiwen

arXiv.org Artificial Intelligence, Feb-19-2025

Multimodal multi-label emotion recognition (MMER) aims to identify the concurrent presence of multiple emotions in multimodal data. Existing studies primarily focus on improving fusion strategies and modeling modality-to-label dependencies. However, they often overlook the impact of aleatoric uncertainty, which is the inherent noise in the multimodal data and hinders the effectiveness of modality fusion by introducing ambiguity into feature representations. To address this issue and effectively model aleatoric uncertainty, this paper proposes the Latent emotional Distribution Decomposition with Uncertainty perception (LDDU) framework from a novel perspective of latent emotional space probabilistic modeling. Specifically, we introduce a contrastive disentangled distribution mechanism within the emotion space to model the multimodal data, allowing for the extraction of semantic features and uncertainty. Furthermore, we design an uncertainty-aware multimodal fusion method that accounts for the dispersed distribution of uncertainty and integrates distribution information. Experimental results show that LDDU achieves state-of-the-art performance on the CMU-MOSEI and M³ED datasets, highlighting the importance of uncertainty modeling in MMER. Code is available at https://github.com/201983290498/lddu_mmer.git.

emotion recognition, machine learning, natural language

2502.13954

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

FRAME: Boosting LLMs with A Four-Quadrant Multi-Stage Pretraining Strategy

Zhang, Xuemiao, Duan, Feiyu, Xu, Liangyu, Zhou, Yongwei, Wang, Sirui, Weng, Rongxiang, Wang, Jingang, Cai, Xunliang

arXiv.org Artificial Intelligence, Feb-18-2025

Large language models (LLMs) have significantly advanced human language understanding and generation, with pretraining data quality and organization being crucial to their performance. Multi-stage pretraining is a promising approach, but existing methods often lack quantitative criteria for data partitioning and instead rely on intuitive heuristics. In this paper, we propose the novel Four-quadRAnt Multi-stage prEtraining strategy (FRAME), guided by the established principle of organizing the pretraining process into four stages to achieve significant loss reductions four times. This principle is grounded in two key findings: first, training on high Perplexity (PPL) data followed by low PPL data, and second, training on low PPL difference (PD) data followed by high PD data, both of which cause the loss to drop significantly twice and enhance performance. By partitioning data into four quadrants and strategically organizing them, FRAME achieves a remarkable 16.8% average improvement over the random baseline across MMLU and CMMLU for the 3B model, effectively boosting LLM performance.
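
As a rough illustration of the partitioning step described above, the sketch below splits samples into four quadrants along the PPL and PD axes. The abstract does not state how the cut points are chosen or the order in which the quadrants are trained, so the median thresholds, the quadrant labels, and the function name `partition_four_quadrants` are assumptions made only for this example.

```python
from statistics import median
from typing import Dict, List, Tuple

# (sample_id, perplexity, perplexity_difference); the scores are assumed to be
# precomputed by some external scoring step that is not shown here.
Sample = Tuple[str, float, float]
QuadrantKey = Tuple[str, str]


def partition_four_quadrants(samples: List[Sample]) -> Dict[QuadrantKey, List[str]]:
    """Split samples into four quadrants using median PPL and PD as cut points.

    Median thresholds are an assumption of this sketch, not taken from the paper.
    """
    ppl_cut = median(s[1] for s in samples)
    pd_cut = median(s[2] for s in samples)

    quadrants: Dict[QuadrantKey, List[str]] = {
        (p, d): []
        for p in ("low_ppl", "high_ppl")
        for d in ("low_pd", "high_pd")
    }
    for sample_id, ppl, pd_value in samples:
        key = (
            "high_ppl" if ppl > ppl_cut else "low_ppl",
            "high_pd" if pd_value > pd_cut else "low_pd",
        )
        quadrants[key].append(sample_id)
    return quadrants


# Toy data: one sample per quadrant.
toy_samples = [
    ("a", 5.0, 0.1),
    ("b", 50.0, 0.1),
    ("c", 5.0, 0.9),
    ("d", 50.0, 0.9),
]
quadrants = partition_four_quadrants(toy_samples)
```

A multi-stage schedule would then visit the four quadrants in some order consistent with the two findings (high PPL before low PPL, low PD before high PD); the exact stage order is left out here because the abstract does not specify it.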

large language model, natural language, ppl

2502.05551

Country:

Africa (0.68)
Asia > India (0.67)

Genre: Research Report > New Finding (0.67)

Industry:

Government (1.00)
Law (0.92)
Energy (0.84)
Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Preference Curriculum: LLMs Should Always Be Pretrained on Their Preferred Data

Zhang, Xuemiao, Xu, Liangyu, Duan, Feiyu, Zhou, Yongwei, Wang, Sirui, Weng, Rongxiang, Wang, Jingang, Cai, Xunliang

arXiv.org Artificial Intelligence, Feb-17-2025

Large language models (LLMs) generally utilize a consistent data distribution throughout the pretraining process. However, as the model's capability improves, it is intuitive that its data preferences dynamically change, indicating the need for pretraining with different data at various training stages. To achieve it, we propose the Perplexity Difference (PD) based Preference Curriculum learning (PDPC) framework, which always perceives and uses the data preferred by LLMs to train and boost them. First, we introduce the PD metric to quantify the difference in how challenging a sample is for weak versus strong models. Samples with high PD are more challenging for weak models to learn and are more suitable to be arranged in the later stage of pretraining. Second, we propose the preference function to approximate and predict the data preference of the LLM at any training step, so as to complete the arrangement of the dataset offline and ensure continuous training without interruption. Experimental results on 1.3B and 3B models demonstrate that PDPC significantly surpasses baselines. Notably, the 3B model trained on 1T tokens achieves an increased average accuracy of over 8.1% across MMLU and CMMLU.
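
To make the curriculum idea above concrete, the following sketch scores each sample with a perplexity-difference value and orders the samples so that high-PD ones come later. The abstract does not give the PD formula, so the normalized form `(ppl_weak - ppl_strong) / ppl_weak` is an assumption, as are the function names and the toy perplexity values. A plain sort stands in for the paper's preference function, which is not reproduced here.

```python
from typing import List, Tuple

# (sample_id, perplexity under a weak model, perplexity under a strong model)
ScoredSample = Tuple[str, float, float]


def perplexity_difference(ppl_weak: float, ppl_strong: float) -> float:
    """Assumed PD: relative drop in perplexity from the weak to the strong model."""
    if ppl_weak <= 0:
        raise ValueError("perplexity must be positive")
    return (ppl_weak - ppl_strong) / ppl_weak


def order_by_pd(samples: List[ScoredSample]) -> List[str]:
    """Return sample ids sorted by ascending PD, so high-PD samples land later."""
    scored = [
        (perplexity_difference(ppl_weak, ppl_strong), sample_id)
        for sample_id, ppl_weak, ppl_strong in samples
    ]
    scored.sort(key=lambda pair: pair[0])
    return [sample_id for _, sample_id in scored]


# Toy perplexities, invented for illustration only.
toy_samples = [
    ("a", 20.0, 10.0),
    ("b", 12.0, 11.4),
    ("c", 30.0, 21.0),
]
curriculum = order_by_pd(toy_samples)
```

Because the ordering depends only on precomputed perplexities, it can be produced offline before training starts, which matches the abstract's point about arranging the dataset without interrupting training.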

artificial intelligence, large language model, natural language

2501.13126

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.68)
Energy (0.47)
Leisure & Entertainment > Sports (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

FIRE: Flexible Integration of Data Quality Ratings for Effective Pre-Training

Xu, Liangyu, Zhang, Xuemiao, Duan, Feiyu, Wang, Sirui, Wang, Jingang, Cai, Xunliang

arXiv.org Artificial Intelligence, Feb-17-2025

Selecting high-quality data can significantly improve the pretraining efficiency of large language models (LLMs). Existing methods generally rely on heuristic techniques and single-quality signals, limiting their ability to evaluate data quality comprehensively. In this work, we propose FIRE, a flexible and scalable framework for integrating multiple data quality raters, which allows for a comprehensive assessment of data quality across various dimensions. FIRE aligns multiple quality signals into a unified space, and integrates diverse data quality raters to provide a comprehensive quality signal for each data point. Further, we introduce a progressive data selection scheme based on FIRE that iteratively refines the selection of high-quality data points. Experiments on the SlimPajama dataset reveal that FIRE outperforms other data selection methods and significantly enhances the pretrained model across a wide range of downstream tasks, with a 2.9% average performance improvement over Random and reducing the FLOPs necessary to achieve a certain performance level by more than half.

large language model, machine learning, natural language

2502.00761

Country: North America > United States (0.92)

Genre: Research Report > New Finding (0.67)

Industry: Energy > Oil & Gas (0.93)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation

Pan, Ruotong, Cao, Boxi, Lin, Hongyu, Han, Xianpei, Zheng, Jia, Wang, Sirui, Cai, Xunliang, Sun, Le

arXiv.org Artificial Intelligence, May-8-2024

The rapid development of large language models has led to the widespread adoption of Retrieval-Augmented Generation (RAG), which integrates external knowledge to alleviate knowledge bottlenecks and mitigate hallucinations. However, the existing RAG paradigm inevitably suffers from the impact of flawed information introduced during the retrieval phase, thereby diminishing the reliability and correctness of the generated outcomes. In this paper, we propose Credibility-aware Generation (CAG), a universally applicable framework designed to mitigate the impact of flawed information in RAG. At its core, CAG aims to equip models with the ability to discern and process information based on its credibility. To this end, we propose an innovative data transformation framework that generates data based on credibility, thereby effectively endowing models with the capability of CAG. Furthermore, to accurately evaluate the models' capabilities of CAG, we construct a comprehensive benchmark covering three critical real-world scenarios. Experimental results demonstrate that our model can effectively understand and utilize credibility for generation, significantly outperform other models with retrieval augmentation, and exhibit resilience against the disruption caused by noisy documents, thereby maintaining robust performance. Moreover, our model supports customized credibility, offering a wide range of potential applications.

large language model, machine learning, natural language

2404.06809

Country:

Europe (1.00)
Asia > Middle East (0.68)
North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Voting & Elections (0.46)
Government > Regional Government (0.46)
Media > News (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

LLMs Know What They Need: Leveraging a Missing Information Guided Framework to Empower Retrieval-Augmented Generation

Wang, Keheng, Duan, Feiyu, Li, Peiguang, Wang, Sirui, Cai, Xunliang

arXiv.org Artificial Intelligence, Apr-22-2024

Retrieval-Augmented Generation (RAG) demonstrates great value in alleviating outdated knowledge or hallucination by supplying LLMs with updated and relevant knowledge. However, RAG still has difficulty understanding complex multi-hop queries and retrieving relevant documents, which requires LLMs to reason and retrieve step by step. Inspired by the human reasoning process, in which people gradually search for the required information, it is natural to ask whether LLMs can notice the missing information at each reasoning step. In this work, we first experimentally verify the ability of LLMs to extract information as well as to identify what is missing. Based on this finding, we propose a Missing Information Guided Retrieve-Extraction-Solving paradigm (MIGRES), where we leverage the identification of missing information to generate a targeted query that steers the subsequent knowledge retrieval. In addition, we design a sentence-level re-ranking filtering approach to filter irrelevant content out of documents, and use the information extraction capability of LLMs to extract useful information from the cleaned-up documents, which in turn bolsters the overall efficacy of RAG. Extensive experiments conducted on multiple public datasets reveal the superiority of the proposed MIGRES method, and analytical experiments demonstrate the effectiveness of our proposed modules.

information, large language model, machine learning

2404.14043

Country:

Europe (1.00)
Asia (0.93)
North America > United States (0.68)

Genre: Research Report (0.82)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

InstructERC: Reforming Emotion Recognition in Conversation with a Retrieval Multi-task LLMs Framework

Lei, Shanglin, Dong, Guanting, Wang, Xiaoping, Wang, Keheng, Wang, Sirui

arXiv.org Artificial Intelligence, Nov-24-2023

The development of emotion recognition in dialogue (ERC) has been consistently hindered by the complexity of pipeline designs, leading to ERC models that often overfit to specific datasets and dialogue patterns. In this study, we propose a novel approach, InstructERC, that reformulates the ERC task from a discriminative framework to a generative framework based on Large Language Models (LLMs). InstructERC has two significant contributions. First, InstructERC introduces a simple yet effective retrieval template module, which helps the model explicitly integrate multi-granularity dialogue supervision information by concatenating the historical dialog content, label statement, and emotional domain demonstrations with high semantic similarity. Second, we introduce two additional emotion alignment tasks, namely speaker identification and emotion prediction, to implicitly model the dialogue role relationships and future emotional tendencies in conversations. Our LLM-based plug-and-play plugin framework significantly outperforms all previous models and achieves comprehensive SOTA on three commonly used ERC datasets. Extensive analysis of parameter-efficient and data-scaling experiments provides empirical guidance for applying InstructERC in practical scenarios. Our code will be released after blind review.

large language model, machine learning, natural language

2309.11911

Country:

Asia > China (0.47)
Asia > Middle East > UAE (0.14)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering

Wang, Keheng, Duan, Feiyu, Wang, Sirui, Li, Peiguang, Xian, Yunsen, Yin, Chuantao, Rong, Wenge, Xiong, Zhang

arXiv.org Artificial Intelligence, Oct-28-2023

Equipped with Chain-of-Thought (CoT), large language models (LLMs) have shown impressive reasoning ability in various downstream tasks. Even so, suffering from hallucinations and the inability to access external knowledge, LLMs often produce incorrect or unfaithful intermediate reasoning steps, especially when answering knowledge-intensive tasks such as KBQA. To alleviate this issue, we propose a framework called Knowledge-Driven Chain-of-Thought (KD-CoT) to verify and modify reasoning traces in CoT via interaction with external knowledge, and thus overcome hallucinations and error propagation. Concretely, we formulate the CoT rationale process of LLMs into a structured multi-round QA format. In each round, LLMs interact with a QA system that retrieves external knowledge and produce faithful reasoning traces based on the retrieved precise answers. The structured CoT reasoning of LLMs is facilitated by our developed KBQA CoT collection, which serves as in-context learning demonstrations and can also be utilized as feedback augmentation to train a robust retriever. Extensive experiments on WebQSP and ComplexWebQuestion datasets demonstrate the effectiveness of the proposed KD-CoT in task-solving reasoning generation, which outperforms the vanilla CoT ICL with an absolute success rate of 8.0% and 5.1%. Furthermore, our proposed feedback-augmented retriever outperforms the state-of-the-art baselines for retrieving knowledge, achieving significant improvement in Hit and recall performance. Our code and data are released on https://github.com/AdelWang/KD-CoT/tree/main.

large language model, machine learning, natural language

2308.13259

Country:

Asia > Middle East > Republic of Türkiye (0.14)
North America > United States > Louisiana (0.14)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.96)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Improving Semantic Matching through Dependency-Enhanced Pre-trained Model with Adaptive Fusion

Song, Jian, Liang, Di, Li, Rumei, Li, Yuntao, Wang, Sirui, Peng, Minlong, Wu, Wei, Yu, Yongxin

arXiv.org Artificial Intelligence, Aug-24-2023

Transformer-based pre-trained models like BERT have achieved great progress on Semantic Sentence Matching. Meanwhile, dependency prior knowledge has also shown general benefits in multiple NLP tasks. However, how to efficiently integrate dependency prior structure into pre-trained models to better model complex semantic matching relations is still unsettled. In this paper, we propose the Dependency-Enhanced Adaptive Fusion Attention (DAFA), which explicitly introduces dependency structure into pre-trained models and adaptively fuses it with semantic information. Specifically, DAFA first proposes a structure-sensitive paradigm to construct a dependency matrix for calibrating attention weights. It adopts an adaptive fusion module to integrate the obtained dependency information and the original semantic signals. Moreover, DAFA reconstructs the attention calculation flow and provides better interpretability. By applying it on BERT, our method achieves state-of-the-art or competitive performance on 10 public datasets, demonstrating the benefits of adaptively fusing dependency structure in semantic matching tasks.

artificial intelligence, machine learning, natural language

2210.08471

Country: Asia > China (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

T5-SR: A Unified Seq-to-Seq Decoding Strategy for Semantic Parsing

Li, Yuntao, Su, Zhenpeng, Li, Yutian, Zhang, Hanchu, Wang, Sirui, Wu, Wei, Zhang, Yan

arXiv.org Artificial Intelligence, Jun-14-2023

Translating natural language queries into SQLs in a seq2seq manner has attracted much attention recently. However, compared with abstract-syntactic-tree-based SQL generation, seq2seq semantic parsers face much more challenges, including poor quality on schematical information prediction and poor semantic coherence between natural language queries and SQLs. This paper analyses the above difficulties and proposes a seq2seq-oriented decoding strategy called SR, which includes a new intermediate representation SSQL and a reranking method with score re-estimator to solve the above obstacles respectively.

However, to produce a correct SQL expression, a parser should not only understand the semantics of the input query but also produce predictions that satisfy the SQL grammar and database schema restrictions. We experimentally find that with the help of pre-trained language models, seq2seq models are capable of generating legal SQL skeletons, while detailed schematic information prediction remains a big difficulty for seq2seq parsers. To solve this problem, in this paper, we propose a new intermediate representation called SSQL (Semantic-SQL) for seq2seq SQL generation based on standard SQL

artificial intelligence, natural language, parser

2306.08368

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
