AITopics | Yu, Zhengtao

Collaborating Authors

Yu, Zhengtao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

BiDeV: Bilateral Defusing Verification for Complex Claim Fact-Checking

Liu, Yuxuan, Sun, Hongda, Guo, Wenya, Xiao, Xinyan, Mao, Cunli, Yu, Zhengtao, Yan, Rui

arXiv.org Artificial IntelligenceFeb-22-2025

Complex claim fact-checking performs a crucial role in disinformation detection. Moreover, evidence redundancy, where nonessential information complicates the verification process, remains a significant issue. To tackle these limitations, we propose Bilateral De fusing V erification ( BiDeV), a novel fact-checking working-flow framework integrating multiple role-played LLMs to mimic the human-expert fact-checking process. BiDeV consists of two main modules: V agueness Defusing identifies latent information and resolves complex relations to simplify the claim, and Redundancy Defusing eliminates redundant content to enhance the evidence quality. Extensive experimental results on two widely used challenging fact-checking benchmarks (Hover and Feverous-s) demonstrate that our BiDeV can achieve the best performance under both gold and open settings. This highlights the effectiveness of BiDeV in handling complex claims and ensuring precise fact-checking 1 . Introduction Fact-checking is crucial for claim verification by collecting relevant evidence and determining their veracity (Guo, Schlichtkrull, and Vlachos 2022).

information, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.16181

Country:

Asia (0.93)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry: Media > News (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

2D-TPE: Two-Dimensional Positional Encoding Enhances Table Understanding for Large Language Models

Li, Jia-Nan, Guan, Jian, Wu, Wei, Yu, Zhengtao, Yan, Rui

arXiv.org Artificial IntelligenceOct-18-2024

Tables are ubiquitous across various domains for concisely representing structured information. Empowering large language models (LLMs) to reason over tabular data represents an actively explored direction. However, since typical LLMs only support one-dimensional~(1D) inputs, existing methods often flatten the two-dimensional~(2D) table structure into a sequence of tokens, which can severely disrupt the spatial relationships and result in an inevitable loss of vital contextual information. In this paper, we first empirically demonstrate the detrimental impact of such flattening operations on the performance of LLMs in capturing the spatial information of tables through two elaborate proxy tasks. Subsequently, we introduce a simple yet effective positional encoding method, termed ``2D-TPE'' (Two-Dimensional Table Positional Encoding), to address this challenge. 2D-TPE enables each attention head to dynamically select a permutation order of tokens within the context for attending to them, where each permutation represents a distinct traversal mode for the table, such as column-wise or row-wise traversal. 2D-TPE effectively mitigates the risk of losing essential spatial information while preserving computational efficiency, thus better preserving the table structure. Extensive experiments across five benchmarks demonstrate that 2D-TPE outperforms strong baselines, underscoring the importance of preserving the table structure for accurate table comprehension. Comprehensive analysis further reveals the substantially better scalability of 2D-TPE to large tables than baselines.

information, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2409.197

Country:

Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

A Mixed-Language Multi-Document News Summarization Dataset and a Graphs-Based Extract-Generate Model

Gao, Shengxiang, nan, Fang, Zhang, Yongbing, Huang, Yuxin, Tan, Kaiwen, Yu, Zhengtao

arXiv.org Artificial IntelligenceOct-13-2024

Existing research on news summarization primarily focuses on single-language singledocument (SLSD), single-language multidocument (SLMD) or cross-language singledocument (CLSD). However, in real-world scenarios, news about a international event often involves multiple documents in different languages, i.e., mixed-language multi-document (MLMD). Therefore, summarizing MLMD news is of great significance. However, the lack Figure 1: The diagram of SLSD, SLMD, CLSD and of datasets for MLMD news summarization has MLMD. Each rounded rectangle represents a source constrained the development of research in this document, while the pointed rectangle represents the area. To fill this gap, we construct a mixedlanguage target summary. "En" "De" "Fr" and "Es" indicate that multi-document news summarization the text is in English, German, French, and Spanish, dataset (MLMD-news), which contains four different respectively.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.09773

Country: Asia > China (0.29)

Genre: Research Report (0.64)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Multilingual Knowledge Graph Completion from Pretrained Language Models with Knowledge Constraints

Song, Ran, He, Shizhu, Gao, Shengxiang, Cai, Li, Liu, Kang, Yu, Zhengtao, Zhao, Jun

arXiv.org Artificial IntelligenceJun-26-2024

Multilingual Knowledge Graph Completion (mKGC) aim at solving queries like (h, r, ?) in different languages by reasoning a tail entity t thus improving multilingual knowledge graphs. Previous studies leverage multilingual pretrained language models (PLMs) and the generative paradigm to achieve mKGC. Although multilingual pretrained language models contain extensive knowledge of different languages, its pretraining tasks cannot be directly aligned with the mKGC tasks. Moreover, the majority of KGs and PLMs currently available exhibit a pronounced English-centric bias. This makes it difficult for mKGC to achieve good results, particularly in the context of low-resource languages. To overcome previous problems, this paper introduces global and local knowledge constraints for mKGC. The former is used to constrain the reasoning of answer entities, while the latter is used to enhance the representation of query contexts. The proposed method makes the pretrained model better adapt to the mKGC task. Experimental results on public datasets demonstrate that our method outperforms the previous SOTA on Hits@1 and Hits@10 by an average of 12.32% and 16.03%, which indicates that our proposed method has significant enhancement on mKGC.

artificial intelligence, knowledge, natural language, (15 more...)

arXiv.org Artificial Intelligence

2406.18085

Country:

Europe (1.00)
South America > Brazil (0.68)
Asia > China (0.47)

Genre: Research Report > New Finding (0.93)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.95)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.34)

Add feedback

StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses

Li, Jia-Nan, Tu, Quan, Mao, Cunli, Yu, Zhengtao, Wen, Ji-Rong, Yan, Rui

arXiv.org Artificial IntelligenceMar-13-2024

Standard Large Language Models (LLMs) struggle with handling dialogues with long contexts due to efficiency and consistency issues. According to our observation, dialogue contexts are highly structured, and the special token of \textit{End-of-Utterance} (EoU) in dialogues has the potential to aggregate information. We refer to the EoU tokens as ``conversational attention sinks'' (conv-attn sinks). Accordingly, we introduce StreamingDialogue, which compresses long dialogue history into conv-attn sinks with minimal losses, and thus reduces computational complexity quadratically with the number of sinks (i.e., the number of utterances). Current LLMs already demonstrate the ability to handle long context window, e.g., a window size of 200k or more. To this end, by compressing utterances into EoUs, our method has the potential to handle more than 200k of utterances, resulting in a prolonged dialogue learning. In order to minimize information losses from reconstruction after compression, we design two learning strategies of short-memory reconstruction (SMR) and long-memory reactivation (LMR). Our method outperforms strong baselines in dialogue tasks and achieves a 4 $\times$ speedup while reducing memory usage by 18 $\times$ compared to dense attention recomputation.

large language model, natural language, utterance, (15 more...)

arXiv.org Artificial Intelligence

2403.08312

Country:

North America > United States > California (0.14)
North America > United States > Pennsylvania (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Consumer Health (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

"In Dialogues We Learn": Towards Personalized Dialogue Without Pre-defined Profiles through In-Dialogue Learning

Cheng, Chuanqi, Tu, Quan, Wu, Wei, Shang, Shuo, Mao, Cunli, Yu, Zhengtao, Yan, Rui

arXiv.org Artificial IntelligenceMar-12-2024

Personalized dialogue systems have gained significant attention in recent years for their ability to generate responses in alignment with different personas. However, most existing approaches rely on pre-defined personal profiles, which are not only time-consuming and labor-intensive to create but also lack flexibility. We propose In-Dialogue Learning (IDL), a fine-tuning framework that enhances the ability of pre-trained large language models to leverage dialogue history to characterize persona for completing personalized dialogue generation tasks without pre-defined profiles. Our experiments on three datasets demonstrate that IDL brings substantial improvements, with BLEU and ROUGE scores increasing by up to 200% and 247%, respectively. Additionally, the results of human evaluations further validate the efficacy of our proposed method.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2403.03102

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

DeepRicci: Self-supervised Graph Structure-Feature Co-Refinement for Alleviating Over-squashing

Sun, Li, Huang, Zhenhao, Wu, Hua, Ye, Junda, Peng, Hao, Yu, Zhengtao, Yu, Philip S.

arXiv.org Artificial IntelligenceJan-23-2024

Graph Neural Networks (GNNs) have shown great power for learning and mining on graphs, and Graph Structure Learning (GSL) plays an important role in boosting GNNs with a refined graph. In the literature, most GSL solutions either primarily focus on structure refinement with task-specific supervision (i.e., node classification), or overlook the inherent weakness of GNNs themselves (e.g., over-squashing), resulting in suboptimal performance despite sophisticated designs. In light of these limitations, we propose to study self-supervised graph structure-feature co-refinement for effectively alleviating the issue of over-squashing in typical GNNs. In this paper, we take a fundamentally different perspective of the Ricci curvature in Riemannian geometry, in which we encounter the challenges of modeling, utilizing and computing Ricci curvature. To tackle these challenges, we present a self-supervised Riemannian model, DeepRicci. Specifically, we introduce a latent Riemannian space of heterogeneous curvatures to model various Ricci curvatures, and propose a gyrovector feature mapping to utilize Ricci curvature for typical GNNs. Thereafter, we refine node features by geometric contrastive learning among different geometric views, and simultaneously refine graph structure by backward Ricci flow based on a novel formulation of differentiable Ricci curvature. Finally, extensive experiments on public datasets show the superiority of DeepRicci, and the connection between backward Ricci flow and over-squashing. Codes of our work are given in https://github.com/RiemanGraph/.

artificial intelligence, curvature, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2401.1278

Country:

Asia > China > Yunnan Province (0.14)
North America > United States > Illinois (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Prompt Based Tri-Channel Graph Convolution Neural Network for Aspect Sentiment Triplet Extraction

Peng, Kun, Jiang, Lei, Peng, Hao, Liu, Rui, Yu, Zhengtao, Ren, Jiaqian, Hao, Zhifeng, Yu, Philip S.

arXiv.org Artificial IntelligenceDec-24-2023

Aspect Sentiment Triplet Extraction (ASTE) is an emerging task to extract a given sentence's triplets, which consist of aspects, opinions, and sentiments. Recent studies tend to address this task with a table-filling paradigm, wherein word relations are encoded in a two-dimensional table, and the process involves clarifying all the individual cells to extract triples. However, these studies ignore the deep interaction between neighbor cells, which we find quite helpful for accurate extraction. To this end, we propose a novel model for the ASTE task, called Prompt-based Tri-Channel Graph Convolution Neural Network (PT-GCN), which converts the relation table into a graph to explore more comprehensive relational information. Specifically, we treat the original table cells as nodes and utilize a prompt attention score computation module to determine the edges' weights. This enables us to construct a target-aware grid-like graph to enhance the overall extraction process. After that, a triple-channel convolution module is conducted to extract precise sentiment knowledge. Extensive experiments on the benchmark datasets show that our model achieves state-of-the-art performance. The code is available at https://github.com/KunPunCN/PT-GCN.

artificial intelligence, machine learning, relation, (14 more...)

arXiv.org Artificial Intelligence

2312.11152

Country:

North America > United States > Illinois (0.14)
Asia > China > Yunnan Province (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Hierarchical and Incremental Structural Entropy Minimization for Unsupervised Social Event Detection

Cao, Yuwei, Peng, Hao, Yu, Zhengtao, Yu, Philip S.

arXiv.org Artificial IntelligenceDec-19-2023

As a trending approach for social event detection, graph neural network (GNN)-based methods enable a fusion of natural language semantics and the complex social network structural information, thus showing SOTA performance. However, GNN-based methods can miss useful message correlations. Moreover, they require manual labeling for training and predetermining the number of events for prediction. In this work, we address social event detection via graph structural entropy (SE) minimization. While keeping the merits of the GNN-based methods, the proposed framework, HISEvent, constructs more informative message graphs, is unsupervised, and does not require the number of events given a priori. Specifically, we incrementally explore the graph neighborhoods using 1-dimensional (1D) SE minimization to supplement the existing message graph with edges between semantically related messages. We then detect events from the message graph by hierarchically minimizing 2-dimensional (2D) SE. Our proposed 1D and 2D SE minimization algorithms are customized for social event detection and effectively tackle the efficiency problem of the existing SE minimization algorithms. Extensive experiments show that HISEvent consistently outperforms GNN-based methods and achieves the new SOTA for social event detection under both closed- and open-set settings while being efficient and robust.

data mining, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2312.11891

Country:

North America > United States > Illinois (0.14)
Asia > China > Yunnan Province (0.14)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Social Events (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

Uncertainty-guided Boundary Learning for Imbalanced Social Event Detection

Ren, Jiaqian, Peng, Hao, Jiang, Lei, Liu, Zhiwei, Wu, Jia, Yu, Zhengtao, Yu, Philip S.

arXiv.org Artificial IntelligenceOct-29-2023

Real-world social events typically exhibit a severe class-imbalance distribution, which makes the trained detection model encounter a serious generalization challenge. Most studies solve this problem from the frequency perspective and emphasize the representation or classifier learning for tail classes. While in our observation, compared to the rarity of classes, the calibrated uncertainty estimated from well-trained evidential deep learning networks better reflects model performance. To this end, we propose a novel uncertainty-guided class imbalance learning framework - UCL$_{SED}$, and its variant - UCL-EC$_{SED}$, for imbalanced social event detection tasks. We aim to improve the overall model performance by enhancing model generalization to those uncertain classes. Considering performance degradation usually comes from misclassifying samples as their confusing neighboring classes, we focus on boundary learning in latent space and classifier learning with high-quality uncertainty estimation. First, we design a novel uncertainty-guided contrastive learning loss, namely UCL and its variant - UCL-EC, to manipulate distinguishable representation distribution for imbalanced data. During training, they force all classes, especially uncertain ones, to adaptively adjust a clear separable boundary in the feature space. Second, to obtain more robust and accurate class uncertainty, we combine the results of multi-view evidential classifiers via the Dempster-Shafer theory under the supervision of an additional calibration method. We conduct experiments on three severely imbalanced social event datasets including Events2012\_100, Events2018\_100, and CrisisLexT\_7. Our model significantly improves social event representation and classification tasks in almost all classes, especially those uncertain ones.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Artificial Intelligence

2310.19247

Country:

North America > United States > Illinois (0.14)
Asia > China > Yunnan Province (0.14)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Social Events (1.00)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.87)

Add feedback