AITopics | Wang, Zongwei

Collaborating Authors

Wang, Zongwei

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Graph with Sequence: Broad-Range Semantic Modeling for Fake News Detection

Yin, Junwei, Gao, Min, Shu, Kai, Li, Wentao, Huang, Yinqiu, Wang, Zongwei

arXiv.org Artificial IntelligenceDec-7-2024

The rapid proliferation of fake news on social media threatens social stability, creating an urgent demand for more effective detection methods. While many promising approaches have emerged, most rely on content analysis with limited semantic depth, leading to suboptimal comprehension of news content.To address this limitation, capturing broader-range semantics is essential yet challenging, as it introduces two primary types of noise: fully connecting sentences in news graphs often adds unnecessary structural noise, while highly similar but authenticity-irrelevant sentences introduce feature noise, complicating the detection process. To tackle these issues, we propose BREAK, a broad-range semantics model for fake news detection that leverages a fully connected graph to capture comprehensive semantics while employing dual denoising modules to minimize both structural and feature noise. The semantic structure denoising module balances the graph's connectivity by iteratively refining it between two bounds: a sequence-based structure as a lower bound and a fully connected graph as the upper bound. This refinement uncovers label-relevant semantic interrelations structures. Meanwhile, the semantic feature denoising module reduces noise from similar semantics by diversifying representations, aligning distinct outputs from the denoised graph and sequence encoders using KL-divergence to achieve feature diversification in high-dimensional space. The two modules are jointly optimized in a bi-level framework, enhancing the integration of denoised semantics into a comprehensive representation for detection. Extensive experiments across four datasets demonstrate that BREAK significantly outperforms existing methods in identifying fake news. Code is available at https://anonymous.4open.science/r/BREAK.

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2412.05672

Country:

Asia (0.93)
North America > United States (0.68)

Genre: Research Report > Promising Solution (0.66)

Industry:

Media > News (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

CopyLens: Dynamically Flagging Copyrighted Sub-Dataset Contributions to LLM Outputs

Ma, Qichao, Zhu, Rui-Jie, Liu, Peiye, Yan, Renye, Zhang, Fahong, Liang, Ling, Li, Meng, Yu, Zhaofei, Wang, Zongwei, Cai, Yimao, Huang, Tiejun

arXiv.org Artificial IntelligenceOct-6-2024

Large Language Models (LLMs) have become pervasive due to their knowledge absorption and text-generation capabilities. Concurrently, the copyright issue for pretraining datasets has been a pressing concern, particularly when generation includes specific styles. Previous methods either focus on the defense of identical copyrighted outputs or find interpretability by individual tokens with computational burdens. However, the gap between them exists, where direct assessments of how dataset contributions impact LLM outputs are missing. Once the model providers ensure copyright protection for data holders, a more mature LLM community can be established. To address these limitations, we introduce CopyLens, a new framework to analyze how copyrighted datasets may influence LLM responses. Specifically, a two-stage approach is employed: First, based on the uniqueness of pretraining data in the embedding space, token representations are initially fused for potential copyrighted texts, followed by a lightweight LSTM-based network to analyze dataset contributions. With such a prior, a contrastive-learning-based non-copyright OOD detector is designed. Our framework can dynamically face different situations and bridge the gap between current copyright detection methods. Experiments show that CopyLens improves efficiency and accuracy by 15.2% over our proposed baseline, 58.7% over prompt engineering methods, and 0.21 AUC over OOD detection baselines.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.04454

Country:

North America > United States > California (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.40)

Industry:

Law > Intellectual Property & Technology Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Poisoning Attacks Against Contrastive Recommender Systems

Wang, Zongwei, Yu, Junliang, Gao, Min, Yin, Hongzhi, Cui, Bin, Sadiq, Shazia

arXiv.org Artificial IntelligenceNov-29-2023

Contrastive learning (CL) has recently gained significant popularity in the field of recommendation. Its ability to learn without heavy reliance on labeled data is a natural antidote to the data sparsity issue. Previous research has found that CL can not only enhance recommendation accuracy but also inadvertently exhibit remarkable robustness against noise. However, this paper identifies a vulnerability of CL-based recommender systems: Compared with their non-CL counterparts, they are even more susceptible to poisoning attacks that aim to promote target items. Our analysis points to the uniform dispersion of representations led by the CL loss as the very factor that accounts for this vulnerability. We further theoretically and empirically demonstrate that the optimization of CL loss can lead to smooth spectral values of representations. Based on these insights, we attempt to reveal the potential poisoning attacks against CL-based recommender systems. The proposed attack encompasses a dual-objective framework: One that induces a smoother spectral value distribution to amplify the CL loss's inherent dispersion effect, named dispersion promotion; and the other that directly elevates the visibility of target items, named rank promotion. We validate the destructiveness of our attack model through extensive experimentation on four datasets. By shedding light on these vulnerabilities, we aim to facilitate the development of more robust CL-based recommender systems.

artificial intelligence, machine learning, recommendation, (18 more...)

arXiv.org Artificial Intelligence

2311.18244

Country:

Asia > China (0.28)
Oceania > Australia > Queensland (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Efficient Bi-Level Optimization for Recommendation Denoising

Wang, Zongwei, Gao, Min, Li, Wentao, Yu, Junliang, Guo, Linxin, Yin, Hongzhi

arXiv.org Artificial IntelligenceJun-1-2023

The acquisition of explicit user feedback (e.g., ratings) in real-world recommender systems is often hindered by the need for active user involvement. To mitigate this issue, implicit feedback (e.g., clicks) generated during user browsing is exploited as a viable substitute. However, implicit feedback possesses a high degree of noise, which significantly undermines recommendation quality. While many methods have been proposed to address this issue by assigning varying weights to implicit feedback, two shortcomings persist: (1) the weight calculation in these methods is iteration-independent, without considering the influence of weights in previous iterations, and (2) the weight calculation often relies on prior knowledge, which may not always be readily available or universally applicable. To overcome these two limitations, we model recommendation denoising as a bi-level optimization problem. The inner optimization aims to derive an effective model for the recommendation, as well as guiding the weight determination, thereby eliminating the need for prior knowledge. The outer optimization leverages gradients of the inner optimization and adjusts the weights in a manner considering the impact of previous weights. To efficiently solve this bi-level optimization problem, we employ a weight generator to avoid the storage of weights and a one-step gradient-matching-based loss to significantly reduce computational time. The experimental results on three benchmark datasets demonstrate that our proposed approach outperforms both state-of-the-art general and denoising recommendation models. The code is available at https://github.com/CoderWZW/BOD.

artificial intelligence, machine learning, optimization, (16 more...)

arXiv.org Artificial Intelligence

2210.10321

Country:

Asia > China (0.46)
North America > United States (0.30)

Genre: Research Report (1.00)

Industry:

Information Technology (0.46)
Media (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.90)

Add feedback