AITopics | Wu, Zhenpeng

Collaborating Authors

Wu, Zhenpeng

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MemHunter: Automated and Verifiable Memorization Detection at Dataset-scale in LLMs

Wu, Zhenpeng, Lou, Jian, Zheng, Zibin, Chen, Chuan

arXiv.org Artificial IntelligenceDec-10-2024

Large language models (LLMs) have been shown to memorize and reproduce content from their training data, raising significant privacy concerns, especially with web-scale datasets. Existing methods for detecting memorization are largely sample-specific, relying on manually crafted or discretely optimized memory-inducing prompts generated on a per-sample basis, which become impractical for dataset-level detection due to the prohibitive computational cost of iterating over all samples. In real-world scenarios, data owners may need to verify whether a susceptible LLM has memorized their dataset, particularly if the LLM may have collected the data from the web without authorization. To address this, we introduce \textit{MemHunter}, which trains a memory-inducing LLM and employs hypothesis testing to efficiently detect memorization at the dataset level, without requiring sample-specific memory inducing. Experiments on models such as Pythia and Llama-2 demonstrate that \textit{MemHunter} can extract up to 40\% more training data than existing methods under constrained time resources and reduce search time by up to 80\% when integrated as a plug-in. Crucially, \textit{MemHunter} is the first method capable of dataset-level memorization detection, providing an indispensable tool for assessing privacy risks in LLMs that are powered by vast web-sourced datasets.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2412.07261

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Challenges and Remedies to Privacy and Security in AIGC: Exploring the Potential of Privacy Computing, Blockchain, and Beyond

Chen, Chuan, Wu, Zhenpeng, Lai, Yanyi, Ou, Wenlin, Liao, Tianchi, Zheng, Zibin

arXiv.org Artificial IntelligenceJun-1-2023

Artificial Intelligence Generated Content (AIGC) is one of the latest achievements in AI development. The content generated by related applications, such as text, images and audio, has sparked a heated discussion. Various derived AIGC applications are also gradually entering all walks of life, bringing unimaginable impact to people's daily lives. However, the rapid development of such generative tools has also raised concerns about privacy and security issues, and even copyright issues in AIGC. We note that advanced technologies such as blockchain and privacy computing can be combined with AIGC tools, but no work has yet been done to investigate their relevance and prospect in a systematic and detailed way. Therefore it is necessary to investigate how they can be used to protect the privacy and security of data in AIGC by fully exploring the aforementioned technologies. In this paper, we first systematically review the concept, classification and underlying technologies of AIGC. Then, we discuss the privacy and security challenges faced by AIGC from multiple perspectives and purposefully list the countermeasures that currently exist. We hope our survey will help researchers and industry to build a more secure and robust AIGC system.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2306.00419

Country:

Europe (0.92)
Asia > Middle East > Israel (0.14)
Asia > China > Guangdong Province (0.14)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Information Technology > Services > e-Commerce Services (0.67)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
(10 more...)

Add feedback