AITopics | Antebi, Sagiv

Collaborating Authors

Antebi, Sagiv

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Tag&Tab: Pretraining Data Detection in Large Language Models Using Keyword-Based Membership Inference Attack

Antebi, Sagiv, Habler, Edan, Shabtai, Asaf, Elovici, Yuval

arXiv.org Artificial Intelligence, Jan-14-2025

Large language models (LLMs) have become essential tools for assisting with digital tasks. Their training relies heavily on collecting vast amounts of data, which may include copyright-protected or sensitive information. Recent work on detecting pretraining data in LLMs has focused primarily on sentence-level or paragraph-level membership inference attacks (MIAs), which usually analyze the probabilities the target model assigns to its predicted tokens. However, these methods often perform poorly, particularly in terms of accuracy, because they do not account for the semantic importance of the text or the significance of individual words. To address these shortcomings, we propose Tag&Tab, a novel approach for detecting data that was used in LLM pretraining. Our method uses advanced natural language processing (NLP) techniques to tag keywords in the input text, a process we term Tagging. The LLM is then used to obtain the probabilities of these keywords and compute their average log-likelihood, which determines whether the input text is a member; we refer to this process as Tabbing. Our experiments on three benchmark datasets (BookMIA, MIMIR, and the Pile) with several open-source LLMs of varying sizes show an average increase in AUC scores ranging from 4.1% to 12.1% over state-of-the-art methods. Tag&Tab not only sets a new standard for data leakage detection in LLMs; its strong performance also underscores the importance of words in MIAs on LLMs.
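The Tagging and Tabbing steps described above can be illustrated with a minimal, dependency-free sketch. It assumes the caller has already obtained per-word log-probabilities from the target LLM (for example, by summing the log-probabilities of each word's subword tokens). The keyword tagger here is a hypothetical stand-in: it drops a small stopword list and keeps the `k` rarest words according to a supplied reference frequency table. It is not the NLP pipeline used in the paper. The `auc` helper shows how membership scores can be evaluated with the rank-based (Mann-Whitney) form of the AUC. All function names are illustrative.

```python
from typing import Dict, List, Sequence

# Hypothetical stopword list; a real tagger would rely on an NLP toolkit.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "it",
             "that", "on", "for", "with", "as", "at", "by"}


def tag_keywords(words: Sequence[str], word_freq: Dict[str, int], k: int) -> List[int]:
    """Tagging stand-in (assumption): indices of up to k non-stopword words,
    rarest first according to a reference frequency table."""
    candidates = [i for i, w in enumerate(words) if w.lower() not in STOPWORDS]
    # Words missing from the table get frequency 0, i.e. are treated as rarest;
    # ties are broken by position in the text.
    candidates.sort(key=lambda i: (word_freq.get(words[i].lower(), 0), i))
    return candidates[:k]


def tab_score(word_logprobs: Sequence[float], keyword_idx: Sequence[int]) -> float:
    """Tabbing: average log-likelihood of the tagged keywords.
    A higher score suggests the text is more likely to be a training member."""
    if not keyword_idx:
        raise ValueError("no keywords were tagged")
    return sum(word_logprobs[i] for i in keyword_idx) / len(keyword_idx)


def auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Probability that a random member outscores a random non-member (ties count 1/2)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Toy usage with made-up log-probabilities (not model output).
words = ["The", "heron", "stood", "in", "the", "marsh"]
word_logprobs = [-0.5, -2.0, -3.0, -0.3, -0.4, -1.0]
word_freq = {"heron": 2, "marsh": 5, "stood": 50}

keywords = tag_keywords(words, word_freq, k=2)   # indices of "heron" and "marsh"
score = tab_score(word_logprobs, keywords)       # (-2.0 + -1.0) / 2
print(keywords, score)
print(auc([-1.5, -1.2], [-2.5, -1.5]))
```

Averaging over only the tagged keywords, rather than over every token, is the intuition the abstract describes: informative words should carry the membership signal, while function words mostly add noise.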

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2501.08454

Genre: Research Report > Promising Solution (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


GPT in Sheep's Clothing: The Risk of Customized GPTs

Antebi, Sagiv, Azulay, Noam, Habler, Edan, Ganon, Ben, Shabtai, Asaf, Elovici, Yuval

arXiv.org Artificial Intelligence, Jan-17-2024

Generative artificial intelligence (GenAI) models are a type of deep learning neural network model capable of learning from large datasets and generating new content from a given context. They represent a significant leap in the ability of the artificial intelligence (AI) field to not just interpret data but also to create something new, including text, images, videos, code, and sound [2]. Large language models (LLMs) are a type of GenAI model designed to understand and generate natural language. The market for LLMs is estimated to reach 40.8 billion USD by 2029, up from 10.5 billion USD in 2022 [10]. Organizations are currently competing to develop the most sophisticated LLM capable of mimicking human-like conversations and tasks. This has led to the creation of models such as OpenAI's ChatGPT,

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2401.09075

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.92)
