llm filter
- Europe > Switzerland > Zürich > Zürich (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Romania > Vest Development Region > Timiș County > Timișoara (0.04)
- (4 more...)
- Information Technology > Security & Privacy (1.00)
- Government (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Prompt Injection Attacks in Defended Systems
Khomsky, Daniil, Maloyan, Narek, Nutfullin, Bulat
Large language models play a crucial role in modern natural language processing technologies. However, their widespread use also introduces security risks, such as the possibility of black-box attacks. These attacks can embed hidden malicious features into the model, leading to adverse consequences during deployment. This paper investigates methods for black-box attacks on large language models protected by a three-tiered defense mechanism. It analyzes the challenges and significance of these attacks, highlighting their potential implications for the security of language processing systems. Existing attack and defense methods are examined, and their effectiveness and applicability across various scenarios are evaluated. Special attention is given to a detection algorithm for black-box attacks that identifies hazardous vulnerabilities in language models and attempts to retrieve sensitive information. This research presents a methodology for vulnerability detection and for developing defensive strategies against black-box attacks on large language models.
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Debenedetti, Edoardo, Rando, Javier, Paleka, Daniel, Florin, Silaghi Fineas, Albastroiu, Dragos, Cohen, Niv, Lemberg, Yuval, Ghosh, Reshmi, Wen, Rui, Salem, Ahmed, Cherubin, Giovanni, Zanella-Beguelin, Santiago, Schmid, Robin, Klemm, Victor, Miki, Takahiro, Li, Chenhao, Kraft, Stefan, Fritz, Mario, Tramèr, Florian, Abdelnabi, Sahar, Schönherr, Lea
Large language model systems face important security risks from maliciously crafted messages that aim to overwrite the system's original instructions or leak private data. To study this problem, we organized a capture-the-flag competition at IEEE SaTML 2024, where the flag is a secret string in the LLM system prompt. The competition was organized in two phases. In the first phase, teams developed defenses to prevent the model from leaking the secret. During the second phase, teams were challenged to extract the secrets hidden behind the defenses proposed by the other teams. This report summarizes the main insights from the competition. Notably, we found that all defenses were bypassed at least once, highlighting the difficulty of designing a successful defense and the necessity for additional research to protect LLM systems. To foster future research in this direction, we compiled a dataset with over 137k multi-turn attack chats and open-sourced the platform.
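The two-phase setup above can be illustrated with a toy output-filtering defense of the kind a defending team might submit: scan each reply for the secret (and a simple base64 encoding of it) before it leaves the system. This is a minimal sketch under stated assumptions; the secret value, function name, and the choice of base64 as the only checked encoding are illustrative, not the competition platform's actual API.

```python
# Toy output-filtering defense: redact any reply that contains the secret
# verbatim or base64-encoded. SECRET and filter_reply are hypothetical names.
import base64

SECRET = "hx7Rt2"  # hypothetical flag stored in the system prompt

def filter_reply(reply: str, secret: str = SECRET) -> str:
    """Redact replies that leak the secret verbatim or base64-encoded."""
    encodings = {secret, base64.b64encode(secret.encode()).decode()}
    for leaked in encodings:
        if leaked in reply:
            return "[redacted: possible secret leak]"
    return reply
```

Static filters like this are easy to evade, for instance by asking the model to spell the secret one character at a time or to use an encoding the filter does not check, which is consistent with the report's finding that every defense was bypassed at least once.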
- North America > United States > New York (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (5 more...)
AlpaGasus: Training A Better Alpaca with Fewer Data
Chen, Lichang, Li, Shiyang, Yan, Jun, Wang, Hai, Gunaratna, Kalpa, Yadav, Vikas, Tang, Zheng, Srinivasan, Vijay, Zhou, Tianyi, Huang, Heng, Jin, Hongxia
Large language models (LLMs) strengthen instruction-following capability through instruction-finetuning (IFT) on supervised instruction/response data. However, widely used IFT datasets (e.g., Alpaca's 52k examples) surprisingly contain many low-quality instances with incorrect or irrelevant responses, which are misleading and detrimental to IFT. In this paper, we propose a simple and effective data selection strategy that automatically identifies and filters out low-quality data using a strong LLM (e.g., ChatGPT). To this end, we introduce AlpaGasus, which is finetuned on only 9k high-quality examples filtered from the 52k Alpaca data. AlpaGasus significantly outperforms the original Alpaca as evaluated by GPT-4 on multiple test sets and in a controlled human evaluation. Its 13B variant matches >90% of the performance of its teacher LLM (i.e., Text-Davinci-003, which generated the 52k data) on test tasks. It also provides 5.7x faster training, reducing the training time for a 7B variant from 80 minutes (for Alpaca) to 14 minutes. Moreover, experiments demonstrate the efficacy of our method across diverse datasets, base models, and LLM filters. Overall, AlpaGasus demonstrates a novel data-centric IFT paradigm that can be generally applied to instruction-tuning data, leading to faster training and better instruction-following models. Our project page is available at: https://lichang-chen.github.io/AlpaGasus/
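The score-and-filter idea described above can be sketched as follows: rate each instruction/response pair with a judge and keep only pairs scoring above a threshold. In the paper the judge is a strong LLM such as ChatGPT; here a stub stands in for it so the sketch is self-contained, and the stub's scoring rule, the threshold value, and all function names are illustrative assumptions.

```python
# Minimal sketch of LLM-based data selection: score each instruction/response
# pair and keep only the high-scoring ones. stub_judge is a toy stand-in for
# the strong LLM rater; its rule and the 4.5 threshold are assumptions.
def stub_judge(instruction: str, response: str) -> float:
    """Toy judge: penalize empty or very short (low-effort) responses."""
    if not response.strip():
        return 0.0
    return 5.0 if len(response.split()) >= 3 else 2.0

def filter_dataset(pairs, judge=stub_judge, threshold=4.5):
    """Keep only (instruction, response) pairs the judge scores highly."""
    return [p for p in pairs if judge(*p) >= threshold]

data = [
    ("Name three primary colors.", "Red, yellow, and blue."),
    ("Summarize the article.", ""),           # empty response: filtered out
    ("Translate 'hello' to French.", "ok"),   # low-effort response: filtered out
]
kept = filter_dataset(data)  # only the first pair survives
```

Swapping `stub_judge` for a real LLM call (and a real rating prompt) recovers the shape of the strategy: the dataset shrinks to its highest-quality subset before finetuning, which is where the reported training-time savings come from.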
- North America > United States > California (0.14)
- Europe > France (0.04)
- Asia > India > NCT > New Delhi (0.04)
- (4 more...)
- Banking & Finance (1.00)
- Government (0.68)
- Health & Medicine > Consumer Health (0.46)