BadGPT-4o: stripping safety finetuning from GPT models
Krupkina, Ekaterina; Volkov, Dmitrii
LLM vendors expend substantial effort to secure their models and make them unhelpful to adversaries such as cybercriminals (Touvron et al. 2023, Section 4.3; OpenAI et al. 2024, Section 3; OpenAI 2024a). However, LLMs have been repeatedly "jailbroken" out of these constraints (Chao et al. 2024; Mazeika et al. 2024; Souly et al. 2024), and no robust LLM security measures are known. Classic jailbreaks encode LLM prompts to bypass model safeguards; they tend to be unstable, add token overhead, and reduce model performance (Chao et al. 2024; Mazeika et al. 2024; Souly et al. 2024).
arXiv.org Artificial Intelligence
Dec-6-2024