Efficient Adversarial Training in LLMs with Continuous Attacks
Neural Information Processing Systems
The authors use Greedy Coordinate Gradient (GCG) to generate discrete adversarial suffixes in natural language.
Dec-27-2025, 17:47:37 GMT
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.93)
- Industry:
- Information Technology > Security & Privacy (0.48)
- Government (0.48)
- Technology: