NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security

Oct-10-2025, 04:59:21 GMT–Neural Information Processing Systems

Large Language Models (LLMs) are being deployed across various domains today. However, their capacity to solve Capture the Flag (CTF) challenges in cybersecurity has not been thoroughly evaluated.

ctf challenge, netcat, python

Country:
- Europe > United Kingdom (0.04)
- North America
  - United States > Hawaii (0.04)
  - Canada > British Columbia
    - Vancouver (0.04)
- Asia > Middle East
  - Jordan (0.04)
  - UAE > Abu Dhabi Emirate
    - Abu Dhabi (0.04)

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Information Technology > Security & Privacy (1.00)
- Education (1.00)
- Government
  - Military > Cyberwarfare (0.36)
  - Regional Government > North America Government
    - United States Government (0.46)

Technology:
- Information Technology
  - Software (1.00)
  - Artificial Intelligence
    - Natural Language > Large Language Model (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
