A Robust Cybersecurity Topic Classification Tool

Elijah Pelofske, Lorie M. Liebrock, Vincent Urias

Dec-27-2022–arXiv.org Artificial Intelligence

Identifying cybersecurity discussions in open forums at scale is of great interest for understanding and mitigating modern cyber threats [1-3]. The challenge is that these discussions are typically noisy (i.e., they contain community-known synonyms, acronyms, or slang), and it is difficult to obtain labelled data with which to train resilient NLP (natural language processing) topic classifiers. Additionally, a tool that detects cybersecurity discussions in internet text sources needs to be scalable and to offer low error rates (in particular, both low false negative rates and low false positive rates). To address the challenge of finding relevant labelled cybersecurity data, we use a technique that gathers posts or articles from different internet sources that carry user-defined topic labels. We then collect the training text and label it as cybersecurity-related or not, based on the subset of labels that the text source offers.
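The labelling step and the two error rates mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tag set `CYBER_TAGS` and the helpers `label_post`, `build_dataset`, and `error_rates` are hypothetical, and the rule that a post counts as cybersecurity-related when any of its source-provided tags falls in that subset is an assumption.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical subset of user-defined topic tags treated as cybersecurity-related.
# The tag subsets actually used by the authors are not specified here.
CYBER_TAGS = frozenset({"infosec", "malware", "phishing", "ransomware", "vulnerability"})


def label_post(tags: Iterable[str], cyber_tags: frozenset = CYBER_TAGS) -> int:
    """Return 1 if any source-provided tag is in the cybersecurity subset, else 0."""
    normalized = {tag.strip().lower() for tag in tags}
    return int(not normalized.isdisjoint(cyber_tags))


def build_dataset(posts: Iterable[Tuple[str, Iterable[str]]]) -> List[Tuple[str, int]]:
    """Convert (text, tags) pairs into (text, label) training examples."""
    return [(text, label_post(tags)) for text, tags in posts]


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (false negative rate, false positive rate) for 0/1 labels."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1  # cybersecurity post the classifier missed
        elif pred == 1:
            fp += 1  # non-cybersecurity post wrongly flagged
        else:
            tn += 1
    fnr = fn / (fn + tp) if fn + tp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fnr, fpr


if __name__ == "__main__":
    posts = [
        ("New ransomware strain targets hospitals", ["Ransomware", "news"]),
        ("Best sourdough recipe", ["cooking"]),
        ("Patch released for router firmware flaw", ["vulnerability"]),
        ("Local election results", ["politics"]),
    ]
    dataset = build_dataset(posts)
    labels = [label for _, label in dataset]
    print(labels)  # → [1, 0, 1, 0]
    predictions = [1, 1, 0, 0]  # made-up classifier output for illustration
    print(error_rates(labels, predictions))  # → (0.5, 0.5)
```

Tags are normalized to lowercase so that `Ransomware` and `ransomware` map to the same label; a real pipeline would likely also need per-source tag subsets, since different forums use different tag vocabularies.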

artificial intelligence, machine learning, natural language

Country:
- North America > United States
  - New Mexico
    - Socorro County > Socorro (0.04)
    - Bernalillo County > Albuquerque (0.04)
- Asia > Singapore
  - Central Region > Singapore (0.04)

Genre:
- Research Report (0.82)
- Overview (0.67)

Industry:
- Information Technology > Security & Privacy (1.00)
- Government
  - Military > Cyberwarfare (1.00)
  - Regional Government > North America Government
    - United States Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Performance Analysis
    - Accuracy (1.00)