A Robust Cybersecurity Topic Classification Tool
Elijah Pelofske, Lorie M. Liebrock, Vincent Urias
arXiv.org Artificial Intelligence
Identifying cybersecurity discussions in open forums at scale is of great interest for understanding and mitigating modern cyber threats [1-3]. The challenge is that these discussions are typically quite noisy (i.e., they contain community-known synonyms, acronyms, or slang), and it is difficult to obtain labelled data for training resilient NLP (natural language processing) topic classifiers. Additionally, a tool that detects cybersecurity discussions in internet text sources must be scalable and offer low error rates (in particular, both low false negative rates and low false positive rates). To address the challenge of finding relevant labelled cybersecurity data, we use a technique that gathers posts or articles from different internet sources that have user-defined topic labels. We then collect the training text and label it as cybersecurity-related or not, based on the subset of labels that the text source offers.
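The labelling step described above can be illustrated with a minimal sketch. It assumes each collected post carries a list of user-defined tags and that a chosen subset of those tags marks a post as cybersecurity-related. The tag names, the `label_post` and `build_training_set` helpers, and the sample posts are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical subset of source tags treated as cybersecurity-related.
# The actual labels used by the authors are not given in this abstract.
CYBER_TAGS: Set[str] = {"security", "malware", "cryptography"}


def label_post(tags: Iterable[str], cyber_tags: Set[str] = CYBER_TAGS) -> int:
    """Return 1 if any user-defined tag falls in the cybersecurity subset, else 0."""
    normalized = {t.strip().lower() for t in tags}
    return int(bool(normalized & cyber_tags))


def build_training_set(posts: List[Dict]) -> List[Tuple[str, int]]:
    """Pair each post's text with a binary label derived from its tags."""
    return [(post["text"], label_post(post["tags"])) for post in posts]


# Illustrative posts only; not real data.
posts = [
    {"text": "New ransomware strain spreads via phishing", "tags": ["Malware", "news"]},
    {"text": "Best sourdough starter tips", "tags": ["cooking"]},
]
training_set = build_training_set(posts)
```

Deriving labels from tags the source already provides avoids manual annotation, at the cost of inheriting whatever noise the user-assigned tags contain.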
Dec-27-2022
- Country:
- North America > United States > New Mexico > Socorro County > Socorro (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- Genre:
- Research Report (0.82)
- Overview (0.67)