A Robust Cybersecurity Topic Classification Tool

Pelofske, Elijah; Liebrock, Lorie M.; Urias, Vincent

arXiv.org Artificial Intelligence 

Identifying cybersecurity discussions in open forums at scale is of great interest for understanding and mitigating modern cyber threats [1-3]. The challenge is that these discussions are typically noisy (i.e., they contain community-known synonyms, acronyms, or slang), and it is difficult to obtain labelled data with which to train resilient NLP (natural language processing) topic classifiers. Additionally, it is important that a tool that detects cybersecurity discussions in internet text sources be scalable and offer low error rates (in particular, both low false negative rates and low false positive rates). To address the challenge of finding relevant labelled cybersecurity data, we use a technique that gathers posts or articles from different internet sources that have user-defined topic labels. We then label each collected training text as cybersecurity related or not, based on the subset of labels that the text source offers.
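The labelling step described above can be illustrated with a minimal sketch. It assumes each collected post carries a list of user-defined tags and that some subset of a source's tags is treated as cybersecurity related. The tag names, the `label_post` and `build_training_set` helpers, and the post dictionary layout are all hypothetical and chosen only for illustration; the abstract does not specify the actual labels or data format.

```python
from typing import Iterable

# Hypothetical subset of source tags treated as cybersecurity related.
# The abstract does not list the real labels; these are placeholders.
CYBER_TAGS = frozenset({"security", "malware", "cryptography"})


def label_post(tags: Iterable[str], cyber_tags: frozenset = CYBER_TAGS) -> int:
    """Return 1 if any user-defined tag falls in the cybersecurity subset, else 0."""
    normalized = {t.strip().lower() for t in tags}
    return int(bool(normalized & cyber_tags))


def build_training_set(posts: list) -> list:
    """Pair each post's text with a binary label derived from its tags."""
    return [(post["text"], label_post(post["tags"])) for post in posts]


# Illustrative posts (invented for this sketch, not taken from the paper).
posts = [
    {"text": "New ransomware strain spotted", "tags": ["Malware", "news"]},
    {"text": "Best sourdough starter tips", "tags": ["baking"]},
]
training_set = build_training_set(posts)
```

Because the labels come from tags that users already assigned, no manual annotation is needed in this sketch; the quality of the resulting labels depends on how well the chosen tag subset matches the topic.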
