LLMs Perform Poorly at Concept Extraction in Cyber-security Research Literature
Würsch, Maxime; Kucharavy, Andrei; David, Dimitri Percia; Mermoud, Alain
arXiv.org Artificial Intelligence
Secure and reliable information systems have become a central requirement for the operational continuity of the vast majority of providers of goods and services [42]. However, securing information systems in a fast-paced ecosystem of technological change and innovation is hard [3]. New cybersecurity technologies have short life cycles and evolve constantly [13], which exposes information systems to attacks that exploit vulnerabilities and security gaps [3]. Hence, cybersecurity practitioners and researchers need to stay up to date on the latest developments and trends to prevent incidents and increase resilience [14]. A common approach to gathering curated and synthesized information about such developments is bibliometrics-based knowledge entity extraction followed by comparison of the extracted entities through embedding similarity [10, 50, 61], an approach recently boosted by the availability of entity extractors based on large language models (LLMs) [17, 46]. However, it is unclear how appropriate this approach is for the cybersecurity literature. We address this question by emulating such an entity extraction and comparison pipeline with a variety of common entity extractors, both LLM-based and not, and by evaluating how relevant the embeddings of the extracted entities are to a document understanding task: classifying arXiv documents (https://arxiv.org) by their relevance to cybersecurity.

While LLMs burst into public attention in late 2022, in large part thanks to public trials of conversationally fine-tuned LLMs [40, 4, 31], modern large language models pre-trained on large amounts of data trace their roots back to ELMo, first released in 2018 [45].
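The entity comparison step described above can be illustrated with a minimal sketch. It assumes each extracted entity has already been mapped to an embedding vector (the toy 3-dimensional vectors below are invented for illustration and do not come from any real model), mean-pools a document's entity embeddings, and compares the result with a hypothetical cybersecurity reference vector using cosine similarity. The function names `cosine_similarity`, `document_vector` and `is_cybersecurity`, the reference vector, and the 0.5 threshold are all hypothetical; this is not the authors' pipeline, whose extractors, embedding models, and classifier are not described here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def document_vector(entity_embeddings: List[Vector]) -> List[float]:
    """Mean-pool the embeddings of a document's extracted entities (hypothetical aggregation)."""
    if not entity_embeddings:
        raise ValueError("document has no extracted entities")
    dim = len(entity_embeddings[0])
    n = len(entity_embeddings)
    return [sum(e[i] for e in entity_embeddings) / n for i in range(dim)]


def is_cybersecurity(entity_embeddings: List[Vector], reference: Vector,
                     threshold: float = 0.5) -> bool:
    """Hypothetical classifier: relevant if the pooled vector is close enough to the reference."""
    return cosine_similarity(document_vector(entity_embeddings), reference) >= threshold


# Toy data: invented 3-d "embeddings", not produced by any real model.
reference = [1.0, 0.0, 0.0]                  # stands in for a cybersecurity centroid
doc_a = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2]]   # entities close to the reference direction
doc_b = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]   # entities from an unrelated direction

print(is_cybersecurity(doc_a, reference))  # True
print(is_cybersecurity(doc_b, reference))  # False
```

Mean pooling and a fixed similarity threshold are only the simplest possible choices here; a classifier trained on the entity embeddings would be a natural alternative for the same evaluation.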
Dec-12-2023
- Country:
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Asia
- Middle East
- Jordan (0.04)
- Qatar > Ad-Dawhah
- Doha (0.04)
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- South Korea > Seoul
- Seoul (0.04)
- Europe
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Switzerland > Vaud
- Lausanne (0.04)
- North America
- Canada
- Alberta > Census Division No. 11
- Edmonton Metropolitan Region > Edmonton (0.04)
- Ontario > Toronto (0.04)
- United States
- Arizona > Maricopa County
- Scottsdale (0.04)
- California (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- New York > New York County
- New York City (0.04)
- Washington > King County
- Seattle (0.04)
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Government > Military
- Cyberwarfare (0.95)
- Information Technology > Security & Privacy (1.00)
- Technology: