Shell Language Processing: Machine Learning for Security Intrusion Detection with Linux auditd
The application of Machine Learning (ML) algorithms in industry, tutorials, and courses is heavily biased towards building the ML models themselves. From our point of view, however, the data preprocessing step (i.e., transforming textual system logs into numerical arrays that capture valuable insights from the data) poses the largest psychological and epistemic gap for security engineers and analysts trying to weaponize their data. Many enormous log collection hubs lack the quality analytics needed to extract the necessary visibility from the acquired data. Terabytes of logs are often collected only to perform basic analytics (e.g., basic signature-based rules) and are otherwise used in an ad hoc, reactive manner, only when an investigation is needed. Many valuable inferences can be obtained by defining manual heuristics on top of this data.
Sep-22-2022, 16:08:30 GMT