TAMING DNS DATA: STACKING MACHINE LEARNING ALGORITHMS TO FIND DGA MALWARE ACTIVITIES - Cybersecurity Insiders


The very first layer deals with the information contained in the structure of the domain name string. Depending on the DGA malware, the generated domain names may be random alphanumeric strings or concatenations of English words. In either case, because they are machine-generated, they tend to differ from the names commonly used for legitimate domains. Natural language processing (NLP) algorithms trained on a large corpus of domain names requested by normal users can assign each name a string information score that reflects how likely it is to be anomalous. This string information score is assigned to every DNS request and used as a classification feature in the next layer.
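As a rough illustration of this first layer, here is a minimal sketch that uses one simple technique: a character-bigram language model with add-one smoothing, trained on a tiny list of benign domains. The article does not say which NLP model is used. The class name `BigramDomainScorer`, the `BENIGN_DOMAINS` list, the sample DNS records, and the `string_score` field are all hypothetical. The score is the average negative log-probability per character transition of the leftmost label, so a higher value means the name looks less like the training corpus.

```python
import math
from collections import Counter

# Hypothetical benign training corpus; a real deployment would use a large
# sample of domain names requested by normal users.
BENIGN_DOMAINS = [
    "google.com", "facebook.com", "amazon.com", "wikipedia.org", "youtube.com",
    "twitter.com", "linkedin.com", "microsoft.com", "apple.com", "netflix.com",
]


class BigramDomainScorer:
    """Hypothetical character-bigram model used as a stand-in NLP scorer."""

    START, END = "^", "$"  # boundary markers wrapped around each label

    def __init__(self, benign_domains):
        self.pair_counts = Counter()    # counts of (char, next_char) pairs
        self.prefix_counts = Counter()  # counts of char as the first element of a pair
        alphabet = set()
        for domain in benign_domains:
            chars = self._pad(domain)
            alphabet.update(chars)
            for a, b in zip(chars, chars[1:]):
                self.pair_counts[(a, b)] += 1
                self.prefix_counts[a] += 1
        # Vocabulary size for add-one smoothing; +1 reserves mass for unseen chars.
        self.vocab_size = len(alphabet) + 1

    def _pad(self, domain):
        # Assumption: score only the leftmost label, lower-cased
        # (e.g. "google" in "google.com"), since the TLD carries little signal here.
        label = domain.lower().split(".")[0]
        return self.START + label + self.END

    def score(self, domain):
        """Average negative log-probability per transition (higher = more anomalous)."""
        chars = self._pad(domain)
        total = 0.0
        for a, b in zip(chars, chars[1:]):
            p = (self.pair_counts[(a, b)] + 1) / (self.prefix_counts[a] + self.vocab_size)
            total -= math.log(p)
        return total / (len(chars) - 1)


scorer = BigramDomainScorer(BENIGN_DOMAINS)

# Hypothetical DNS request records; the score is attached to every request
# so that the next-layer classifier can use it as a feature.
requests = [
    {"client": "10.0.0.5", "qname": "google.com"},
    {"client": "10.0.0.7", "qname": "xjq7zk3vwp.com"},
]
for record in requests:
    record["string_score"] = scorer.score(record["qname"])
    print(record)
```

Bigrams were chosen here only to keep the sketch short. Dictionary-word DGAs produce plausible character transitions, so a longer n-gram, a word-level model, or a neural character model would likely be needed to separate them from benign names.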