Evaluating Methods for Distinguishing Between Human-Readable Text and Garbled Text

Henderson, Jette L. (The University of Texas at Austin) | Frazee, Daniel J. (The University of Texas at Austin) | Siegel, Nick P. (The University of Texas at Austin) | Martin, Cheryl E. (The University of Texas at Austin) | Liu, Alexander Y. (The University of Texas at Austin)

AAAI Conferences 

In some cybersecurity applications, it is useful to differentiate between human-readable text and garbled text (e.g., encoded or encrypted text). Automated methods are necessary for performing this task on large volumes of data. Which method is best is an open question that depends on the specific problem context. In this paper, we explore this open question via empirical tests of many automated categorization methods for differentiating human-readable versus garbled text under a variety of conditions (e.g., different class priors, different problem contexts, concept drift, etc.). The results indicate that the best approaches tend to be either variants of naïve Bayes or classifiers that use low-dimensional, structural features. The results also indicate that concept drift is one of the most problematic issues when classifying garbled text.
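To make the "low-dimensional, structural features" idea concrete, here is a minimal illustrative sketch (not the paper's actual method or thresholds): it computes two structural features of a byte string, Shannon entropy and the fraction of printable characters, and applies simple cutoffs to flag likely garbled (e.g., encrypted or encoded) text. The `entropy_cutoff` and `printable_cutoff` values are assumptions chosen for illustration only.

```python
import math
from collections import Counter

def structural_features(text: bytes):
    """Compute two simple structural features of a byte string:
    per-byte Shannon entropy and the ratio of printable bytes."""
    n = len(text)
    counts = Counter(text)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    printable = sum(1 for b in text if 32 <= b < 127 or b in (9, 10, 13)) / n
    return entropy, printable

def looks_garbled(text: bytes, entropy_cutoff=6.0, printable_cutoff=0.9):
    """Flag text as garbled when its entropy is high or too few bytes
    are printable. Cutoffs are illustrative, not taken from the paper."""
    entropy, printable = structural_features(text)
    return entropy > entropy_cutoff or printable < printable_cutoff

# English prose: low entropy, fully printable -> readable
print(looks_garbled(b"The quick brown fox jumps over the lazy dog."))
# Uniform byte distribution (entropy = 8 bits/byte) -> garbled
print(looks_garbled(bytes(range(256)) * 4))
```

Real encrypted or compressed data approaches 8 bits of entropy per byte, while English text sits near 4, which is why even a two-feature classifier like this can separate many cases; the paper's point is that which method wins depends on class priors, problem context, and drift.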
