ATOL: A Framework for Automated Analysis and Categorization of the Darkweb Ecosystem
Ghosh, Shalini (SRI International) | Porras, Phillip (SRI International) | Yegneswaran, Vinod (SRI International) | Nitz, Ken (SRI International) | Das, Ariyam (University of California, Los Angeles)
We present a framework for automated analysis and categorization of .onion websites in the darkweb to facilitate analyst situational awareness of new content that emerges from this dynamic landscape. Over the last two years, our team has developed a large-scale darkweb crawling infrastructure called OnionCrawler that acquires new onion domains on a daily basis, and crawls and indexes millions of pages from these new and previously known .onion sites. It stores this data in a research repository designed to help better understand Tor’s hidden service ecosystem. The analysis component of our framework is called Automated Tool for Onion Labeling (ATOL), which introduces a two-stage thematic labeling strategy: (1) it learns descriptive and discriminative keywords for different categories, and (2) it uses these terms to map onion site content to a set of thematic labels. We also present empirical results for ATOL and describe our ongoing experimentation with it, drawing on our experience applying it to the entirety of our darkweb repository, which now holds over 70 million indexed pages. We find that ATOL can perform site-level thematic label assignment more accurately than keyword-based schemes developed by domain experts: we expand the analyst-provided keywords using an automatic keyword discovery algorithm, and obtain a 12% gain in accuracy by using a machine learning classification model. We also show how ATOL can discover categories on previously unlabeled onions and discuss applications of ATOL in supporting various analyses and investigations of the darkweb.
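The two-stage strategy described above can be illustrated with a minimal sketch. This is not ATOL's actual implementation: the smoothed log-ratio score, the function names (`tokenize`, `learn_keywords`, `label_site`), the `top_k` parameter, and the toy categories and texts are all assumptions chosen for illustration. Stage 1 ranks terms by how much more likely they are inside a category than outside it; stage 2 assigns the category whose learned keywords occur most often in a site's text.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def learn_keywords(labeled_docs, top_k=3):
    """Stage 1 (illustrative): keep the top_k terms per category,
    ranked by a smoothed log-ratio of in-category vs. out-of-category frequency."""
    per_cat = defaultdict(Counter)
    total = Counter()
    for category, text in labeled_docs:
        tokens = tokenize(text)
        per_cat[category].update(tokens)
        total.update(tokens)

    vocab_size = len(total)
    total_count = sum(total.values())
    keywords = {}
    for category, counts in per_cat.items():
        cat_count = sum(counts.values())
        rest_count = total_count - cat_count
        scores = {}
        for term, c in counts.items():
            # Add-one smoothing keeps the ratio finite for terms unseen elsewhere.
            p_in = (c + 1) / (cat_count + vocab_size)
            p_out = (total[term] - c + 1) / (rest_count + vocab_size)
            scores[term] = math.log(p_in / p_out)
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[category] = set(ranked[:top_k])
    return keywords


def label_site(text, keywords):
    """Stage 2 (illustrative): return the category whose keywords occur
    most often in the text, or None when no keyword matches."""
    counts = Counter(tokenize(text))
    hits = {cat: sum(counts[t] for t in terms) for cat, terms in keywords.items()}
    best = max(sorted(hits), key=lambda cat: hits[cat])
    return best if hits[best] > 0 else None


# Hypothetical toy training data: (category, page text) pairs.
labeled_docs = [
    ("market", "vendor escrow shipping vendor listing price"),
    ("market", "escrow vendor price order"),
    ("forum", "thread reply post thread member"),
    ("forum", "post reply moderator thread"),
]

keywords = learn_keywords(labeled_docs)
print(keywords)
print(label_site("new thread with a reply from a member", keywords))
```

A keyword-count rule like `label_site` corresponds only to the keyword-based baseline; the accuracy gain reported in the abstract comes from replacing that rule with a machine learning classification model, which this sketch does not attempt to reproduce.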
Feb-4-2017
- Country:
- North America > United States (0.14)
- Genre:
- Research Report > New Finding (0.47)
- Industry:
- Government (0.88)
- Health & Medicine (0.68)
- Information Technology > Security & Privacy (1.00)
- Law (0.68)
- Law Enforcement & Public Safety (0.93)
- Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
- Information Technology > Communications (1.00)
- Information Technology > Data Science (1.00)
- Information Technology > Information Management > Search (0.94)
- Information Technology > Security & Privacy (1.00)