Language Models for Adult Service Website Text Analysis
Freeman, Nickolas; Nguyen, Thanh; Bott, Gregory; Parton, Jason; Francel, Collin
arXiv.org Artificial Intelligence
Sex trafficking refers to the use of force, fraud, or coercion to compel an individual to perform commercial sex acts against their will. Adult service websites (ASWs) have been, and continue to be, linked to sex trafficking, offering a platform for traffickers to advertise their victims. Thus, organizations involved in the fight against sex trafficking often use ASW data when attempting to identify potential sex trafficking victims. A critical challenge in transforming ASW data into actionable insight is text analysis. Previous research using ASW data has shown that ASW ad text is important for linking ads. However, working with this text is challenging due to its extensive use of emojis, poor grammar, and deliberate obfuscation to evade law enforcement scrutiny. We conduct a comprehensive study of language modeling approaches for this application area, including simple information retrieval methods, pre-trained transformers, and custom transformer models. We demonstrate that characteristics of ASW text data allow efficient custom transformer models to be trained with relatively small GPU resources and used efficiently for inference on consumer hardware. Our custom models outperform fine-tuned variants of well-known encoder-only transformer models, including BERT-base, RoBERTa, and ModernBERT, on accuracy, recall, F1 score, and ROC AUC. The models we develop represent a significant advancement in ASW text analysis and can be leveraged in a variety of downstream applications and research.

Introduction

Sex trafficking involves the use of force, fraud, or coercion to compel an individual to perform commercial sex services. To effectively combat this problem, law enforcement organizations (LEOs), non-profit organizations (NPOs), and researchers must transform sex ad data into actionable intelligence. Previous research using ASW data has shown that assessing the similarity of ASW ad text is important for linking ads.
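
To make the idea of assessing ad-text similarity concrete, the following is a minimal sketch of one simple information retrieval baseline: TF-IDF over character n-grams with cosine similarity. This excerpt does not say which retrieval methods the study uses, so the choice of technique, the function names (`char_ngrams`, `tfidf_vectors`, `cosine`), the n-gram length, and the placeholder texts are all assumptions made for illustration. Character n-grams are used here because they tend to tolerate misspellings and character substitutions better than whole-word tokens.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Lowercase, collapse whitespace, and return overlapping character n-grams."""
    cleaned = " ".join(text.lower().split())
    return [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Return one sparse TF-IDF vector (a dict) per document."""
    counts = [Counter(char_ngrams(doc, n)) for doc in docs]
    # Document frequency: number of documents containing each n-gram.
    doc_freq = Counter(gram for c in counts for gram in c)
    total = len(docs)
    vectors = []
    for c in counts:
        # Smoothed IDF keeps n-grams that occur in every document at a nonzero weight.
        vectors.append({
            gram: tf * (math.log((1 + total) / (1 + doc_freq[gram])) + 1.0)
            for gram, tf in c.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(gram, 0.0) for gram, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical placeholder texts, not real data: the second is an
# obfuscated variant of the first, the third is unrelated.
ads = [
    "New in town, call 555-0100 today",
    "new in t0wn!! call 555 0100 2day",
    "Vintage bicycle for sale, barely used",
]
vecs = tfidf_vectors(ads)
print("variant pair:  ", round(cosine(vecs[0], vecs[1]), 3))
print("unrelated pair:", round(cosine(vecs[0], vecs[2]), 3))
```

In this sketch the obfuscated variant should score noticeably higher against the first text than the unrelated text does. A baseline of this kind gives a reference point against which pre-trained and custom transformer models can be compared.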
Jul-16-2025