A Weak Supervision Approach for Monitoring Recreational Drug Use Effects in Social Media
Prieto-Santamaría, Lucía, Iglesias, Alba Cortés, Giné, Claudio Vidal, Calderón, Fermín Fernández, Lozano, Óscar M., Rodríguez-González, Alejandro
arXiv.org Artificial Intelligence
Understanding the real-world effects of recreational drug use remains a critical challenge in public health and biomedical research, especially as traditional surveillance systems often underrepresent user experiences. In this study, we leverage social media (specifically Twitter) as a rich and unfiltered source of user-reported effects associated with three emerging psychoactive substances: ecstasy, GHB, and 2C-B. By combining a curated list of slang terms with biomedical concept extraction via MetaMap, we identified and weakly annotated over 92,000 tweets mentioning these substances. Each tweet was labeled with a polarity reflecting whether it reported a positive or negative effect, following an expert-guided heuristic process. We then performed descriptive and comparative analyses of the reported phenotypic outcomes across substances and trained multiple machine learning classifiers to predict polarity from tweet content, accounting for strong class imbalance using techniques such as cost-sensitive learning and synthetic oversampling. The top performance on the test set was obtained from eXtreme Gradient Boosting with cost-sensitive learning (F1 = 0.885, AUPRC = 0.934). Our findings reveal that Twitter enables the detection of substance-specific phenotypic effects, and that polarity classification models can support real-time pharmacovigilance and drug effect characterization with high accuracy.
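As a minimal sketch of the two quantities the abstract relies on, the snippet below computes inverse-frequency class weights (one common form of cost-sensitive learning) and the F1 score for a binary polarity task. The function names and toy labels are assumptions made for illustration; they are not the authors' code or data, and the weighting scheme actually used in the study may differ.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up polarity labels (hypothetical, not from the study).
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 0, 1]

weights = balanced_class_weights(y_true)  # rarer class gets the larger weight
score = f1_score(y_true, y_pred)
```

In a cost-sensitive setup, weights like these would scale each example's contribution to the training loss so that errors on the rarer class cost more; synthetic oversampling addresses the same imbalance by adding minority-class examples instead.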
Sep-22-2025
- Country:
  - Africa (0.04)
  - Asia (0.04)
  - Europe
    - Spain
      - Andalusia > Huelva Province
        - Huelva (0.04)
      - Galicia > Madrid (0.04)
    - United Kingdom > England
      - Oxfordshire > Oxford (0.04)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning
        - Ensemble Learning (0.91)
        - Neural Networks (1.00)
        - Performance Analysis > Accuracy (0.68)
        - Statistical Learning (0.96)
      - Natural Language > Text Processing (0.88)
    - Communications > Social Media (1.00)