Explorers at #SMM4H 2023: Enhancing BERT for Health Applications through Knowledge and Model Fusion
Yue, Xutong, Wang, Xilai, He, Yuxin, Zhou, Zhenkun
arXiv.org Artificial Intelligence
Task 1 and Task 4 focus on classifying self-reported COVID-19 diagnoses in English tweets and self-reported social anxiety disorder diagnoses in Reddit posts, respectively. Task 3 concentrates on detecting and extracting COVID-19 symptoms in Latin American Spanish tweets. The Task 1 dataset contains tweets that either self-report a COVID-19 diagnosis (labeled '1') or do not (labeled '0'). Its training, validation, and test sets contain 7600, 400, and 10000 tweets, respectively. The Task 4 dataset contains 8117 Reddit posts from users aged 12 to 25. Positive cases (labeled '1') represent self-reported or probable social anxiety disorder diagnoses, while negative cases (labeled '0') include users without a diagnosis or with uncertain diagnostic status. Its training, validation, and test sets contain 6090, 680, and 1347 posts, respectively. Task 3 covers tweets written in Latin American Spanish and includes both personal self-reports and third-party mentions of symptoms. Its dataset has 6021 training, 1979 validation, and 2150 test examples.
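The split sizes quoted above can be tallied with a few lines of Python. This is only an illustrative sketch: the figures are copied from the text, while the names `SPLITS` and `split_summary` are hypothetical and do not come from the paper or the shared task.

```python
# Split sizes as reported in the text: (train, validation, test).
SPLITS = {
    "Task 1": (7600, 400, 10000),
    "Task 3": (6021, 1979, 2150),
    "Task 4": (6090, 680, 1347),
}


def split_summary(sizes):
    """Return the total size and each split's share of it (3 decimals)."""
    total = sum(sizes)
    shares = tuple(round(n / total, 3) for n in sizes)
    return total, shares


if __name__ == "__main__":
    for task, sizes in SPLITS.items():
        total, shares = split_summary(sizes)
        print(f"{task}: total={total}, train/val/test shares={shares}")
```

For Task 4 the three splits sum to 8117, which matches the post count stated in the text.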
Dec-17-2023
- Country:
- North America > Canada
- Ontario > National Capital Region > Ottawa (0.04)
- Asia > China
- Genre:
- Research Report (0.40)