Collaborating Authors

 covid


CompARE: A Computational framework for Airborne Respiratory disease Evaluation integrating flow physics and human behavior

Leong, Fong Yew, Kwak, Jaeyoung, Ge, Zhengwei, Ooi, Chin Chun, Fong, Siew-Wai, Tay, Matthew Zirui, Qian, Hua, Kang, Chang Wei, Cai, Wentong, Li, Hongying

arXiv.org Artificial Intelligence

The risk of indoor airborne transmission among co-located individuals is generally non-uniform, which remains a critical challenge for public health modelling. Thus, we present CompARE, an integrated risk assessment framework for indoor airborne disease transmission that reveals a striking bimodal distribution of infection risk driven by airflow dynamics and human behavior. Combining computational fluid dynamics (CFD), machine learning (ML), and agent-based modeling (ABM), our model captures the complex interplay between aerosol transport, human mobility, and environmental context. Based on a prototypical childcare center, our approach quantifies how incorporation of ABM can unveil significantly different infection risk profiles across agents, with more than two-fold change in risk of infection between the individuals with the lowest and highest risks in more than 90% of cases, despite all individuals being in the same overall environment. We found that infection risk distributions can exhibit not only a striking bimodal pattern in certain activities but also exponential decay and fat-tailed behavior in others. Specifically, we identify low-risk modes arising from source containment, as well as high-risk tails from prolonged close contact. Our approach enables near-real-time scenario analysis and provides policy-relevant quantitative insights into how ventilation design, spatial layout, and social distancing policies can mitigate transmission risk. These findings challenge simple distance-based heuristics and support the design of targeted, evidence-based interventions in high-occupancy indoor settings.


A survey of using EHR as real-world evidence for discovering and validating new drug indications

Talukdar, Nabasmita, Zhang, Xiaodan, Paithankar, Shreya, Wang, Hui, Chen, Bin

arXiv.org Artificial Intelligence

Electronic Health Records (EHRs) have been increasingly used as real-world evidence (RWE) to support the discovery and validation of new drug indications. This paper surveys current approaches to EHR-based drug repurposing, covering data sources, processing methodologies, and representation techniques. It discusses study designs and statistical frameworks for evaluating drug efficacy. Key challenges in validation are discussed, with emphasis on the role of large language models (LLMs) and target trial emulation. By synthesizing recent developments and methodological advances, this work provides a foundational resource for researchers aiming to translate real-world data into actionable drug-repurposing evidence.


Improving Topic Modeling of Social Media Short Texts with Rephrasing: A Case Study of COVID-19 Related Tweets

Xin, Wangjiaxuan, Yin, Shuhua, Chen, Shi, Ge, Yaorong

arXiv.org Artificial Intelligence

Social media platforms such as Twitter (now X) provide rich data for analyzing public discourse, especially during crises such as the COVID-19 pandemic. However, the brevity, informality, and noise of social media short texts often hinder the effectiveness of traditional topic modeling, producing incoherent or redundant topics that are difficult to interpret. To address these challenges, we have developed TM-Rephrase, a model-agnostic framework that leverages large language models (LLMs) to rephrase raw tweets into more standardized and formal language prior to topic modeling. Using a dataset of 25,027 COVID-19-related Twitter posts, we investigate the effects of two rephrasing strategies, general rephrasing and colloquial-to-formal rephrasing, on multiple topic modeling methods. Results demonstrate that TM-Rephrase improves three metrics measuring topic modeling performance (i.e., topic coherence, topic uniqueness, and topic diversity) while reducing topic redundancy for most topic modeling algorithms, with the colloquial-to-formal strategy yielding the greatest performance gains, especially for the Latent Dirichlet Allocation (LDA) algorithm. This study contributes a model-agnostic approach to enhancing topic modeling in public-health-related social media analysis, with broad implications for improved understanding of public discourse in health crises as well as other important domains.
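As an illustration of one of the metrics named above, the sketch below computes topic diversity under a widely used definition: the fraction of unique words among the top-k words of all topics. The abstract does not state which definition the authors use, so this is an assumption, and the toy topics are invented for the example rather than taken from the paper.

```python
def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic.

    `topics` is a list of word lists, each ordered from most to least
    probable. 1.0 means no word is shared between topics; values near 0
    mean the topics are highly redundant. (Assumed definition; the
    abstract does not specify the paper's exact formula.)
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("topics must contain at least one word")
    return len(set(top_words)) / len(top_words)


# Invented toy topics, for illustration only.
toy_topics = [
    ["vaccine", "dose", "booster"],
    ["mask", "mandate", "school"],
    ["vaccine", "mask", "lockdown"],  # repeats "vaccine" and "mask"
]

diversity = topic_diversity(toy_topics, top_k=3)
print(f"topic diversity: {diversity:.3f}")
```

In this toy case, 7 of the 9 top words are unique, so the score is 7/9. Comparing the score before and after rephrasing would be one way to quantify the kind of improvement the abstract describes.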


The Shift Towards Preprints in AI Policy Research: A Comparative Study of Preprint Trends in the U.S., Europe, and South Korea

Suh, Simon

arXiv.org Artificial Intelligence

The adoption of open science has quickly changed how artificial intelligence (AI) policy research is distributed globally. This study examines the regional trends in the citation of preprints, specifically focusing on the impact of two major disruptive events: the COVID-19 pandemic and the release of ChatGPT, on research dissemination patterns in the United States, Europe, and South Korea from 2015 to 2024. Using bibliometrics data from the Web of Science, this study tracks how global disruptive events influenced the adoption of preprints in AI policy research and how such shifts vary by region. By marking the timing of these disruptive events, the analysis reveals that while all regions experienced growth in preprint citations, the magnitude and trajectory of change varied significantly. The United States exhibited sharp, event-driven increases; Europe demonstrated institutional growth; and South Korea maintained consistent, linear growth in preprint adoption. These findings suggest that global disruptions may have accelerated preprint adoption, but the extent and trajectory are shaped by local research cultures, policy environments, and levels of open science maturity. This paper emphasizes the need for future AI governance strategies to consider regional variability in research dissemination and highlights opportunities for further longitudinal and comparative research to deepen our understanding of open-access adoption in AI policy development.


Property Classification of Vacation Rental Properties during Covid-19

Aghaebe, Favour Yahdii, Foley, Dustin, Atwell, Eric, Clark, Stephen

arXiv.org Artificial Intelligence

University of Leeds. GISRUK 2024. Summary: This abstract advocates for employing clustering techniques to classify vacation rental properties active during the Covid pandemic to identify inherent patterns and behaviours. The dataset, a collaboration between the ESRC-funded Consumer Data Research Centre (CDRC) and AirDNA, encompasses data for over a million properties and hosts. Utilising K-means and K-medoids clustering techniques, we identify homogenous groups and their common characteristics. Our findings enhance comprehension of the intricacies of vacation rental evaluations and could potentially be utilised in the creation of targeted, cluster-specific policies. KEYWORDS: Covid-19, Hospitality, Clustering, Unsupervised Machine Learning. 1. Introduction: Travel and tourism have been embedded into our human experience for centuries.


Linguistic Patterns in Pandemic-Related Content: A Comparative Analysis of COVID-19, Constraint, and Monkeypox Datasets

Sikosana, Mkululi, Maudsley-Barton, Sean, Ajao, Oluwaseun

arXiv.org Artificial Intelligence

This study conducts a computational linguistic analysis of pandemic-related online discourse to examine how language distinguishes health misinformation from factual communication. Drawing on three corpora, namely COVID-19 false narratives (n = 7,588), general COVID-19 content (n = 10,700), and Monkeypox-related posts (n = 5,787), we identify significant differences in readability, rhetorical markers, and persuasive language use. COVID-19 misinformation exhibited markedly lower readability scores and contained over twice the frequency of fear-related or persuasive terms compared to the other datasets. It also showed minimal use of exclamation marks, contrasting with the more emotive style of Monkeypox content. These patterns suggest that misinformation employs a deliberately complex rhetorical style embedded with emotional cues, a combination that may enhance its perceived credibility. Our findings contribute to the growing body of work on digital health misinformation by highlighting linguistic indicators that may aid detection efforts. They also inform public health messaging strategies and theoretical models of crisis communication in networked media environments. At the same time, the study acknowledges certain limitations, including reliance on traditional readability indices, use of a deliberately narrow persuasive lexicon, and reliance on static aggregate analysis. Future research should therefore incorporate longitudinal designs, broader emotion lexicons, and platform-sensitive approaches to strengthen robustness. The data and code are available at: https://doi.org/10.5281/zenodo.17024569

The COVID-19 pandemic challenged global health systems. The proliferation of health-related information on digital platforms accelerates dramatically during public health crises, creating opportunities for rapid knowledge dissemination but also challenges related to misinformation (Sikosana et al., 2024; Sikosana et al., 2025). This dual nature of digital communication became particularly evident during the COVID-19 pandemic, which sparked an unprecedented volume of online discourse and was accompanied by what the World Health Organisation (WHO) termed an "infodemic": an overabundance of information (both accurate and not) that makes it hard for people to find trustworthy guidance (WHO, 2020). This infodemic phenomenon presents a communication challenge and a substantive threat to public health. Research has shown that exposure to COVID-19 misinformation can directly impact health behaviours. For example, exposure to false COVID-19 vaccine information was associated with a reduction in vaccination intent by about 6.4 percentage points in the UK (and a similar 6.2-point drop in the USA) (Chen et al., 2022; Loomba et al., 2021). Such an effect size is sufficient to undermine herd immunity thresholds.


All Models Are Wrong, But Can They Be Useful? Lessons from COVID-19 Agent-Based Models: A Systematic Review

Von Hoene, Emma, Von Hoene, Sara, Peter, Szandra, Hopson, Ethan, Csizmadia, Emily, Fenyk, Faith, Barner, Kai, Leslie, Timothy, Kavak, Hamdi, Zufle, Andreas, Roess, Amira, Anderson, Taylor

arXiv.org Artificial Intelligence

The COVID-19 pandemic prompted a surge in computational models to simulate disease dynamics and guide interventions. Agent-based models (ABMs) are well-suited to capture population and environmental heterogeneity, but their rapid deployment raised questions about utility for health policy. We systematically reviewed 536 COVID-19 ABM studies published from January 2020 to December 2023, retrieved from Web of Science, PubMed, and Wiley on January 30, 2024. Studies were included if they used ABMs to simulate COVID-19 transmission; reviews were excluded. Studies were assessed against nine criteria of model usefulness, including transparency and re-use, interdisciplinary collaboration and stakeholder engagement, and evaluation practices. Publications peaked in late 2021 and were concentrated in a few countries. Most models explored behavioral or policy interventions (n = 294, 54.85%) rather than real-time forecasting (n = 9, 1.68%). While most described model assumptions (n = 491, 91.60%), fewer disclosed limitations (n = 349, 65.11%), shared code (n = 219, 40.86%), or built on existing models (n = 195, 36.38%). Standardized reporting protocols (n = 36, 6.72%) and stakeholder engagement (n = 73, 13.62%) were rare. Only 12 studies (2.24%) described a comprehensive validation framework, though uncertainty was often quantified (n = 407, 75.93%). Limitations of this review include underrepresentation of non-English studies, subjective data extraction, variability in study quality, and limited generalizability. Overall, COVID-19 ABMs advanced quickly, but lacked transparency, accessibility, and participatory engagement. Stronger standards are needed for ABMs to serve as reliable decision-support tools in future public health crises.
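The percentages in this abstract can be checked against the stated total of 536 reviewed studies. The short sketch below recomputes each one from its reported count; the dictionary labels are shorthand introduced here, while the counts and percentages are copied from the abstract.

```python
TOTAL_STUDIES = 536  # number of reviewed studies stated in the abstract

# label -> (reported count, reported percentage), copied from the abstract
reported = {
    "behavioral or policy interventions": (294, 54.85),
    "real-time forecasting": (9, 1.68),
    "described model assumptions": (491, 91.60),
    "disclosed limitations": (349, 65.11),
    "shared code": (219, 40.86),
    "built on existing models": (195, 36.38),
    "standardized reporting protocols": (36, 6.72),
    "stakeholder engagement": (73, 13.62),
    "comprehensive validation framework": (12, 2.24),
    "uncertainty quantified": (407, 75.93),
}


def percent(count, total=TOTAL_STUDIES):
    """Share of `total` represented by `count`, as a percentage."""
    return 100 * count / total


# Keep any entry whose reported percentage is not the count's share of
# 536 rounded to two decimals (half a unit in the last reported digit).
mismatches = {
    label: (count, pct)
    for label, (count, pct) in reported.items()
    if abs(percent(count) - pct) > 0.005
}

for label, (count, pct) in reported.items():
    print(f"{label}: {count}/{TOTAL_STUDIES} = {percent(count):.2f}% (reported {pct:.2f}%)")
print("mismatches:", mismatches)
```

Every reported percentage agrees with its count to two decimal places, so the mismatch dictionary comes out empty.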


DemandLens: Enhancing Forecast Accuracy Through Product-Specific Hyperparameter Optimization

Pillai, Srijesh, Nazir, M. I. Jawid

arXiv.org Artificial Intelligence

DemandLens demonstrates an innovative Prophet-based forecasting model for the mattress-in-a-box industry, incorporating COVID-19 metrics and SKU-specific hyperparameter optimization. This industry has seen significant growth of e-commerce players in recent years, whose business model relies largely on outsourcing mattress manufacturing and the related logistics and supply chain operations while focusing on marketing the product and driving conversions through direct-to-consumer sales channels. Within the United States, only a limited number of mattress contract manufacturers are available, so it is important that they manage their raw materials, supply chain, and inventory intelligently in order to serve as many mattress brands as possible. Our approach addresses the critical need for accurate sales forecasting in an industry that is heavily dependent on third-party contract manufacturing. This, in turn, helps contract manufacturers to be prepared, avoid bottleneck scenarios, and source raw materials at optimal rates. The model demonstrates strong predictive capabilities through SKU-specific hyperparameter optimization, offering contract manufacturers and mattress brands a reliable tool to streamline supply chain operations.



Content and Engagement Trends in COVID-19 YouTube Videos: Evidence from the Late Pandemic

Thakur, Nirmalya, Hartel, Madeline D, Boden, Lane Michael, Enriquez, Dallas, Ricks, Boston Joyner

arXiv.org Artificial Intelligence

This work investigated about 10,000 COVID-19-related YouTube videos published between January 2023 and October 2024 to evaluate how temporal, lexical, linguistic, and structural factors influenced engagement during the late pandemic period. Publishing activity showed consistent weekday effects: in the first window, average views peaked on Mondays at 92,658; in the second, on Wednesdays at 115,479; and in the third, on Fridays at 84,874, reflecting a shift in audience attention toward mid- and late week. Lexical analysis of video titles revealed recurring high-frequency keywords related to COVID-19 and YouTube features, including COVID, coronavirus, shorts, and live. Frequency analysis revealed sharp spikes, with COVID appearing in 799 video titles in August 2024, while engagement analysis showed that videos titled with shorts attracted very high views, peaking at 2.16 million average views per video in June 2023. Analysis of sentiment of video descriptions in English showed weak correlation with views in the raw data (Pearson r = 0.0154, p = 0.2987), but stronger correlations emerged once outliers were addressed, with Spearman r = 0.110 (p < 0.001) and Pearson r = 0.0925 (p < 0.001). Category-level analysis of video durations revealed contrasting outcomes: long videos focusing on people and blogs averaged 209,114 views, short entertainment videos averaged 288,675 views, and medium-to-long news and politics videos averaged 51,309 and 59,226 views, respectively. These results demonstrate that engagement patterns of COVID-19-related videos on YouTube during the late pandemic followed distinct characteristics driven by publishing schedules, title vocabulary, topics, and genre-specific duration effects.
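The contrast between the raw and outlier-adjusted correlations reported above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the sentiment and view values are invented, and the abstract does not say how outliers were addressed, so a simple view-count cap (the hypothetical `VIEW_CAP`) stands in for that step.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Invented data: views rise with sentiment, except for one viral video
# with very negative sentiment (the first entry).
sentiment = [-0.9, -0.8, -0.5, -0.2, 0.0, 0.1, 0.3, 0.5, 0.7, 0.9]
views = [2_000_000, 120, 150, 180, 200, 230, 260, 300, 340, 400]

VIEW_CAP = 1_000_000  # hypothetical outlier rule, not from the paper
kept = [(s, v) for s, v in zip(sentiment, views) if v <= VIEW_CAP]
kept_sentiment = [s for s, _ in kept]
kept_views = [v for _, v in kept]

print("Pearson, raw:     ", round(pearson(sentiment, views), 3))
print("Spearman, raw:    ", round(spearman(sentiment, views), 3))
print("Pearson, trimmed: ", round(pearson(kept_sentiment, kept_views), 3))
```

In this toy data the single viral video makes the raw Pearson coefficient negative, while the rank-based Spearman coefficient stays positive and the trimmed Pearson coefficient is strongly positive. That sensitivity of Pearson to extreme values is one reason a correlation can look weak in raw view counts and stronger once outliers are handled.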