
epidemiology


Can synthetic data reproduce real-world findings in epidemiology? A replication study using tree-based generative AI

Kapar, Jan, Günther, Kathrin, Vallis, Lori Ann, Berger, Klaus, Binder, Nadine, Brenner, Hermann, Castell, Stefanie, Fischer, Beate, Harth, Volker, Holleczek, Bernd, Intemann, Timm, Ittermann, Till, Karch, André, Keil, Thomas, Krist, Lilian, Lange, Berit, Leitzmann, Michael F., Nimptsch, Katharina, Obi, Nadia, Pigeot, Iris, Pischon, Tobias, Schikowski, Tamara, Schmidt, Börge, Schmidt, Carsten Oliver, Sedlmair, Anja M., Tanoey, Justine, Wienbergen, Harm, Wienke, Andreas, Wigmann, Claudia, Wright, Marvin N.

arXiv.org Machine Learning

Generative artificial intelligence for synthetic data generation holds substantial potential to address practical challenges in epidemiology. However, many current methods suffer from limited quality, high computational demands, and complexity for non-experts. Furthermore, common evaluation strategies for synthetic data often fail to directly reflect statistical utility. Against this background, a critical underexplored question is whether synthetic data can reliably reproduce key findings from epidemiological research. We propose the use of adversarial random forests (ARF) as an efficient and convenient method for synthesizing tabular epidemiological data. To evaluate its performance, we replicated statistical analyses from six epidemiological publications and compared original with synthetic results. These publications cover blood pressure, anthropometry, myocardial infarction, accelerometry, loneliness, and diabetes, based on data from the German National Cohort (NAKO Gesundheitsstudie), the Bremen STEMI Registry U45 Study, and the Guelph Family Health Study. Additionally, we assessed the impact of dimensionality and variable complexity on synthesis quality by limiting datasets to variables relevant for individual analyses, including necessary derivations. Across all replicated original studies, results from multiple synthetic data replications consistently aligned with original findings. Even for datasets with relatively low sample size-to-dimensionality ratios, the replication outcomes closely matched the original results across various descriptive and inferential analyses. Reducing dimensionality and pre-deriving variables further enhanced both quality and stability of the results.
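The study's replicate-and-compare design can be sketched in a few lines: fit an analysis on the original data, re-run it on multiple synthetic replications, and compare estimates. The generator below is a deliberately simple stand-in (resampling plus a fitted linear model with Gaussian noise), not the adversarial random forest the paper uses, and all data and names are toy values:

```python
import random
import statistics

def fit_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def gaussian_synthesizer(xs, ys, n, rng):
    """Stand-in generator: resample x, then draw y from the fitted
    linear model plus Gaussian residual noise. The paper uses
    adversarial random forests (ARF) in place of this."""
    b = fit_slope(xs, ys)
    a = statistics.fmean(ys) - b * statistics.fmean(xs)
    resid_sd = statistics.pstdev([y - (a + b * x) for x, y in zip(xs, ys)])
    sx = [rng.choice(xs) for _ in range(n)]
    sy = [a + b * x + rng.gauss(0, resid_sd) for x in sx]
    return sx, sy

rng = random.Random(1)
# Toy "original" cohort: exposure x, outcome y with true slope 2.
xs = [rng.gauss(0, 1) for _ in range(500)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]

original_est = fit_slope(xs, ys)
# Re-estimate the same quantity on several synthetic replications,
# mirroring the study's multiple-replication design.
synthetic_ests = []
for _ in range(5):
    sx, sy = gaussian_synthesizer(xs, ys, len(xs), rng)
    synthetic_ests.append(fit_slope(sx, sy))

print(round(original_est, 2), round(statistics.fmean(synthetic_ests), 2))
```

Agreement between the original estimate and the spread of synthetic estimates is the paper's notion of statistical utility, as opposed to generic similarity metrics.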


Why can't Epidemiology be automated (yet)?

Bann, David, Lowther, Ed, Wright, Liam, Kovalchuk, Yevgeniya

arXiv.org Artificial Intelligence

Recent advances in artificial intelligence (AI) - particularly generative AI - present new opportunities to accelerate, or even automate, epidemiological research. Unlike disciplines based on physical experimentation, a sizable fraction of Epidemiology relies on secondary data analysis and thus is well-suited for such augmentation. Yet, it remains unclear which specific tasks can benefit from AI interventions or where roadblocks exist. Awareness of current AI capabilities is also mixed. Here, we map the landscape of epidemiological tasks using existing datasets - from literature review to data access, analysis, writing up, and dissemination - and identify where existing AI tools offer efficiency gains. While AI can increase productivity in some areas such as coding and administrative tasks, its utility is constrained by limitations of existing AI models (e.g. hallucinations in literature reviews) and human systems (e.g. barriers to accessing datasets). Through examples of AI-generated epidemiological outputs, including fully AI-generated papers, we demonstrate that recently developed agentic systems can now design and execute epidemiological analysis, albeit to varied quality (see https://github.com/edlowther/automated-epidemiology). Epidemiologists have new opportunities to empirically test and benchmark AI systems; realising the potential of AI will require two-way engagement between epidemiologists and engineers.


Performance of Cross-Validated Targeted Maximum Likelihood Estimation

Smith, Matthew J., Phillips, Rachael V., Maringe, Camille, Luque-Fernandez, Miguel Angel

arXiv.org Machine Learning

Background: Advanced methods for causal inference, such as targeted maximum likelihood estimation (TMLE), require certain conditions for valid statistical inference. However, when data sparsity or near-positivity violations undermine differentiability, the Donsker class condition is violated. In such situations, TMLE variance estimates can be unreliable, inflating the type I error and producing poor confidence interval coverage. Cross-validation of the TMLE algorithm (CVTMLE) has been suggested to improve performance relative to TMLE in settings of positivity or Donsker class violations. We aim to investigate the performance of CVTMLE compared to TMLE in various settings. Methods: We utilised the data-generating mechanism described in Leger et al. (2022) to run a Monte Carlo experiment under different Donsker class violations. We then evaluated the respective statistical performances of TMLE and CVTMLE with different super learner libraries, with and without regression tree methods. Results: We found that CVTMLE vastly improves confidence interval coverage without adversely affecting bias, particularly in settings with small sample sizes and near-positivity violations. Furthermore, incorporating regression trees using standard TMLE with ensemble super learner-based initial estimates increases bias and variance, leading to invalid statistical inference. Conclusions: With CVTMLE, the Donsker class condition is no longer necessary to obtain valid statistical inference when using regression trees under either data sparsity or near-positivity violations. We show through simulations that CVTMLE is much less sensitive to the choice of the super learner library and thereby provides better estimation and inference in cases where the super learner library uses more flexible candidates and is prone to overfitting.
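The cross-validation idea behind CVTMLE is that each observation's initial (nuisance) estimate comes from folds that exclude it. A minimal cross-fitting sketch, in which the fold-wise "fit" is just a sample mean standing in for a super learner:

```python
import random
import statistics

def kfold_indices(n, k, rng):
    """Partition indices 0..n-1 into k disjoint validation folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_fit_means(ys, k=5, seed=0):
    """Cross-fitted initial estimates: each observation's prediction
    (here a plain mean, standing in for a super learner fit) is
    computed only from the folds it does not belong to."""
    rng = random.Random(seed)
    preds = [None] * len(ys)
    for fold in kfold_indices(len(ys), k, rng):
        held_out = set(fold)
        train = [y for i, y in enumerate(ys) if i not in held_out]
        fit = statistics.fmean(train)  # "nuisance model" fit on k-1 folds
        for i in fold:
            preds[i] = fit
    return preds

ys = [float(i) for i in range(10)]
preds = cross_fit_means(ys, k=5)
print(preds)
```

Because no observation is predicted by a model that saw it, flexible (overfitting-prone) learners no longer need the Donsker class condition for valid inference, which is the mechanism the paper exploits.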


High-dimensional multiple imputation (HDMI) for partially observed confounders including natural language processing-derived auxiliary covariates

Weberpals, Janick, Shaw, Pamela A., Lin, Kueiyu Joshua, Wyss, Richard, Plasek, Joseph M, Zhou, Li, Ngan, Kerry, DeRamus, Thomas, Raman, Sudha R., Hammill, Bradley G., Lee, Hana, Toh, Sengwee, Connolly, John G., Dandreo, Kimberly J., Tian, Fang, Liu, Wei, Li, Jie, Hernández-Muñoz, José J., Schneeweiss, Sebastian, Desai, Rishi J.

arXiv.org Artificial Intelligence

Multiple imputation (MI) models can be improved by including auxiliary covariates (AC), but their performance in high-dimensional data is not well understood. We aimed to develop and compare high-dimensional MI (HDMI) approaches using structured and natural language processing (NLP)-derived AC in studies with partially observed confounders. We conducted a plasmode simulation study using data from opioid vs. non-steroidal anti-inflammatory drug (NSAID) initiators (X) with observed serum creatinine labs (Z2) and time-to-acute kidney injury as outcome. We simulated 100 cohorts with a null treatment effect, including X, Z2, atrial fibrillation (U), and 13 other investigator-derived confounders (Z1) in the outcome generation. We then imposed missingness (MZ2) on 50% of Z2 measurements as a function of Z2 and U and created different HDMI candidate AC using structured and NLP-derived features. We mimicked scenarios where U was unobserved by omitting it from all AC candidate sets. Using LASSO, we data-adaptively selected HDMI covariates associated with Z2 and MZ2 for MI, and with U to include in propensity score models. The treatment effect was estimated following propensity score matching in MI datasets and we benchmarked HDMI approaches against a baseline imputation and complete case analysis with Z1 only. HDMI using claims data showed the lowest bias (0.072). Combining claims and sentence embeddings led to an improvement in the efficiency displaying the lowest root-mean-squared-error (0.173) and coverage (94%). NLP-derived AC alone did not perform better than baseline MI. HDMI approaches may decrease bias in studies with partially observed confounders where missingness depends on unobserved factors.
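After the M imputed datasets are analysed, the per-dataset treatment-effect estimates are combined with Rubin's rules. A minimal sketch with made-up estimates and variances (the numbers are illustrative, not the study's results):

```python
import statistics

def pool_rubin(estimates, variances):
    """Pool point estimates and variances across M imputed datasets
    using Rubin's rules: total variance = within-imputation variance
    plus (1 + 1/M) times between-imputation variance."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical estimates from M = 5 imputed datasets.
est = [0.10, 0.08, 0.12, 0.09, 0.11]
var = [0.004, 0.005, 0.004, 0.005, 0.004]
q, t = pool_rubin(est, var)
print(round(q, 3), round(t, 5))
```

The between-imputation term is what propagates the extra uncertainty from the missing data into the final confidence interval.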


A Review of Graph Neural Networks in Epidemic Modeling

Liu, Zewen, Wan, Guancheng, Prakash, B. Aditya, Lau, Max S. Y., Jin, Wei

arXiv.org Artificial Intelligence

Since the onset of the COVID-19 pandemic, there has been a growing interest in studying epidemiological models. Traditional mechanistic models mathematically describe the transmission mechanisms of infectious diseases. However, they often suffer from oversimplified or fixed assumptions, which can cause sub-optimal predictive power and inefficiency in capturing complex relational information. Consequently, Graph Neural Networks (GNNs) have emerged as a progressively popular tool in epidemic research. In this paper, we endeavor to furnish a comprehensive review of GNNs in epidemic tasks and highlight potential future directions. To accomplish this objective, we introduce hierarchical taxonomies for both epidemic tasks and methodologies, offering a trajectory of development within this domain. For epidemic tasks, we establish a taxonomy akin to those typically employed within the epidemic domain. For methodology, we categorize existing work into Neural Models and Hybrid Models. Following this, we perform an exhaustive and systematic examination of the methodologies, encompassing both the tasks and their technical details. Furthermore, we discuss the limitations of existing methods from diverse perspectives and systematically propose future research directions. This survey aims to bridge literature gaps and promote the progression of this promising field, with a list of relevant papers at https://github.com/Emory-Melody/awesome-epidemic-modeling-papers. We hope that it will facilitate synergies between the communities of GNNs and epidemiology, and contribute to their collective progress.


Creating a Discipline-specific Commons for Infectious Disease Epidemiology

Wagner, Michael M., Hogan, William, Levander, John, Darr, Adam, Diller, Matt, Sibilla, Max, Sperringer, Alexander T., Loiacono, Terence Jr., Brown, Shawn T.

arXiv.org Artificial Intelligence

Objective: To create a commons for infectious disease (ID) epidemiology in which epidemiologists, public health officers, data producers, and software developers can not only share data and software, but also receive assistance in improving their interoperability. Materials and Methods: We represented 586 datasets, 54 software tools, and 24 data formats in OWL 2 and then used logical queries to infer potentially interoperable combinations of software and datasets, as well as statistics about the FAIRness of the collection. We represented the objects in DATS 2.2 and a software metadata schema of our own design. We used these representations as the basis for the Content, Search, FAIR-o-meter, and Workflow pages that constitute the MIDAS Digital Commons. Results: Interoperability was limited by a lack of standardization of input and output formats of software. When formats existed, they were human-readable specifications (22/24; 92%); only 3 formats (13%) had machine-readable specifications. Nevertheless, logical search of a triple store based on named data formats was able to identify scores of potentially interoperable combinations of software and datasets. Discussion: We improved the findability and availability of a sample of software and datasets and developed metrics for assessing interoperability. The barriers to interoperability included poor documentation of software input/output formats and little attention to standardization of most types of data in this field. Conclusion: Centralizing and formalizing the representation of digital objects within a commons promotes FAIRness, enables its measurement over time, and supports the identification of potentially interoperable combinations of data and software.
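The format-based interoperability inference can be sketched as a simple matching query over declared formats. The dataset and tool names below are illustrative, not the commons' actual records, and the set-intersection logic stands in for the paper's OWL 2 triple-store queries:

```python
# Hypothetical catalogue entries: each dataset declares one format,
# each tool declares the set of formats it can read.
datasets = {
    "county_case_counts": "CSV",
    "contact_survey": "JSON",
    "genome_alignments": "FASTA",
}
software = {
    "epi_plotter": {"CSV", "JSON"},
    "outbreak_sim": {"CSV"},
    "seq_analyzer": {"FASTA"},
}

def interoperable_pairs(datasets, software):
    """Infer (software, dataset) pairs whose declared formats match,
    mirroring the commons' logical queries over named data formats."""
    return sorted(
        (tool, name)
        for tool, reads in software.items()
        for name, fmt in datasets.items()
        if fmt in reads
    )

pairs = interoperable_pairs(datasets, software)
print(pairs)
```

This only works when formats are named and machine-checkable, which is exactly the standardization gap the paper reports (3 of 24 formats with machine-readable specifications).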


BAND: Biomedical Alert News Dataset

Fu, Zihao, Zhang, Meiru, Meng, Zaiqiao, Shen, Yannan, Buckeridge, David, Collier, Nigel

arXiv.org Artificial Intelligence

Infectious disease outbreaks continue to pose a significant threat to human health and well-being. To improve disease surveillance and understanding of disease spread, several surveillance systems have been developed to monitor daily news alerts and social media. However, existing systems lack thorough epidemiological analysis in relation to corresponding alerts or news, largely due to the scarcity of well-annotated report data. To address this gap, we introduce the Biomedical Alert News Dataset (BAND), which includes 1,508 samples from existing reported news articles, open emails, and alerts, as well as 30 epidemiology-related questions. These questions require expert reasoning from the model, thereby offering valuable insights into disease outbreaks. The BAND dataset brings new challenges to the NLP community, requiring models to handle disguised content and to infer important information. We provide several benchmark tasks, including Named Entity Recognition (NER), Question Answering (QA), and Event Extraction (EE), to show how well existing models handle these tasks in the epidemiology domain. To the best of our knowledge, the BAND corpus is the largest corpus of well-annotated biomedical outbreak alert news with elaborately designed questions, making it a valuable resource for epidemiologists and NLP researchers alike.


Personalised dynamic super learning: an application in predicting hemodiafiltration's convection volumes

Chatton, Arthur, Bally, Michèle, Lévesque, Renée, Malenica, Ivana, Platt, Robert W., Schnitzer, Mireille E.

arXiv.org Machine Learning

Obtaining continuously updated predictions is a major challenge for personalised medicine. Leveraging combinations of parametric regressions and machine learning approaches, the personalised online super learner (POSL) can achieve such dynamic and personalised predictions. We adapt POSL to predict a repeated continuous outcome dynamically and propose a new way to validate such personalised or dynamic prediction models. We illustrate its performance by predicting the convection volume of patients undergoing hemodiafiltration. POSL outperformed its candidate learners with respect to median absolute error, calibration-in-the-large, discrimination, and net benefit. We finally discuss the choices and challenges underlying the use of POSL.
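The loss-based, continuously updated weighting that an online super learner relies on can be illustrated minimally. The exponential-weights rule below is a simplification of POSL's loss-based candidate weighting, and the two "learners" are just fixed prediction streams with toy values:

```python
import math

def update_weights(weights, errors, eta=1.0):
    """Exponential-weights update: shrink the weight of learners with
    large recent error, then renormalize. A simplification of the
    discrete super learner's loss-based weighting."""
    raw = [w * math.exp(-eta * e) for w, e in zip(weights, errors)]
    s = sum(raw)
    return [r / s for r in raw]

def ensemble_predict(weights, preds):
    """Weighted combination of the candidate learners' predictions."""
    return sum(w * p for w, p in zip(weights, preds))

# Two candidate learners tracked over a stream of observations.
weights = [0.5, 0.5]
stream = [
    # (learner predictions, observed outcome)
    ([10.0, 14.0], 10.5),
    ([10.2, 13.8], 10.4),
    ([9.9, 14.1], 10.1),
]
for preds, y in stream:
    errors = [abs(p - y) for p in preds]
    weights = update_weights(weights, errors)

print([round(w, 3) for w in weights])
```

After a few observations the weight mass concentrates on the learner tracking the patient's recent outcomes, which is the sense in which the ensemble is both dynamic and personalised.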


Supervised Machine Learning for Breast Cancer Risk Factors Analysis and Survival Prediction

Chtouki, Khaoula, Rhanoui, Maryem, Mikram, Mounia, Amazian, Kamelia, Yousfi, Siham

arXiv.org Artificial Intelligence

Breast cancer survival prediction may ultimately inform the choice of the most effective treatment. A variety of techniques, including statistical, machine learning, and deep learning models, have been employed to predict a patient's chances of survival. In the current study, 1904 patient records from the METABRIC dataset were utilized to predict 5-year breast cancer survival using a machine learning approach. We compare the outcomes of seven classification models using the following metrics: recall, AUC, confusion matrix, accuracy, precision, false positive rate, and true positive rate. The findings demonstrate that the Logistic Regression (LR), Support Vector Machines (SVM), Decision Tree (DT), Random Forest (RF), Extremely Randomized Trees (ET), K-Nearest Neighbor (KNN), and Adaptive Boosting (AdaBoost) classifiers can accurately predict the survival rate of the tested samples, with reported accuracies of 75.4%, 74.7%, 71.5%, 75.5%, 70.3%, and 78%.
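The reported metrics all derive from the binary confusion matrix. A minimal sketch with toy survival labels (not METABRIC data):

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and the derived rates the study reports."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),  # also the true positive rate
        "fpr": fp / (fp + tn),     # false positive rate
    }

# Toy 5-year survival labels (1 = survived) and one classifier's output.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1, 0, 1]
m = binary_metrics(y_true, y_pred)
print({k: round(v, 2) for k, v in m.items()})
```

AUC additionally requires ranking predictions by score rather than by hard label, so it is omitted from this label-only sketch.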


Application of targeted maximum likelihood estimation in public health and epidemiological studies: a systematic review

Smith, Matthew J., Phillips, Rachael V., Luque-Fernandez, Miguel Angel, Maringe, Camille

arXiv.org Machine Learning

The Targeted Maximum Likelihood Estimation (TMLE) statistical data analysis framework integrates machine learning, statistical theory, and statistical inference to provide a least biased, efficient, and robust strategy for estimation and inference of a variety of statistical and causal parameters. We describe and evaluate the epidemiological applications that have benefited from recent methodological developments. We conducted a systematic literature review in PubMed for articles that applied any form of TMLE in observational studies. We summarised the epidemiological discipline, geographical location, expertise of the authors, and TMLE methods over time. We used the Roadmap of Targeted Learning and Causal Inference to extract key methodological aspects of the publications. We showcase the contributions of these TMLE results to the literature. Of the 81 publications included, 25% originated from the University of California at Berkeley, where the framework was first developed by Professor Mark van der Laan. By the first half of 2022, 70% of the publications originated from outside the United States and explored up to 7 different epidemiological disciplines in 2021-22. Double-robustness, bias reduction, and model misspecification were the main motivations that drew researchers towards the TMLE framework. Over time, a wide variety of methodological, tutorial, and software-specific articles were cited, owing to the constant growth of methodological developments around TMLE. There is a clear dissemination trend of the TMLE framework to various epidemiological disciplines and to increasing numbers of geographical areas. The availability of R packages, publication of tutorial papers, and involvement of methodological experts in applied publications have contributed to an exponential increase in the number of studies that understood the benefits of TMLE and adopted it.