epidemiology
- North America > Canada > Quebec > Montreal (0.05)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
New Head of Trump's Cancer Panel Questioned Links Between Vaccines and Cancer
Yale epidemiologist Harvey Risch, who has entertained a connection between Covid vaccines and "turbo cancer" and promoted ivermectin, says he'll chair the President's Cancer Panel.
An epidemiologist who has speculated about a connection between Covid-19 vaccines and "turbo cancer" in young people, and who works as chief epidemiologist at a company that sells ivermectin alongside reviews claiming it is effective as a cancer treatment, has been appointed by President Donald Trump to a key position overseeing the National Cancer Program. Harvey Risch, a professor emeritus of epidemiology at the Yale School of Public Health, announced his appointment as chair of the President's Cancer Panel on X earlier this month. Risch's profile page on the Yale website has also been updated to read "In November 2025, President Trump appointed Dr. Risch to Chair the President's Cancer panel." No formal announcement was made by the president or the White House, and the Cancer Panel website's list of current members does not include Risch.
- North America > United States > California (0.15)
- North America > United States > Connecticut (0.05)
- Europe > Slovakia (0.05)
- Europe > Czechia (0.05)
Can synthetic data reproduce real-world findings in epidemiology? A replication study using tree-based generative AI
Kapar, Jan, Günther, Kathrin, Vallis, Lori Ann, Berger, Klaus, Binder, Nadine, Brenner, Hermann, Castell, Stefanie, Fischer, Beate, Harth, Volker, Holleczek, Bernd, Intemann, Timm, Ittermann, Till, Karch, André, Keil, Thomas, Krist, Lilian, Lange, Berit, Leitzmann, Michael F., Nimptsch, Katharina, Obi, Nadia, Pigeot, Iris, Pischon, Tobias, Schikowski, Tamara, Schmidt, Börge, Schmidt, Carsten Oliver, Sedlmair, Anja M., Tanoey, Justine, Wienbergen, Harm, Wienke, Andreas, Wigmann, Claudia, Wright, Marvin N.
Generative artificial intelligence for synthetic data generation holds substantial potential to address practical challenges in epidemiology. However, many current methods suffer from limited quality, high computational demands, and complexity for non-experts. Furthermore, common evaluation strategies for synthetic data often fail to directly reflect statistical utility. Against this background, a critical underexplored question is whether synthetic data can reliably reproduce key findings from epidemiological research. We propose the use of adversarial random forests (ARF) as an efficient and convenient method for synthesizing tabular epidemiological data. To evaluate its performance, we replicated statistical analyses from six epidemiological publications and compared original with synthetic results. These publications cover blood pressure, anthropometry, myocardial infarction, accelerometry, loneliness, and diabetes, based on data from the German National Cohort (NAKO Gesundheitsstudie), the Bremen STEMI Registry U45 Study, and the Guelph Family Health Study. Additionally, we assessed the impact of dimensionality and variable complexity on synthesis quality by limiting datasets to variables relevant for individual analyses, including necessary derivations. Across all replicated original studies, results from multiple synthetic data replications consistently aligned with original findings. Even for datasets with relatively low sample size-to-dimensionality ratios, the replication outcomes closely matched the original results across various descriptive and inferential analyses. Reducing dimensionality and pre-deriving variables further enhanced both quality and stability of the results.
- Europe > Germany > Bremen > Bremen (0.14)
- Europe > Germany > Bavaria > Regensburg (0.04)
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Research Report > Strength Medium (0.67)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Epidemiology (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.92)
Why can't Epidemiology be automated (yet)?
Bann, David, Lowther, Ed, Wright, Liam, Kovalchuk, Yevgeniya
Recent advances in artificial intelligence (AI), particularly generative AI, present new opportunities to accelerate, or even automate, epidemiological research. Unlike disciplines based on physical experimentation, a sizable fraction of epidemiology relies on secondary data analysis and is thus well suited to such augmentation. Yet it remains unclear which specific tasks can benefit from AI interventions and where roadblocks exist. Awareness of current AI capabilities is also mixed. Here, we map the landscape of epidemiological tasks that use existing datasets, from literature review to data access, analysis, writing up, and dissemination, and identify where existing AI tools offer efficiency gains. While AI can increase productivity in some areas, such as coding and administrative tasks, its utility is constrained by limitations of existing AI models (e.g., hallucinations in literature reviews) and of human systems (e.g., barriers to accessing datasets). Through examples of AI-generated epidemiological outputs, including fully AI-generated papers, we demonstrate that recently developed agentic systems can now design and execute epidemiological analyses, albeit with varied quality (see https://github.com/edlowther/automated-epidemiology). Epidemiologists have new opportunities to empirically test and benchmark AI systems; realising the potential of AI will require two-way engagement between epidemiologists and engineers.
- North America > United States (0.14)
- Europe > United Kingdom > England > Hertfordshire (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Research Report > Strength Medium (0.68)
- Research Report > Observational Study (0.68)
- Research Report > Experimental Study (0.47)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)
Performance of Cross-Validated Targeted Maximum Likelihood Estimation
Smith, Matthew J., Phillips, Rachael V., Maringe, Camille, Luque-Fernandez, Miguel Angel
Background: Advanced methods for causal inference, such as targeted maximum likelihood estimation (TMLE), require certain conditions for statistical inference. However, when differentiability fails because of data sparsity or near-positivity violations, the Donsker class condition is violated. In such situations, TMLE variance can suffer from inflation of the type I error and poor coverage, leading to conservative confidence intervals. Cross-validation of the TMLE algorithm (CVTMLE) has been suggested to improve performance over TMLE in settings of positivity or Donsker class violations. We aim to investigate the performance of CVTMLE compared to TMLE in various settings. Methods: We utilised the data-generating mechanism described in Leger et al. (2022) to run a Monte Carlo experiment under different Donsker class violations. Then, we evaluated the respective statistical performances of TMLE and CVTMLE with different super learner libraries, with and without regression tree methods. Results: We found that CVTMLE vastly improves confidence interval coverage without adversely affecting bias, particularly in settings with small sample sizes and near-positivity violations. Furthermore, incorporating regression trees using standard TMLE with ensemble super learner-based initial estimates increases bias and variance, leading to invalid statistical inference. Conclusions: When using CVTMLE, the Donsker class condition is no longer necessary to obtain valid statistical inference when using regression trees and under either data sparsity or near-positivity violations. We show through simulations that CVTMLE is much less sensitive to the choice of the super learner library and thereby provides better estimation and inference in cases where the super learner library uses more flexible candidates and is prone to overfitting.
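The abstract above evaluates estimators by running a Monte Carlo experiment and reporting confidence interval coverage. The following is a minimal sketch of that evaluation idea only, not of TMLE or CVTMLE: it assumes a deliberately simple stand-in estimator (the sample mean of normally distributed data with a normal-approximation 95% interval), and the function names, sample size, and number of replications are illustrative choices rather than values from the paper.

```python
import random
import statistics


def ci_covers(sample, truth, z=1.96):
    """Return True if the normal-approximation 95% CI for the mean contains truth."""
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - z * standard_error <= truth <= mean + z * standard_error


def monte_carlo_coverage(n, replications, truth=0.0, seed=1):
    """Estimate coverage as the share of simulated datasets whose CI contains truth."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    hits = 0
    for _ in range(replications):
        # Stand-in data-generating mechanism (assumption): N(truth, 1) draws.
        sample = [rng.gauss(truth, 1.0) for _ in range(n)]
        hits += ci_covers(sample, truth)
    return hits / replications


coverage = monte_carlo_coverage(n=50, replications=2000)
print(f"Empirical coverage: {coverage:.3f}")
```

An empirical coverage well below the nominal 95% would indicate intervals that are too narrow, which is the kind of failure the study compares between estimators.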
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- North America > United States > Texas > Brazos County > College Station (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
High-dimensional multiple imputation (HDMI) for partially observed confounders including natural language processing-derived auxiliary covariates
Weberpals, Janick, Shaw, Pamela A., Lin, Kueiyu Joshua, Wyss, Richard, Plasek, Joseph M, Zhou, Li, Ngan, Kerry, DeRamus, Thomas, Raman, Sudha R., Hammill, Bradley G., Lee, Hana, Toh, Sengwee, Connolly, John G., Dandreo, Kimberly J., Tian, Fang, Liu, Wei, Li, Jie, Hernández-Muñoz, José J., Schneeweiss, Sebastian, Desai, Rishi J.
Multiple imputation (MI) models can be improved by including auxiliary covariates (AC), but their performance in high-dimensional data is not well understood. We aimed to develop and compare high-dimensional MI (HDMI) approaches using structured and natural language processing (NLP)-derived AC in studies with partially observed confounders. We conducted a plasmode simulation study using data from opioid vs. non-steroidal anti-inflammatory drug (NSAID) initiators (X) with observed serum creatinine labs (Z2) and time-to-acute kidney injury as the outcome. We simulated 100 cohorts with a null treatment effect, including X, Z2, atrial fibrillation (U), and 13 other investigator-derived confounders (Z1) in the outcome generation. We then imposed missingness (MZ2) on 50% of Z2 measurements as a function of Z2 and U and created different HDMI candidate AC using structured and NLP-derived features. We mimicked scenarios where U was unobserved by omitting it from all AC candidate sets. Using LASSO, we data-adaptively selected HDMI covariates associated with Z2 and MZ2 for MI, and with U to include in propensity score models. The treatment effect was estimated following propensity score matching in MI datasets, and we benchmarked HDMI approaches against a baseline imputation and a complete case analysis with Z1 only. HDMI using claims data showed the lowest bias (0.072). Combining claims and sentence embeddings improved efficiency, yielding the lowest root-mean-squared error (0.173) and a coverage of 94%. NLP-derived AC alone did not perform better than baseline MI. HDMI approaches may decrease bias in studies with partially observed confounders where missingness depends on unobserved factors.
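The three performance measures reported in the abstract above (bias, root-mean-squared error, and coverage) can be computed from simulated cohorts as sketched below. This is a generic illustration under stated assumptions: the true effect is 0 (a null treatment effect on the log scale, as in the simulation), and the three estimates and intervals are invented toy values, not results from the study.

```python
import math


def bias(estimates, truth):
    """Mean difference between the estimates and the true effect."""
    return sum(e - truth for e in estimates) / len(estimates)


def rmse(estimates, truth):
    """Root-mean-squared error of the estimates around the true effect."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def coverage(intervals, truth):
    """Share of confidence intervals (lower, upper) that contain the true effect."""
    return sum(lower <= truth <= upper for lower, upper in intervals) / len(intervals)


# Toy values (assumption): one estimate and 95% CI per simulated cohort.
true_effect = 0.0
toy_estimates = [0.1, -0.1, 0.2]
toy_intervals = [(-0.1, 0.3), (-0.3, 0.1), (0.05, 0.35)]

print(f"bias={bias(toy_estimates, true_effect):.3f}")
print(f"rmse={rmse(toy_estimates, true_effect):.3f}")
print(f"coverage={coverage(toy_intervals, true_effect):.2f}")
```

Bias and RMSE are better when lower, whereas coverage is judged by how close it is to the nominal level (95% for a 95% interval).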
- North America > United States > Massachusetts > Suffolk County > Boston (0.05)
- North America > United States > Maryland > Montgomery County > Silver Spring (0.05)
- North America > United States > Washington > King County > Seattle (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
A Review of Graph Neural Networks in Epidemic Modeling
Liu, Zewen, Wan, Guancheng, Prakash, B. Aditya, Lau, Max S. Y., Jin, Wei
Since the onset of the COVID-19 pandemic, there has been a growing interest in studying epidemiological models. Traditional mechanistic models mathematically describe the transmission mechanisms of infectious diseases. However, they often suffer from limitations of oversimplified or fixed assumptions, which could cause sub-optimal predictive power and inefficiency in capturing complex relation information. Consequently, Graph Neural Networks (GNNs) have emerged as a progressively popular tool in epidemic research. In this paper, we endeavor to furnish a comprehensive review of GNNs in epidemic tasks and highlight potential future directions. To accomplish this objective, we introduce hierarchical taxonomies for both epidemic tasks and methodologies, offering a trajectory of development within this domain. For epidemic tasks, we establish a taxonomy akin to those typically employed within the epidemic domain. For methodology, we categorize existing work into Neural Models and Hybrid Models. Following this, we perform an exhaustive and systematic examination of the methodologies, encompassing both the tasks and their technical details. Furthermore, we discuss the limitations of existing methods from diverse perspectives and systematically propose future research directions. This survey aims to bridge literature gaps and promote the progression of this promising field, with a list of relevant papers at https://github.com/Emory-Melody/awesome-epidemic-modelingpapers. We hope that it will facilitate synergies between the communities of GNNs and epidemiology, and contribute to their collective progress.
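As a rough illustration of how a graph-based model uses relational information between locations, the sketch below performs one neighbourhood mean-aggregation step, the basic message-passing operation that many GNN layers build on. It is not a trained GNN and not a method from the review: it has no learnable weights, and the region names, edges, and case counts are hypothetical.

```python
def mean_aggregate(features, edges):
    """Replace each node's value with the mean over itself and its neighbours."""
    # Start every neighbourhood with the node itself (a self-loop).
    neighbours = {node: {node} for node in features}
    for a, b in edges:  # undirected edges
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {
        node: sum(features[m] for m in members) / len(members)
        for node, members in neighbours.items()
    }


# Hypothetical regions on a path A - B - C with toy case counts.
cases = {"A": 0.0, "B": 3.0, "C": 6.0}
links = [("A", "B"), ("B", "C")]

smoothed = mean_aggregate(cases, links)
print(smoothed)  # {'A': 1.5, 'B': 3.0, 'C': 4.5}
```

A real GNN layer would apply learned weight matrices and a non-linearity to the aggregated values and stack several such steps, so that information propagates beyond immediate neighbours.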
- North America > United States > District of Columbia > Washington (0.05)
- Oceania > New Zealand (0.04)
- South America > Brazil (0.04)
- Research Report (1.00)
- Overview (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Epidemiology (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Creating a Discipline-specific Commons for Infectious Disease Epidemiology
Wagner, Michael M., Hogan, William, Levander, John, Darr, Adam, Diller, Matt, Sibilla, Max, Loiacono, Alexander T., Sperringer, Terence Jr., Brown, Shawn T.
Objective: To create a commons for infectious disease (ID) epidemiology in which epidemiologists, public health officers, data producers, and software developers can not only share data and software, but receive assistance in improving their interoperability. Materials and Methods: We represented 586 datasets, 54 software, and 24 data formats in OWL 2 and then used logical queries to infer potentially interoperable combinations of software and datasets, as well as statistics about the FAIRness of the collection. We represented the objects in DATS 2.2 and a software metadata schema of our own design. We used these representations as the basis for the Content, Search, FAIR-o-meter, and Workflow pages that constitute the MIDAS Digital Commons. Results: Interoperability was limited by lack of standardization of input and output formats of software. When formats existed, they were human-readable specifications (22/24; 92%); only 3 formats (13%) had machine-readable specifications. Nevertheless, logical search of a triple store based on named data formats was able to identify scores of potentially interoperable combinations of software and datasets. Discussion: We improved the findability and availability of a sample of software and datasets and developed metrics for assessing interoperability. The barriers to interoperability included poor documentation of software input/output formats and little attention to standardization of most types of data in this field. Conclusion: Centralizing and formalizing the representation of digital objects within a commons promotes FAIRness, enables its measurement over time and the identification of potentially interoperable combinations of data and software.
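The idea of inferring potentially interoperable combinations from named data formats can be sketched with plain dictionaries. This is an illustrative stand-in, not the OWL 2 representation or triple-store queries the authors describe; all dataset, software, and format names are hypothetical. The `percent` helper reproduces the reported shares (22/24 and 3/24) and rounds halves up, because Python's built-in `round` would turn 12.5 into 12.

```python
import math


def interoperable_pairs(dataset_formats, software_inputs):
    """List (dataset, software) pairs where the software accepts the dataset's format."""
    pairs = []
    for dataset, fmt in sorted(dataset_formats.items()):
        for software, accepted in sorted(software_inputs.items()):
            if fmt in accepted:
                pairs.append((dataset, software))
    return pairs


def percent(part, whole):
    """Percentage rounded to the nearest integer, with halves rounded up."""
    return math.floor(100 * part / whole + 0.5)


# Hypothetical catalogue entries (assumption): one named format per dataset,
# and a set of accepted input formats per software item.
dataset_formats = {"dataset_a": "format_x", "dataset_b": "format_y", "dataset_c": "format_z"}
software_inputs = {"simulator_1": {"format_x", "format_y"}, "visualiser_2": {"format_y"}}

print(interoperable_pairs(dataset_formats, software_inputs))
print(percent(22, 24), percent(3, 24))  # 92 13
```

In this toy catalogue `dataset_c` matches nothing, which mirrors the point that undocumented or non-standard formats limit interoperability.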
- North America > United States > Florida > Hillsborough County > University (0.04)
- South America (0.04)
- North America > United States > Pennsylvania (0.04)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Public Health (1.00)
- Health & Medicine > Epidemiology (1.00)
BAND: Biomedical Alert News Dataset
Fu, Zihao, Zhang, Meiru, Meng, Zaiqiao, Shen, Yannan, Buckeridge, David, Collier, Nigel
Infectious disease outbreaks continue to pose a significant threat to human health and well-being. To improve disease surveillance and understanding of disease spread, several surveillance systems have been developed to monitor daily news alerts and social media. However, existing systems lack thorough epidemiological analysis in relation to corresponding alerts or news, largely due to the scarcity of well-annotated reports data. To address this gap, we introduce the Biomedical Alert News Dataset (BAND), which includes 1,508 samples from existing reported news articles, open emails, and alerts, as well as 30 epidemiology-related questions. These questions necessitate the model's expert reasoning abilities, thereby offering valuable insights into the outbreak of the disease. The BAND dataset brings new challenges to the NLP world, requiring better disguise capability of the content and the ability to infer important information. We provide several benchmark tasks, including Named Entity Recognition (NER), Question Answering (QA), and Event Extraction (EE), to show how existing models are capable of handling these tasks in the epidemiology domain. To the best of our knowledge, the BAND corpus is the largest corpus of well-annotated biomedical outbreak alert news with elaborately designed questions, making it a valuable resource for epidemiologists and NLP researchers alike.
- North America > United States > Nevada > Clark County > Las Vegas (0.05)
- North America > United States > Ohio (0.04)
- Europe > Russia (0.04)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Epidemiology (1.00)
Personalised dynamic super learning: an application in predicting hemodiafiltration's convection volumes
Chatton, Arthur, Bally, Michèle, Lévesque, Renée, Malenica, Ivana, Platt, Robert W., Schnitzer, Mireille E.
Obtaining continuously updated predictions is a major challenge for personalised medicine. Leveraging combinations of parametric regressions and machine learning approaches, the personalised online super learner (POSL) can achieve such dynamic and personalised predictions. We adapt POSL to predict a repeated continuous outcome dynamically and propose a new way to validate such personalised or dynamic prediction models. We illustrate its performance by predicting the convection volume of patients undergoing hemodiafiltration. POSL outperformed its candidate learners with respect to median absolute error, calibration-in-the-large, discrimination, and net benefit. We finally discuss the choices and challenges underlying the use of POSL.
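To give a feel for continuously updated predictions built from several candidate learners, the sketch below implements a much simplified online selector: at each time point it predicts with whichever candidate has the lowest cumulative absolute error so far, then updates every candidate's error once the outcome is observed. This is an assumption-laden simplification, not POSL as described by the authors; the two candidates (last observation carried forward and running mean) and the toy series are hypothetical.

```python
import statistics


def locf(history):
    """Candidate 1: last observation carried forward."""
    return history[-1]


def running_mean(history):
    """Candidate 2: mean of all past observations."""
    return sum(history) / len(history)


def online_discrete_selector(series, candidates):
    """Predict each value from its past using the best-so-far candidate."""
    cumulative = {name: 0.0 for name in candidates}
    predictions, chosen = [], []
    for t in range(1, len(series)):
        history, observed = series[:t], series[t]
        # Pick the candidate with the lowest past loss (ties go to the first listed).
        best = min(cumulative, key=cumulative.get)
        forecasts = {name: learner(history) for name, learner in candidates.items()}
        predictions.append(forecasts[best])
        chosen.append(best)
        # Only after predicting, update every candidate's loss with the new outcome.
        for name, value in forecasts.items():
            cumulative[name] += abs(observed - value)
    return predictions, chosen


toy_series = [10, 12, 11, 13]  # hypothetical repeated continuous outcome
candidates = {"locf": locf, "running_mean": running_mean}
predictions, chosen = online_discrete_selector(toy_series, candidates)

errors = [abs(obs - pred) for obs, pred in zip(toy_series[1:], predictions)]
print(predictions, chosen, statistics.median(errors))
```

Because each prediction uses only losses accumulated before the outcome is seen, the median absolute error computed here is an out-of-sample measure.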
- North America > Canada > Quebec > Montreal (0.15)
- North America > United States > New York > New York County > New York City (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Health & Medicine > Therapeutic Area > Nephrology (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Epidemiology (0.95)