AITopics | Marshfield

Collaborating Authors

Marshfield

A Rigorous Machine Learning Analysis Pipeline for Biomedical Binary Classification: Application in Pancreatic Cancer Nested Case-control Studies with Implications for Bias Assessments

Urbanowicz, Ryan J., Suri, Pranshu, Cui, Yuhan, Moore, Jason H., Ruth, Karen, Stolzenberg-Solomon, Rachael, Lynch, Shannon M.

arXiv.org Machine LearningSep-8-2020

Machine learning (ML) offers a collection of powerful approaches for detecting and modeling associations, often applied to data having a large number of features and/or complex associations. Currently, there are many tools to facilitate implementing custom ML analyses (e.g. scikit-learn). Interest is also increasing in automated ML packages, which can make it easier for non-experts to apply ML and have the potential to improve model performance. ML permeates most subfields of biomedical research with varying levels of rigor and correct usage. Tremendous opportunities offered by ML are frequently offset by the challenge of assembling comprehensive analysis pipelines, and the ease of ML misuse. In this work we have laid out and assembled a complete, rigorous ML analysis pipeline focused on binary classification (i.e. case/control prediction), and applied this pipeline to both simulated and real world data. At a high level, this 'automated' but customizable pipeline includes a) exploratory analysis, b) data cleaning and transformation, c) feature selection, d) model training with 9 established ML algorithms, each with hyperparameter optimization, and e) thorough evaluation, including appropriate metrics, statistical analyses, and novel visualizations. This pipeline organizes the many subtle complexities of ML pipeline assembly to illustrate best practices to avoid bias and ensure reproducibility. Additionally, this pipeline is the first to compare established ML algorithms to 'ExSTraCS', a rule-based ML algorithm with the unique capability of interpretably modeling heterogeneous patterns of association. While designed to be widely applicable we apply this pipeline to an epidemiological investigation of established and newly identified risk factors for pancreatic cancer to evaluate how different sources of bias might be handled by ML algorithms.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2008.12829

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Wisconsin > Wood County > Marshfield (0.04)
(10 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology (1.00)
Education > Health & Safety > School Nutrition (1.00)
Health & Medicine > Therapeutic Area > Oncology > Pancreatic Cancer (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Statistical Relational Learning to Predict Primary Myocardial Infarction from Electronic Health Records

Weiss, Jeremy C. (University of Wisconsin-Madison) | Natarajan, Sriraam (Wake Forest University) | Peissig, Peggy L. (Marshfield Clinic Research Foundation) | McCarty, Catherine A. (Essentia Institute of Rural Health) | Page, Daivd (University of Wisconsin-Madison)

AAAI ConferencesJul-21-2012

Electronic health records (EHRs) are an emerging relational domain with large potential to improve clinical outcomes. We apply two statistical relational learning (SRL) algorithms to the task of predicting primary myocardial infarction. We show that one SRL algorithm, relational functional gradient boosting, outperforms propositional learners particularly in the medically-relevant high recall region. We observe that both SRL algorithms predict outcomes better than their propositional analogs and suggest how our methods can augment current epidemiological practices.

algorithm, gradient, risk factor, (16 more...)

AAAI Conferences

Twenty-Fourth IAAI Conference

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > Greenland (0.04)
North America > United States > Wisconsin > Wood County > Marshfield (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback