AITopics | validation study

A chart review process aided by natural language processing and multi-wave adaptive sampling to expedite validation of code-based algorithms for large database studies

Wang, Shirley V, Hahn, Georg, Sreedhara, Sushama Kattinakere, Mahesri, Mufaddal, Pillai, Haritha S., Aldis, Rajendra, Lii, Joyce, Dutcher, Sarah K., Eniafe, Rhoda, Jones, Jamal T., Kim, Keewan, He, Jiwei, Lee, Hana, Toh, Sengwee, Desai, Rishi J, Yang, Jie

arXiv.org Artificial Intelligence | Aug-1-2025

Background: One of the ways to enhance analyses conducted with large claims databases is by validating the measurement characteristics of code-based algorithms used to identify health outcomes or other key study parameters of interest. These metrics can be used in quantitative bias analyses to assess the robustness of results for an inferential study given potential bias from outcome misclassification. However, extensive time and resource allocation are typically required to create reference-standard labels through manual chart review of free-text notes from linked electronic health records. Methods: We describe an expedited process that introduces efficiency in a validation study using two distinct mechanisms: 1) use of natural language processing (NLP) to reduce time spent by human reviewers to review each chart, and 2) a multi-wave adaptive sampling approach with pre-defined criteria to stop the validation study once performance characteristics are identified with sufficient precision. We illustrate this process in a case study that validates the performance of a claims-based outcome algorithm for intentional self-harm in patients with obesity. Results: We empirically demonstrate that the NLP-assisted annotation process reduced the time spent on review per chart by 40% and use of the pre-defined stopping rule with multi-wave samples would have prevented review of 77% of patient charts with limited compromise to precision in derived measurement characteristics. Conclusion: This approach could facilitate more routine validation of code-based algorithms used to define key study parameters, ultimately enhancing understanding of the reliability of findings derived from database studies.
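The multi-wave stopping idea in the abstract above can be illustrated with a minimal Python sketch. The abstract does not state the actual stopping criteria, so everything specific here is an assumption made for illustration: the measured characteristic is taken to be positive predictive value (PPV), precision is judged by the width of a 95% Wilson score interval, and the names `wilson_interval`, `review_in_waves`, `wave_size`, and `max_width` are hypothetical. It is not the authors' implementation.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if n == 0:
        return 0.0, 1.0
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def review_in_waves(labels, wave_size, max_width):
    """Review charts wave by wave; stop once the PPV interval is narrow enough.

    labels: 0/1 reference-standard results in review order (1 = outcome
    confirmed by chart review). Hypothetical input format.
    Returns (charts reviewed, estimated PPV, (lower, upper) interval).
    """
    if not labels:
        raise ValueError("at least one chart label is required")
    confirmed = reviewed = 0
    low, high = 0.0, 1.0
    for start in range(0, len(labels), wave_size):
        wave = labels[start:start + wave_size]
        confirmed += sum(wave)
        reviewed += len(wave)
        low, high = wilson_interval(confirmed, reviewed)
        # Assumed stopping rule: interval width at or below the target.
        if high - low <= max_width:
            break
    return reviewed, confirmed / reviewed, (low, high)


if __name__ == "__main__":
    # Synthetic labels only: 1000 charts, 80% confirmed.
    chart_labels = [1, 1, 1, 1, 0] * 200
    reviewed, ppv, interval = review_in_waves(chart_labels, wave_size=50, max_width=0.2)
    print(reviewed, round(ppv, 3), interval)
```

With synthetic labels like these, the loop stops well before all charts are reviewed, which is the kind of saving the abstract reports. A real study would fix the stopping criteria before review begins.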

intentional self, machine learning, natural language, (18 more...)

arXiv:2507.22943

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)

Shall Your Data Strategy Work? Perform a Swift Study

Peng, Minlong, Yang, Jingyi, He, Zhongjun, Wu, Hua

arXiv.org Artificial Intelligence | Feb-19-2025

This work presents a swift method to assess the efficacy of particular types of instruction-tuning data, utilizing just a handful of probe examples and eliminating the need for model retraining. This method employs the idea of gradient-based data influence estimation, analyzing the gradient projections of probe examples from the chosen strategy onto evaluation examples to assess its advantages. Building upon this method, we conducted three swift studies to investigate the potential of Chain-of-thought (CoT) data, query clarification data, and response evaluation data in enhancing model generalization. Subsequently, we embarked on a validation study to corroborate the findings of these swift studies. In this validation study, we developed training datasets tailored to each studied strategy and compared model performance with and without the use of these datasets. The results of the validation study aligned with the findings of the swift studies, validating the efficacy of our proposed method.
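The gradient-projection idea in the abstract above can be sketched on a toy problem. This is an illustrative first-order approximation under loud assumptions, not the paper's method: a two-parameter linear model with squared loss stands in for the language model, and the helper names (`grad_squared_loss`, `influence_score`) are hypothetical. A positive score means a small gradient step on the probe example would also lower the loss on the evaluation examples.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def grad_squared_loss(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to the weights w."""
    residual = dot(w, x) - y
    return [residual * xi for xi in x]


def influence_score(w, probe, eval_examples):
    """Mean inner product between the probe gradient and each eval gradient.

    probe and every eval example are (features, target) pairs.
    """
    g_probe = grad_squared_loss(w, *probe)
    scores = [dot(g_probe, grad_squared_loss(w, x, y)) for x, y in eval_examples]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    weights = [0.0, 0.0]
    evaluation = [([2.0, 0.0], 2.0)]
    helpful_probe = ([1.0, 0.0], 1.0)    # agrees with the evaluation target
    harmful_probe = ([1.0, 0.0], -1.0)   # pulls the weights the other way
    print(influence_score(weights, helpful_probe, evaluation))
    print(influence_score(weights, harmful_probe, evaluation))
```

No retraining happens in this sketch; only gradients at the current weights are compared, which mirrors the abstract's point about avoiding retraining.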

evaluation data, query clarification data, training data, (15 more...)

arXiv:2502.13514

Country: Asia > China (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

A causal viewpoint on prediction model performance under changes in case-mix: discrimination and calibration respond differently for prognosis and diagnosis predictions

van Amsterdam, Wouter A. C.

arXiv.org Artificial Intelligence | Sep-5-2024

Prediction models inform important clinical decisions, aiding in diagnosis, prognosis, and treatment planning. The predictive performance of these models is typically assessed through discrimination and calibration. However, changes in the distribution of the data impact model performance. In healthcare, a typical change is a shift in case-mix: for example, for cardiovascular risk management, a general practitioner sees a different mix of patients than a specialist in a tertiary hospital. This work introduces a novel framework that differentiates the effects of case-mix shifts on discrimination and calibration based on the causal direction of the prediction task. When prediction is in the causal direction (often the case for prognosis predictions), calibration remains stable under case-mix shifts, while discrimination does not. Conversely, when predicting in the anti-causal direction (often with diagnosis predictions), discrimination remains stable, but calibration does not. A simulation study and empirical validation using cardiovascular disease prediction models demonstrate the implications of this framework. This framework provides critical insights for evaluating and deploying prediction models across different clinical settings, emphasizing the importance of understanding the causal structure of the prediction task.

calibration, prediction, prediction model, (15 more...)

arXiv:2409.01444

Country: Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Reporting of prognostic clinical prediction models based on machine learning methods in oncology needs to be improved - Journal of Clinical Epidemiology

#artificialintelligence | Mar-17-2023, 18:36:05 GMT

The objective was to evaluate the completeness of reporting of prognostic prediction models developed using machine learning methods in the field of oncology. We conducted a systematic review, searching the MEDLINE and Embase databases between 01/01/2019 and 05/09/2019, for non-imaging studies developing a prognostic clinical prediction model using machine learning methods (as defined by primary study authors) in oncology. We used the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement to assess the reporting quality of included publications. We described reporting adherence of included publications overall and by each section of TRIPOD. Sixty-two publications met the inclusion criteria.

oncology, prediction model, prognostic clinical prediction model, (6 more...)

Genre: Research Report (0.43)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.94)
Health & Medicine > Epidemiology (0.85)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Adversarial Scrutiny of Evidentiary Statistical Software

Abebe, Rediet, Hardt, Moritz, Jin, Angela, Miller, John, Schmidt, Ludwig, Wexler, Rebecca

arXiv.org Artificial Intelligence | Sep-30-2022

The U.S. criminal legal system increasingly relies on software output to convict and incarcerate people. In a large number of cases each year, the government makes these consequential decisions based on evidence from statistical software -- such as probabilistic genotyping, environmental audio detection, and toolmark analysis tools -- that defense counsel cannot fully cross-examine or scrutinize. This undermines the commitments of the adversarial criminal legal system, which relies on the defense's ability to probe and test the prosecution's case to safeguard individual rights. Responding to this need to adversarially scrutinize output from such software, we propose robust adversarial testing as an audit framework to examine the validity of evidentiary statistical software. We define and operationalize this notion of robust adversarial testing for defense use by drawing on a large body of recent work in robust machine learning and algorithmic fairness. We demonstrate how this framework both standardizes the process for scrutinizing such tools and empowers defense lawyers to examine their validity for instances most relevant to the case at hand. We further discuss existing structural and institutional challenges within the U.S. criminal legal system that may create barriers for implementing this and other such audit frameworks and close with a discussion on policy changes that could help address these concerns.

artificial intelligence, machine learning, software, (14 more...)

doi: 10.1145/3531146.3533228

arXiv:2206.09305

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > Oklahoma (0.04)
North America > United States > Virginia (0.04)
(11 more...)

Genre: Research Report (0.64)

Industry:

Law > Litigation (1.00)
Law > Criminal Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Reporting of prognostic clinical prediction models based on machine learning methods in oncology needs to be improved

#artificialintelligence | Jun-29-2021

prediction model, prognostic clinical prediction model, reporting, (5 more...)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.94)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Detecting Spurious Correlations with Sanity Tests for Artificial Intelligence Guided Radiology Systems

Mahmood, Usman, Shrestha, Robik, Bates, David D. B., Mannelli, Lorenzo, Corrias, Giuseppe, Erdi, Yusuf, Kanan, Christopher

arXiv.org Machine Learning | Mar-4-2021

Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing, localizing disease on medical images, and improving radiologists' efficiency. A critical component to deploying AI in radiology is to gain confidence in a developed system's efficacy and safety. The current gold standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time if a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen in computed tomography scans.

ai system, dataset, sanity test, (14 more...)

arXiv:2103.03048

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(14 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Oncology > Pancreatic Cancer (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Measurement Error in Nutritional Epidemiology: A Survey

Peng, Huimin

arXiv.org Machine Learning | Apr-14-2020

This article reviews bias-correction models for measurement error of exposure variables in the field of nutritional epidemiology. Measurement error usually attenuates the estimated slope towards zero. Due to the influence of measurement error, inference on the parameter estimate is conservative and the confidence interval of the slope parameter is too narrow. Bias correction in estimators and confidence intervals is of primary interest. We review the following bias-correction models: regression calibration methods, likelihood-based models, missing data models, simulation-based methods, nonparametric models and sampling-based procedures.
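The attenuation effect mentioned in the abstract above can be shown with a small simulation. This is a sketch under the classical additive measurement error model, which is an assumption here: the observed exposure is the true exposure plus independent noise, so the naive slope shrinks by the reliability ratio var(X) / (var(X) + var(U)). The correction shown is the simplest version of regression calibration and assumes the reliability ratio is known, for example from a validation study. All names and parameter values are hypothetical and the data are synthetic.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=2.0, sd_true=1.0, sd_error=1.0, seed=0):
    """Return (naive slope, corrected slope, reliability ratio)."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0, sd_true) for _ in range(n)]
    # Observed exposure = true exposure + independent error.
    observed_x = [x + rng.gauss(0, sd_error) for x in true_x]
    outcome = [beta * x + rng.gauss(0, 0.5) for x in true_x]

    naive = ols_slope(observed_x, outcome)
    reliability = sd_true ** 2 / (sd_true ** 2 + sd_error ** 2)
    corrected = naive / reliability  # assumes the reliability ratio is known
    return naive, corrected, reliability


if __name__ == "__main__":
    naive, corrected, reliability = simulate()
    print(f"reliability={reliability:.2f} naive={naive:.3f} corrected={corrected:.3f}")
```

With equal true and error variances the reliability ratio is 0.5, so the naive slope lands near half of the true slope and the corrected slope lands near the true value.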

estimator, exposure, measurement error, (12 more...)

arXiv:2004.06448

Country:

North America > United States (0.04)
North America > Greenland (0.04)
Europe > Netherlands (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

What Can We Expect Following Anterior Total Hip Arthroplasty on a Regular Operating Table? A Validation Study of an Artificial Intelligence Algorithm to Monitor Adverse Events in a High-Volume, Nonacademic Setting

#artificialintelligence | Oct-26-2019, 01:03:47 GMT

Quality monitoring is increasingly important to support and assure sustainability of the orthopedic practice. Surgeons in nonacademic settings often lack resources to accurately monitor quality of care. Widespread use of electronic medical records (EMR) provides easier access to medical information, facilitating its analysis. However, manual review of EMRs is highly inefficient. Artificial intelligence (AI) software allows for the development of algorithms for extracting relevant complications from EMRs.

anterior total hip arthroplasty, artificial intelligence algorithm, regular operating table, (6 more...)

Industry: Health & Medicine > Health Care Technology > Medical Record (0.74)

Technology: Information Technology > Artificial Intelligence (1.00)

Semiparametric Methods for Exposure Misclassification in Propensity Score-Based Time-to-Event Data Analysis

Yang, Yingrui, Wang, Molin

arXiv.org Machine Learning | Mar-18-2019

In epidemiology, identifying the effect of exposure variables in relation to a time-to-event outcome is a classical research area of practical importance. Incorporating the propensity score in the Cox regression model, as a measure to control for confounding, has certain advantages when the outcome is rare. However, in situations involving exposure measured with moderate to substantial error, identifying the exposure effect using the propensity score in Cox models remains a challenging and unresolved problem. In this paper, we propose an estimating equation method to correct for the bias caused by exposure misclassification in the estimation of exposure-outcome associations. We also discuss the asymptotic properties and derive the asymptotic variances of the proposed estimators. We conduct a simulation study to evaluate the performance of the proposed estimators in various settings. As an illustration, we apply our method to correct for the misclassification-caused bias in estimating the association of PM2.5 level with lung cancer mortality using a nationwide prospective cohort, the Nurses' Health Study (NHS). The proposed methodology can be applied using our user-friendly R function published online.

artificial intelligence, machine learning, validation study, (15 more...)

arXiv:1903.07782

Country: Europe > United Kingdom (0.48)

Genre:

Research Report > Strength Medium (1.00)
Research Report > Experimental Study (1.00)
Research Report > New Finding (0.89)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)
