concordance
Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment
Kumar, Sayantan, Noroozizadeh, Shahriar, Kim, Juyong, Weiss, Jeremy C.
Reconstructing precise clinical timelines is essential for modeling patient trajectories and forecasting risk in complex, heterogeneous conditions like sepsis. While unstructured clinical narratives offer semantically rich and contextually complete descriptions of a patient's course, they often lack temporal precision and contain ambiguous event timing. Conversely, structured electronic health record (EHR) data provides precise temporal anchors but misses a substantial portion of clinically meaningful events. We introduce a retrieval-augmented multimodal alignment framework that bridges this gap to improve the temporal precision of absolute clinical timelines extracted from text. Our approach formulates timeline reconstruction as a graph-based multistep process: it first extracts central anchor events from narratives to build an initial temporal scaffold, places non-central events relative to this backbone, and then calibrates the timeline using retrieved structured EHR rows as external temporal evidence. Evaluated using instruction-tuned large language models on the i2m4 benchmark spanning MIMIC-III and MIMIC-IV, our multimodal pipeline consistently improves absolute timestamp accuracy (AULTC) and improves temporal concordance across nearly all evaluated models over unimodal text-only reconstruction, without compromising event match rates. Furthermore, our empirical gap analysis reveals that 34.8% of text-derived events are entirely absent from tabular records, demonstrating that aligning these modalities can produce a more temporally faithful and clinically informative reconstruction of patient trajectories than either source alone.
Assumptions and Likelihoods in More Detail
A.1 Notation Let T be a failure time with CDFF. T's survival function is defined by F = 1 F. We denote failure models by FฮธT. Let C be a censoring time with CDFG, survival function G, and model GฮธC. Under right-censoring, define U = min(T,C), = 1 [T C] and we observe (Xi,Ui, i). We use G(t) to denote P(C t).
Data reuse enables cost-efficient randomized trials of medical AI models
Nercessian, Michael, Zhang, Wenxin, Schubert, Alexander, Yang, Daphne, Chung, Maggie, Alaa, Ahmed, Yala, Adam
Joint Senior Corresponding Author: Michael Nercessian Email: michael.nercessian@berkeley.edu Abstract Randomized controlled trials (RCTs) are indispensable for establishing the clinical value of medical artificial-intelligence (AI) tools, yet their high cost and long timelines hinder timely validation as new models emerge rapidly. Here, we propose BRIDGE, a data-reuse RCT design for AI-based risk models. AI risk models support a broad range of interventions, including screening, treatment selection, and clinical alerts. BRIDGE trials recycle participant-level data from completed trials of AI models when legacy and updated models make concordant predictions, thereby reducing the enrollment requirement for subsequent trials. We provide a practical checklist for investigators to assess whether reusing data from previous trials allows for valid causal inference and preserves type I error. Using real-world datasets across breast cancer, cardiovascular disease, and sepsis, we demonstrate concordance between successive AI models, with up to 64.8% overlap in top 5% high-risk cohorts. We then simulate a series of breast cancer screening studies, where our design reduced required enrollment by 46.6%--saving over US$2.8 million--while maintaining 80% power. By transforming trials into adaptive, modular studies, our proposed design makes Level I evidence generation feasible for every model iteration, thereby accelerating cost-effective translation of AI into routine care . Introduction Artificial intelligence (AI) models have the potential to transform patient care by identifying high-risk individuals using high-dimensional data--such as imaging, electronic health records, or time-series data--to personalize screening, prevention, and treatment decisions across a range of diseases, including cancer and heart disease.
Interaction Concordance Index: Performance Evaluation for Interaction Prediction Methods
Pahikkala, Tapio, Numminen, Riikka, Movahedi, Parisa, Karmitsa, Napsu, Airola, Antti
Consider two sets of entities and their members' mutual affinity values, say drug-target affinities (DTA). Drugs and targets are said to interact in their effects on DTAs if drug's effect on it depends on the target. Presence of interaction implies that assigning a drug to a target and another drug to another target does not provide the same aggregate DTA as the reversed assignment would provide. Accordingly, correctly capturing interactions enables better decision-making, for example, in allocation of limited numbers of drug doses to their best matching targets. Learning to predict DTAs is popularly done from either solely from known DTAs or together with side information on the entities, such as chemical structures of drugs and targets. In this paper, we introduce interaction directions' prediction performance estimator we call interaction concordance index (IC-index), for both fixed predictors and machine learning algorithms aimed for inferring them. IC-index complements the popularly used DTA prediction performance estimators by evaluating the ratio of correctly predicted directions of interaction effects in data. First, we show the invariance of IC-index on predictors unable to capture interactions. Secondly, we show that learning algorithm's permutation equivariance regarding drug and target identities implies its inability to capture interactions when either drug, target or both are unseen during training. In practical applications, this equivariance is remedied via incorporation of appropriate side information on drugs and targets. We make a comprehensive empirical evaluation over several biomedical interaction data sets with various state-of-the-art machine learning algorithms. The experiments demonstrate how different types of affinity strength prediction methods perform in terms of IC-index complementing existing prediction performance estimators.
A Large-Language Model Framework for Relative Timeline Extraction from PubMed Case Reports
Jing Wang, Ph.D.1, Jeremy C. Weiss, M.D., Ph.D.1 1National Library of Medicine, Bethesda, Maryland, USA Abstract Timing of clinical events is central to characterization of patient trajectories, enabling analyses such as process tracing, forecasting, and causal reasoning. However, structured electronic health records capture few data elements critical to these tasks, while clinical reports lack temporal localization of events in structured form. We present a system that transforms case reports into textual time series--structured pairs of textual events and timestamps. We contrast manual and large language model (LLM) annotations (n=320 and n=390 respectively) of ten randomly-sampled PubMed open-access (PMOA) case reports (N=152,974) and assess inter-LLM agreement (n=3,103 N=93). We find that the LLM models have moderate event recall (O1-preview: 0.80) but high temporal concordance among identified events (O1-preview: 0.95). Introduction Clinical event timelines are key analytic devices in use in areas ranging from the visualization of patient trajectories in electronic health records to the development and update of clinical practice guidelines. While many automated documentation systems capture structured health information in relational databases with timestamping, textual data modalities often lack temporal granularity beyond creation and submission dates, despite being the main tool care providers use to document and communicate patient history and care planning.
OGBoost: A Python Package for Ordinal Gradient Boosting
Sharabiani, Mansour T. A., Bottle, Alex, Mahani, Alireza S.
This paper introduces OGBoost, a scikit-learn-compatible Python package for ordinal regression using gradient boosting. Ordinal variables (e.g., rating scales, quality assessments) lie between nominal and continuous data, necessitating specialized methods that reflect their inherent ordering. Built on a coordinate-descent approach for optimization and the latent-variable framework for ordinal regression, OGBoost performs joint optimization of a latent continuous regression function (functional gradient descent) and a threshold vector that converts the latent continuous value into discrete class probabilities (classical gradient descent). In addition to the stanadard methods for scikit-learn classifiers, the GradientBoostingOrdinal class implements a "decision_function" that returns the (scalar) value of the latent function for each observation, which can be used as a high-resolution alternative to class labels for comparing and ranking observations. The class has the option to use cross-validation for early stopping rather than a single holdout validation set, a more robust approach for small and/or imbalanced datasets. Furthermore, users can select base learners with different underlying algorithms and/or hyperparameters for use throughout the boosting iterations, resulting in a `heterogeneous' ensemble approach that can be used as a more efficient alternative to hyperparameter tuning (e.g. via grid search). We illustrate the capabilities of OGBoost through examples, using the wine quality dataset from the UCI respository. The package is available on PyPI and can be installed via "pip install ogboost".
ECTIL: Label-efficient Computational Tumour Infiltrating Lymphocyte (TIL) assessment in breast cancer: Multicentre validation in 2,340 patients with breast cancer
Schirris, Yoni, Voorthuis, Rosie, Opdam, Mark, Liefaard, Marte, Sonke, Gabe S, Dackus, Gwen, de Jong, Vincent, Wang, Yuwei, Van Rossum, Annelot, Steenbruggen, Tessa G, Steggink, Lars C, de Vries, Liesbeth G. E., van de Vijver, Marc, Salgado, Roberto, Gavves, Efstratios, van Diest, Paul J, Linn, Sabine C, Teuwen, Jonas, Menezes, Renee, Kok, Marleen, Horlings, Hugo
The level of tumour-infiltrating lymphocytes (TILs) is a prognostic factor for patients with (triple-negative) breast cancer (BC). Computational TIL assessment (CTA) has the potential to assist pathologists in this labour-intensive task, but current CTA models rely heavily on many detailed annotations. We propose and validate a fundamentally simpler deep learning based CTA that can be trained in only ten minutes on hundredfold fewer pathologist annotations. We collected whole slide images (WSIs) with TILs scores and clinical data of 2,340 patients with BC from six cohorts including three randomised clinical trials. Morphological features were extracted from whole slide images (WSIs) using a pathology foundation model. Our label-efficient Computational stromal TIL assessment model (ECTIL) directly regresses the TILs score from these features. ECTIL trained on only a few hundred samples (ECTIL-TCGA) showed concordance with the pathologist over five heterogeneous external cohorts (r=0.54-0.74, AUROC=0.80-0.94). Training on all slides of five cohorts (ECTIL-combined) improved results on a held-out test set (r=0.69, AUROC=0.85). Multivariable Cox regression analyses indicated that every 10% increase of ECTIL scores was associated with improved overall survival independent of clinicopathological variables (HR 0.86, p<0.01), similar to the pathologist score (HR 0.87, p<0.001). We demonstrate that ECTIL is highly concordant with an expert pathologist and obtains a similar hazard ratio. ECTIL has a fundamentally simpler design than existing methods and can be trained on orders of magnitude fewer annotations. Such a CTA may be used to pre-screen patients for, e.g., immunotherapy clinical trial inclusion, or as a tool to assist clinicians in the diagnostic work-up of patients with BC. Our model is available under an open source licence (https://github.com/nki-ai/ectil).