Performance of weakly-supervised electronic health record-based phenotyping methods in rare-outcome settings

Hong, Yunjing, Nelson, Jennifer C., Williamson, Brian D.

arXiv.org Machine Learning

Accurately identifying patients with specific medical conditions is a key challenge when using clinical data from electronic health records. Our objective was to comprehensively assess when weakly-supervised prediction methods, which use silver-standard labels (proxy measures of the true outcome) rather than gold-standard true labels, perform well in rare-outcome settings like vaccine safety studies. We compared three methods (PheNorm, MAP, and sureLDA) that combine structured features and features derived from clinical text using natural language processing, through an extensive simulation study with data-generating mechanisms ranging from simple to complex, varying outcome rates, and silver labels of varying informativeness. We also considered using predicted probabilities to design a chart review validation study. No single method dominated the others across all prediction performance metrics. Probability-guided sampling selected a cohort enriched for patients with more mentions of important concepts in chart notes. SureLDA, the most complex of the three algorithms we considered, often performed well in simulations. Performance depended greatly on the selected tuning parameters. Care should be taken when using weakly-supervised prediction methods in rare-outcome settings, particularly if the probabilities will be used in downstream analysis, but these methods can work well when silver labels are strong predictors of true outcomes.


$\frac{e^{v}}{1+e^{v}} + |S|\,w_q\,\frac{e^{v}}{1+e^{v}} = 0$. Solving the equation, we have

Neural Information Processing Systems

Note that computing the $\widehat{R}$ value can be done in constant time if the $W_p$ and $W_n$ values are given. We stress that this result holds for any loss function $\ell$ satisfying $\ell(v,y) > \ell(y,y) \ge 0$, with $v \neq y$. We performed additional experiments to empirically investigate the difference between the uPU and nnPU risk estimators with regard to overfitting. In Table 11 we report the training risks (measured as PU risk, as the data is PU) and testing risks (measured as PN risk, as the data is PN) using the zero-one loss $\ell_{0/1}(v,y) = (1 - \mathrm{sign}(vy))/2$ on a number of datasets. From the results we can see that the training risk is significantly smaller than the test risk in the uPU setting as compared to the nnPU setting, confirming that uPU suffers more from overfitting than nnPU. Table 11: Training and testing risk of PU ET. Figure 4 shows that the normalized risk reduction importance makes many more pixels more important.
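The contrast between the two estimators can be illustrated with a minimal sketch. It assumes the standard textbook forms of the unbiased (uPU) and non-negative (nnPU) risk estimators and a known class prior; the names `pu_risks`, `mean_loss`, and `prior`, and the toy scores, are hypothetical and are not taken from the paper's code.

```python
def zero_one_loss(v, y):
    """Zero-one loss (1 - sign(v*y)) / 2 for a score v and a label y in {-1, +1}."""
    s = v * y
    sign = (s > 0) - (s < 0)  # integer sign of s: -1, 0, or +1
    return (1 - sign) / 2


def mean_loss(scores, y):
    """Average zero-one loss of the scores when every example is treated as label y."""
    return sum(zero_one_loss(v, y) for v in scores) / len(scores)


def pu_risks(pos_scores, unl_scores, prior):
    """Return (uPU risk, nnPU risk) under an assumed class prior.

    Assumed standard forms, not the paper's implementation:
      uPU  = prior * R_p^+ + (R_u^- - prior * R_p^-)
      nnPU = prior * R_p^+ + max(0, R_u^- - prior * R_p^-)
    """
    r_p_pos = mean_loss(pos_scores, +1)  # labeled positives scored as positive
    r_p_neg = mean_loss(pos_scores, -1)  # labeled positives scored as negative
    r_u_neg = mean_loss(unl_scores, -1)  # unlabeled examples scored as negative
    neg_part = r_u_neg - prior * r_p_neg
    upu = prior * r_p_pos + neg_part
    nnpu = prior * r_p_pos + max(0.0, neg_part)
    return upu, nnpu


# Toy scores: the model fits the labeled positives perfectly and labels
# most unlabeled points negative, so the uPU training risk goes below zero.
upu, nnpu = pu_risks(
    pos_scores=[1.0, 2.0, 0.5, 3.0],
    unl_scores=[1.0, -1.0, -1.0, -1.0],
    prior=0.5,
)
print(upu, nnpu)
```

A negative uPU training risk is the usual symptom of overfitting in this setting; clipping the negative-class term at zero is what keeps the nnPU estimate non-negative.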








AIRwaves at CheckThat! 2025: Retrieving Scientific Sources for Implicit Claims on Social Media with Dual Encoders and Neural Re-Ranking

Ashbaugh, Cem, Baumgärtner, Leon, Gress, Tim, Sidorov, Nikita, Werner, Daniel

arXiv.org Artificial Intelligence

Linking implicit scientific claims made on social media to their original publications is crucial for evidence-based fact-checking and scholarly discourse, yet it is hindered by lexical sparsity, very short queries, and domain-specific language. Team AIRwaves ranked second in Subtask 4b of the CLEF-2025 CheckThat! Lab with an evidence-retrieval approach that markedly outperforms the competition baseline. The optimized sparse-retrieval baseline (BM25) achieves MRR@5 = 0.5025 on the gold-label blind test set. To surpass this baseline, a two-stage retrieval pipeline is introduced: (i) a first stage that uses a dual encoder based on E5-large, fine-tuned using in-batch and mined hard negatives and enhanced through chunked tokenization and rich document metadata; and (ii) a neural re-ranking stage using a SciBERT cross-encoder. Replacing purely lexical matching with neural representations lifts performance to MRR@5 = 0.6174, and the complete pipeline further improves to MRR@5 = 0.6828. The findings demonstrate that coupling dense retrieval with neural re-rankers delivers a powerful and efficient solution for tweet-to-study matching and provides a practical blueprint for future evidence-retrieval pipelines.
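For readers unfamiliar with the metric reported above, the following is a minimal sketch of how MRR@5 is commonly computed, assuming each query has exactly one gold document. The function name `mrr_at_k` and the toy document IDs are hypothetical and are not taken from the team's code.

```python
def mrr_at_k(rankings, gold_ids, k=5):
    """Mean reciprocal rank of the gold document within the top-k results.

    rankings: one ranked list of document IDs per query (best first).
    gold_ids: the single relevant document ID for each query.
    A query whose gold document is missing from the top k contributes 0.
    """
    total = 0.0
    for ranking, gold in zip(rankings, gold_ids):
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id == gold:
                total += 1.0 / rank
                break
    return total / len(gold_ids)


# Toy example: gold at rank 1, gold at rank 2, gold outside the top 5.
rankings = [
    ["a", "b", "c"],
    ["x", "y", "z"],
    ["p", "q", "r", "s", "t", "g"],
]
gold_ids = ["a", "y", "g"]
score = mrr_at_k(rankings, gold_ids, k=5)
print(score)
```

In a two-stage pipeline, the same function can score the first-stage ranking and the re-ranked list separately, which is how the gain from the re-ranker is usually isolated.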