chemotherapy
UW-BioNLP at ChemoTimelines 2025: Thinking, Fine-Tuning, and Dictionary-Enhanced LLM Systems for Chemotherapy Timeline Extraction
Zhang, Tianmai M., Sun, Zhaoyi, Zeng, Sihang, Li, Chenxi, Abernethy, Neil F., Lam, Barbara D., Xia, Fei, Yetisgen, Meliha
The ChemoTimelines shared task benchmarks methods for constructing timelines of systemic anticancer treatment from electronic health records of cancer patients. This paper describes our methods, results, and findings for subtask 2: generating patient chemotherapy timelines from raw clinical notes. We evaluated strategies involving chain-of-thought thinking, supervised fine-tuning, direct preference optimization, and dictionary-based lookup to improve timeline extraction. All of our approaches followed a two-step workflow in which an LLM first extracted chemotherapy events from individual clinical notes, and an algorithm then normalized and aggregated these events into patient-level timelines. The methods differed in how the LLM was used and trained. Multiple approaches achieved competitive performance on the test set leaderboard, with fine-tuned Qwen3-14B achieving the best official score of 0.678. Our results and analyses may offer useful insights for future work on this task and for the design of similar tasks.
- North America > Mexico > Mexico City > Mexico City (0.05)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > Dominican Republic (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Ovarian Cancer (0.46)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.46)
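The second step of the ChemoTimelines workflow (normalizing per-note events and aggregating them into one patient-level timeline) can be illustrated with a minimal Python sketch. It assumes the LLM has already produced `(drug, relation, date)` events for each note, with dates normalized to ISO-8601. The `Event` class, the tiny `DRUG_DICTIONARY`, and the relation label `"begins-on"` are hypothetical placeholders. They are not the authors' code or the official task schema. The dictionary lookup only gestures at the dictionary-based strategy named in the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable

# Hypothetical lookup table mapping surface forms to canonical drug names;
# a real system would use a much larger curated dictionary.
DRUG_DICTIONARY = {
    "taxol": "paclitaxel",
    "paclitaxel": "paclitaxel",
    "5-fu": "fluorouracil",
    "fluorouracil": "fluorouracil",
    "carboplatin": "carboplatin",
}


@dataclass(frozen=True)
class Event:
    drug: str      # drug mention as extracted from a single note
    relation: str  # temporal relation label (assumed label set, e.g. "begins-on")
    date: str      # ISO-8601 date, assumed already normalized


def normalize(event: Event) -> Event | None:
    """Map the drug mention to a canonical name; drop events whose drug is unknown."""
    canonical = DRUG_DICTIONARY.get(event.drug.strip().lower())
    if canonical is None:
        return None
    return Event(canonical, event.relation.strip().lower(), event.date)


def build_timeline(per_note_events: Iterable[list[Event]]) -> list[tuple[str, str, str]]:
    """Merge events from all of a patient's notes into a deduplicated, date-sorted timeline."""
    merged = set()
    for note_events in per_note_events:
        for event in note_events:
            norm = normalize(event)
            if norm is not None:
                merged.add((norm.drug, norm.relation, norm.date))
    # ISO-8601 date strings sort chronologically, so a plain string sort suffices.
    return sorted(merged, key=lambda t: (t[2], t[0], t[1]))
```

For example, a "Taxol" mention in one note and a "paclitaxel" mention in another collapse into a single timeline entry, and an unrecognized drug such as "aspirin" is dropped. Deduplicating at the patient level is what turns many redundant note-level extractions into one timeline.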
Utilizing the RAIN method and Graph SAGE Model to Identify Effective Drug Combinations for Gastric Neoplasm Treatment
Pirasteh, S. Z., Kiaei, Ali A., Bush, Mahnaz, Moghadam, Sabra, Aghaei, Raha, Sadeghigol, Behnaz
Background: Gastric neoplasm, primarily adenocarcinoma, is an aggressive cancer with high mortality that is often diagnosed late, leading to complications such as metastasis. Effective drug combinations are vital to address disease heterogeneity, enhance efficacy, reduce resistance, and improve patient outcomes. Methods: The RAIN method integrated Graph SAGE to propose drug combinations, using a graph model with p-value-weighted edges connecting drugs, genes, and proteins. NLP and a systematic literature review (PubMed, Scopus, etc.) were used to validate the proposed drugs, followed by a network meta-analysis to assess efficacy; the analyses were implemented in Python. Results: Oxaliplatin, fluorouracil, and trastuzumab were identified as effective, supported by 61 studies. Fluorouracil alone had a p-value of 0.0229, improving to 0.0099 with trastuzumab and 0.0069 for the triple combination, indicating superior efficacy. Conclusion: The RAIN method, combining AI and network meta-analysis, effectively identifies optimal drug combinations for gastric neoplasm, offering a promising strategy to enhance treatment outcomes and guide health policy.
- North America > United States (0.45)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- Europe > Slovenia > Coastal-Karst > Municipality of Koper > Koper (0.04)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Gastric Cancer (0.78)
- Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.67)
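To make the graph component of the RAIN abstract more concrete, the following minimal Python sketch implements one GraphSAGE-style layer over a drug–gene–protein graph. It uses a weighted-mean neighbor aggregator. The abstract says edges are p-value-weighted but does not give the formula. Turning a p-value into an edge weight with `-log10(p)` is therefore an assumption made here, as are the function names and the toy weight matrix. This is not the authors' implementation.

```python
import math


def edge_weight(p_value: float) -> float:
    # Assumption: smaller p-values indicate stronger associations, so weight = -log10(p).
    return -math.log10(p_value)


def sage_layer(features, neighbors, weight_matrix):
    """One GraphSAGE-style layer with a p-value-weighted mean aggregator.

    features:      dict node -> list[float], the current embedding h_v
    neighbors:     dict node -> list of (neighbor, p_value) edges
    weight_matrix: list of rows, each of length 2 * dim, applied to [h_v || agg_v]
    """
    dim = len(next(iter(features.values())))
    out = {}
    for node, h_v in features.items():
        # Weighted mean of neighbor embeddings (all zeros if the node has no neighbors).
        agg = [0.0] * dim
        total = 0.0
        for nbr, p in neighbors.get(node, []):
            w = edge_weight(p)
            total += w
            for i, x in enumerate(features[nbr]):
                agg[i] += w * x
        if total > 0:
            agg = [a / total for a in agg]
        concat = list(h_v) + agg
        # Linear transform of the concatenation, followed by ReLU.
        h_new = [max(0.0, sum(w * c for w, c in zip(row, concat))) for row in weight_matrix]
        # L2-normalize the new embedding, as in the original GraphSAGE algorithm.
        norm = math.sqrt(sum(x * x for x in h_new)) or 1.0
        out[node] = [x / norm for x in h_new]
    return out
```

Stacking two such layers lets a drug's embedding reflect proteins two hops away through the genes it connects to. In an undirected graph, each edge would be listed under both of its endpoints in `neighbors`.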
A Machine Learning Framework for Breast Cancer Treatment Classification Using a Novel Dataset
Hasan, Md Nahid, Murshed, Md Monzur, Hasan, Md Mahadi, Chowdhury, Faysal A.
Breast cancer (BC) remains a significant global health challenge, with personalized treatment selection complicated by the disease's molecular and clinical heterogeneity. BC treatment decisions rely on various patient-specific clinical factors, and machine learning (ML) offers a powerful approach to predicting treatment outcomes. This study utilizes The Cancer Genome Atlas (TCGA) breast cancer clinical dataset to develop ML models for predicting the likelihood of undergoing chemotherapy or hormonal therapy. The models are trained using five-fold cross-validation and evaluated through performance metrics, including accuracy, precision, recall, specificity, sensitivity, F1-score, and area under the receiver operating characteristic curve (AUROC). Model uncertainty is assessed using bootstrap techniques, while SHAP values enhance interpretability by identifying key predictors. Among the tested models, the Gradient Boosting Machine (GBM) achieves the highest stable performance (accuracy = 0.7718, AUROC = 0.8252), followed by Extreme Gradient Boosting (XGBoost) (accuracy = 0.7557, AUROC = 0.8044) and Adaptive Boosting (AdaBoost) (accuracy = 0.7552, AUROC = 0.8016). These findings underscore the potential of ML in supporting personalized breast cancer treatment decisions through data-driven insights.
- North America > United States > Texas (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Minnesota > Blue Earth County > Mankato (0.04)
- North America > United States > Florida > Lee County > Fort Myers (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.80)
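The breast cancer treatment-classification abstract reports AUROC and uses bootstrap techniques to assess model uncertainty. A minimal Python sketch of both ideas follows. It computes AUROC as the probability that a random positive case outranks a random negative case, then builds a percentile bootstrap interval by resampling patients with replacement. The function names, the 1,000 resamples, and the 95% level are illustrative choices. The paper's exact bootstrap protocol is not specified in the abstract.

```python
import random


def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

The pairwise loop is quadratic, which is fine for a few hundred patients. For larger cohorts, a rank-based (Mann–Whitney) formulation computes the same quantity faster.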
MedCite: Can Language Models Generate Verifiable Text for Medicine?
Wang, Xiao, Tan, Mengjue, Jin, Qiao, Xiong, Guangzhi, Hu, Yu, Zhang, Aidong, Lu, Zhiyong, Zhang, Minjia
Existing LLM-based medical question-answering systems lack citation generation and evaluation capabilities, raising concerns about their adoption in practice. In this work, we introduce MedCite, the first end-to-end framework that facilitates the design and evaluation of citation generation with LLMs for medical tasks. We also introduce a novel multi-pass retrieval-citation method that generates high-quality citations. Our evaluation highlights the challenges and opportunities of citation generation for medical tasks and identifies important design choices that significantly affect final citation quality. Our proposed method achieves substantial improvements in citation precision and recall over strong baseline methods, and we show that our evaluation results correlate well with annotations from professional experts.
- Europe > Austria > Vienna (0.14)
- North America > United States > Virginia (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Research Report > New Finding (1.00)
- Overview (0.93)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Developing hybrid mechanistic and data-driven personalized prediction models for platelet dynamics
Steinacker, Marie, Kheifetz, Yuri, Scholz, Markus
Hematotoxicity, drug-induced damage to the blood-forming system, is a frequent side effect of cytotoxic chemotherapy and poses a significant challenge in clinical practice due to its high inter-patient variability and limited predictability. Current mechanistic models often struggle to accurately forecast outcomes for patients with irregular or atypical trajectories. In this study, we develop and compare hybrid mechanistic and data-driven approaches for individualized time series modeling of platelet counts during chemotherapy. We consider hybrid models that combine mechanistic models with neural networks, known as universal differential equations. As a purely data-driven alternative, we utilize a nonlinear autoregressive exogenous model using gated recurrent units as the underlying architecture. These models are evaluated across a range of real patient scenarios, varying in data availability and sparsity, to assess predictive performance. Our findings demonstrate that data-driven methods, when provided with sufficient data, significantly improve prediction accuracy, particularly for high-risk patients with irregular platelet dynamics. This highlights the potential of data-driven approaches in enhancing clinical decision-making. In contrast, hybrid and mechanistic models are superior in scenarios with limited or sparse data. The proposed modeling and comparison framework is generalizable and could be extended to predict other treatment-related toxicities, offering broad applicability in personalized medicine.
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
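The platelet-dynamics abstract uses a nonlinear autoregressive exogenous (NARX) model. Such a model predicts the next value from lagged outputs and lagged external inputs. A minimal Python sketch of that data layout follows. It builds one-step-ahead training pairs and then rolls any one-step predictor forward in closed loop. The regular time grid, the lag orders, and the use of a plain callable in place of the paper's gated recurrent units are all assumptions. So are the function names and the idea of a dose series as the exogenous input.

```python
def narx_dataset(y, u, n_y, n_u):
    """Build (input, target) pairs for a one-step-ahead NARX model.

    y: observed platelet counts on an (assumed) regular time grid
    u: exogenous input on the same grid (e.g. a hypothetical chemotherapy dose series)
    Each input is [y(t-1), ..., y(t-n_y), u(t-1), ..., u(t-n_u)]; the target is y(t).
    """
    if len(y) != len(u):
        raise ValueError("y and u must have the same length")
    start = max(n_y, n_u)
    inputs, targets = [], []
    for t in range(start, len(y)):
        past_y = [y[t - k] for k in range(1, n_y + 1)]
        past_u = [u[t - k] for k in range(1, n_u + 1)]
        inputs.append(past_y + past_u)
        targets.append(y[t])
    return inputs, targets


def closed_loop_forecast(model, y_hist, u, n_y, n_u, steps):
    """Roll a one-step model forward, feeding its own predictions back in as past y.

    model: any callable mapping a NARX input vector to the next y (a GRU in the paper)
    u must cover indices up to len(y_hist) + steps - 1.
    """
    y = list(y_hist)
    for _ in range(steps):
        t = len(y)
        x = [y[t - k] for k in range(1, n_y + 1)] + [u[t - k] for k in range(1, n_u + 1)]
        y.append(model(x))
    return y[len(y_hist):]
```

Closed-loop rollout is what makes multi-step forecasts sensitive to accumulated error. This is one reason data-hungry models can struggle when a patient's series is short or sparse.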
CAST: Time-Varying Treatment Effects with Application to Chemotherapy and Radiotherapy on Head and Neck Squamous Cell Carcinoma
Yang, Everest, Vasishtha, Ria, Dad, Luqman K., Kachnic, Lisa A., Hope, Andrew, Wang, Eric, Wu, Xiao, Yuan, Yading, Brenner, David J., Shuryak, Igor
Causal machine learning (CML) enables individualized estimation of treatment effects, offering critical advantages over traditional correlation-based methods. However, existing approaches for medical survival data with censoring such as causal survival forests estimate effects at fixed time points, limiting their ability to capture dynamic changes over time. We introduce Causal Analysis for Survival Trajectories (CAST), a novel framework that models treatment effects as continuous functions of time following treatment. By combining parametric and non-parametric methods, CAST overcomes the limitations of discrete time-point analysis to estimate continuous effect trajectories. Using the RADCURE dataset [1] of 2,651 patients with head and neck squamous cell carcinoma (HNSCC) as a clinically relevant example, CAST models how chemotherapy and radiotherapy effects evolve over time at the population and individual levels. By capturing the temporal dynamics of treatment response, CAST reveals how treatment effects rise, peak, and decline over the follow-up period, helping clinicians determine when and for whom treatment benefits are maximized. This framework advances the application of CML to personalized care in HNSCC and other life-threatening medical conditions. Source code/data available at: https://github.com/CAST-FW/HNSCC
- Europe > Switzerland > Basel-City > Basel (0.04)
- North America > United States > Illinois (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.93)
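CAST treats a treatment effect on right-censored survival data as a function of time rather than a value at one fixed time point. To show only what "effect as a function of time" means, here is a deliberately naive Python sketch. It estimates a Kaplan–Meier curve for each arm and reports the unadjusted difference S_treated(t) − S_control(t) on a time grid. This is not CAST. It does no confounding adjustment and no individual-level estimation, and it does not combine parametric and non-parametric models. The function names and toy data are illustrative.

```python
def kaplan_meier(times, events):
    """Return a step function S(t) estimated from right-censored data.

    times:  follow-up times
    events: 1 if the event (e.g. death) was observed, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    event_times = sorted({t for t, e in data if e == 1})
    steps = []
    s = 1.0
    for t in event_times:
        at_risk = sum(1 for ti, _ in data if ti >= t)
        deaths = sum(1 for ti, ei in data if ti == t and ei == 1)
        s *= 1.0 - deaths / at_risk
        steps.append((t, s))

    def survival(t):
        value = 1.0
        for et, sv in steps:
            if et <= t:
                value = sv
            else:
                break
        return value

    return survival


def survival_difference(treated, control, grid):
    """Unadjusted S_treated(t) - S_control(t) at each time in grid; arms are (times, events)."""
    s1 = kaplan_meier(*treated)
    s0 = kaplan_meier(*control)
    return [s1(t) - s0(t) for t in grid]
```

Plotting such a difference over follow-up time can show an effect that rises, peaks, and declines. A causal method like CAST is needed to interpret that shape as a treatment effect rather than as a difference between non-comparable groups.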
Automatic quantification of breast cancer biomarkers from multiple 18F-FDG PET image segmentation
Tareke, Tewele W., Payan, Neree, Cochet, Alexandre, Arnould, Laurent, Presles, Benoit, Vrigneaud, Jean-Marc, Meriaudeau, Fabrice, Lalande, Alain
Neoadjuvant chemotherapy (NAC) has become standard clinical practice for tumor downsizing in breast cancer, and its effect can be monitored with 18F-FDG positron emission tomography (PET). Our work aims to leverage PET imaging for the segmentation of breast lesions. The focus is on developing an automated system that accurately segments primary tumor regions and extracts key biomarkers from these areas to provide insights into the evolution of breast cancer following the first course of NAC. A total of 243 baseline 18F-FDG PET scans (PET_Bl) and 180 follow-up 18F-FDG PET scans (PET_Fu) were acquired before and after the first course of NAC, respectively. First, a deep learning-based breast tumor segmentation method was developed. The optimal baseline model (trained on baseline exams) was fine-tuned on 15 follow-up exams and adapted using active learning to segment tumor areas in PET_Fu. The pipeline computes biomarkers such as maximum standardized uptake value (SUVmax), metabolic tumor volume (MTV), and total lesion glycolysis (TLG) to evaluate tumor evolution between PET_Bl and PET_Fu. Quality control measures were employed to exclude aberrant outliers. The nnUNet deep learning model performed best for tumor segmentation on PET_Bl, achieving a Dice similarity coefficient (DSC) of 0.89 and a Hausdorff distance (HD) of 3.52 mm. After fine-tuning, the model achieved a DSC of 0.78 and an HD of 4.95 mm on PET_Fu exams. Biomarker analysis revealed very strong correlations between manually segmented and automatically predicted regions for every biomarker. SUVmax, MTV, and TLG decreased significantly, by an average of 5.22, 11.79 cm³, and 19.23 cm³, respectively. The presented approach provides an automated system for breast tumor segmentation from 18F-FDG PET. Through the extracted biomarkers, our method enables automatic assessment of cancer progression.
- Europe > France (0.04)
- North America > United States (0.04)
- Europe > Sweden (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.95)
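The PET abstract computes SUVmax, MTV, and TLG from segmented tumor regions and scores segmentations with the Dice similarity coefficient. A minimal Python sketch of those quantities follows, using the standard definitions: MTV is the segmented volume, TLG = SUVmean × MTV, and Dice = 2|A∩B| / (|A| + |B|). It assumes flattened voxel arrays and a known per-voxel volume. The function names are hypothetical, and the Hausdorff distance and the quality-control step are not shown.

```python
def pet_biomarkers(suv, mask, voxel_volume_ml):
    """SUVmax, MTV, and TLG for one segmented lesion.

    suv:             flat list of voxel SUV values
    mask:            same-length list of 0/1 labels (1 = tumor voxel)
    voxel_volume_ml: volume of a single voxel in mL (1 mL = 1 cm^3)
    """
    lesion = [s for s, m in zip(suv, mask) if m]
    if not lesion:
        return {"SUVmax": 0.0, "MTV": 0.0, "TLG": 0.0}
    mtv = len(lesion) * voxel_volume_ml       # metabolic tumor volume in cm^3
    suv_mean = sum(lesion) / len(lesion)
    return {"SUVmax": max(lesion), "MTV": mtv, "TLG": suv_mean * mtv}  # TLG = SUVmean x MTV


def dice(pred, ref):
    """Dice similarity coefficient between two binary masks of equal length."""
    inter = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    return 1.0 if total == 0 else 2.0 * inter / total
```

Tumor evolution between PET_Bl and PET_Fu is then the per-biomarker difference between two calls to `pet_biomarkers`, one per exam.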
Exploring Large Language Models for Specialist-level Oncology Care
Palepu, Anil, Dhillon, Vikram, Niravath, Polly, Weng, Wei-Hung, Prasad, Preethi, Saab, Khaled, Tanno, Ryutaro, Cheng, Yong, Mai, Hanh, Burns, Ethan, Ajmal, Zainub, Kulkarni, Kavita, Mansfield, Philip, Webster, Dale, Barral, Joelle, Gottweis, Juraj, Schaekermann, Mike, Mahdavi, S. Sara, Natarajan, Vivek, Karthikesalingam, Alan, Tu, Tao
Large language models (LLMs) have shown remarkable progress in encoding clinical knowledge and responding to complex medical queries with appropriate clinical reasoning. However, their applicability in subspecialist or complex medical settings remains underexplored. In this work, we probe the performance of AMIE, a research conversational diagnostic AI system, in the subspecialist domain of breast oncology care without specific fine-tuning for this challenging domain. To perform this evaluation, we curated a set of 50 synthetic breast cancer vignettes representing a range of treatment-naive and treatment-refractory cases and mirroring the key information available to a multidisciplinary tumor board for decision-making (openly released with this work). We developed a detailed clinical rubric for evaluating management plans, including axes such as the quality of case summarization, the safety of the proposed care plan, and recommendations for chemotherapy, radiotherapy, surgery, and hormonal therapy. To improve performance, we enhanced AMIE with the inference-time ability to perform web search retrieval to gather relevant and up-to-date clinical knowledge and to refine its responses with a multi-stage self-critique pipeline. We compare the response quality of AMIE with that of internal medicine trainees, oncology fellows, and general oncology attendings under both automated and specialist clinician evaluations. In our evaluations, AMIE outperformed trainees and fellows, demonstrating the potential of the system in this challenging and important domain. We further demonstrate, through qualitative examples, how systems such as AMIE might facilitate conversational interactions to assist clinicians in their decision-making. However, AMIE's performance was overall inferior to that of attending oncologists, suggesting that further research is needed before prospective use can be considered.
- North America > United States > Wisconsin (0.04)
- North America > United States > Texas (0.04)
- Europe > Switzerland > Geneva > Geneva (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
Multi-modal AI for comprehensive breast cancer prognostication
Witowski, Jan, Zeng, Ken, Cappadona, Joseph, Elayoubi, Jailan, Chiru, Elena Diana, Chan, Nancy, Kang, Young-Joon, Howard, Frederick, Ostrovnaya, Irina, Fernandez-Granda, Carlos, Schnabel, Freya, Ozerdem, Ugur, Liu, Kangning, Steinsnyder, Zoe, Thakore, Nitya, Sadic, Mohammad, Yeung, Frank, Liu, Elisa, Hill, Theodore, Swett, Benjamin, Rigau, Danielle, Clayburn, Andrew, Speirs, Valerie, Vetter, Marcus, Sojak, Lina, Soysal, Simone Muenst, Baumhoer, Daniel, Choucair, Khalil, Zong, Yu, Daoud, Lina, Saad, Anas, Abdulsattar, Waleed, Beydoun, Rafic, Pan, Jia-Wern, Makmur, Haslina, Teo, Soo-Hwang, Pak, Linda Ma, Angel, Victor, Zilenaite-Petrulaitiene, Dovile, Laurinavicius, Arvydas, Klar, Natalie, Piening, Brian D., Bifulco, Carlo, Jun, Sun-Young, Yi, Jae Pak, Lim, Su Hyun, Brufsky, Adam, Esteva, Francisco J., Pusztai, Lajos, LeCun, Yann, Geras, Krzysztof J.
Treatment selection in breast cancer is guided by molecular subtypes and clinical characteristics. Recurrence risk assessment plays a crucial role in personalizing treatment. Current methods, including genomic assays, have limited accuracy and clinical utility, leading to suboptimal decisions for many patients. We developed a test for breast cancer patient stratification based on digital pathology and clinical characteristics using novel AI methods. Specifically, we used a vision transformer-based pan-cancer foundation model trained with self-supervised learning to extract features from digitized H&E-stained slides. These features were integrated with clinical data to form a multi-modal AI test predicting cancer recurrence and death. The test was developed and evaluated using data from a total of 8,161 breast cancer patients across 15 cohorts originating from seven countries. Of these, 3,502 patients from five cohorts were used exclusively for evaluation, while the remaining patients were used for training. Our test accurately predicted the primary endpoint, disease-free interval, in the five external cohorts (C-index: 0.71 [0.68-0.75]; HR: 3.63 [3.02-4.37], p<0.01). In a direct comparison (N=858), the AI test was more accurate than Oncotype DX, the standard-of-care 21-gene assay, with a C-index of 0.67 [0.61-0.74] versus 0.61 [0.49-0.73], respectively. Additionally, the AI test added independent information to Oncotype DX in a multivariate analysis (HR: 3.11 [1.91-5.09], p<0.01). The test demonstrated robust accuracy across all major breast cancer subtypes, including TNBC (C-index: 0.71 [0.62-0.81]; HR: 3.81 [2.35-6.17], p=0.02), for which no diagnostic tools are currently recommended by clinical guidelines. These results suggest that our AI test can improve accuracy, extend applicability to a wider range of patients, and enhance access to treatment selection tools.
- Europe > Switzerland > Basel-City > Basel (0.06)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- Europe > United Kingdom > Wales (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
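The multi-modal prognostication abstract reports its accuracy as a C-index for disease-free interval. A minimal Python sketch of Harrell's concordance index for right-censored data follows. It shows what that number measures: among comparable patient pairs, the fraction in which the patient who relapsed earlier had the higher predicted risk. This simple version skips pairs with tied follow-up times. Implementations differ on tie handling, and the paper's exact variant is not stated in the abstract.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and times[i] < times[j].
    It is concordant when subject i also has the higher predicted risk; tied risks count 0.5.
    Pairs with tied times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values (e.g. 0.67 versus 0.61 against Oncotype DX) should be read.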
Computational Pathology for Accurate Prediction of Breast Cancer Recurrence: Development and Validation of a Deep Learning-based Tool
Su, Ziyu, Guo, Yongxin, Wesolowski, Robert, Tozbikian, Gary, O'Connell, Nathaniel S., Niazi, M. Khalid Khan, Gurcan, Metin N.
Accurate recurrence risk stratification is crucial for optimizing treatment plans for breast cancer patients. Current prognostic tools like Oncotype DX (ODX) offer valuable genomic insights for HR+/HER2- patients but are limited by cost and accessibility, particularly in underserved populations. In this study, we present Deep-BCR-Auto, a deep learning-based computational pathology approach that predicts breast cancer recurrence risk from routine H&E-stained whole slide images (WSIs). Our methodology was validated on two independent cohorts: the TCGA-BRCA dataset and an in-house dataset from The Ohio State University (OSU). Deep-BCR-Auto demonstrated robust performance in stratifying patients into low- and high-recurrence risk categories. On the TCGA-BRCA dataset, the model achieved an area under the receiver operating characteristic curve (AUROC) of 0.827, significantly outperforming existing weakly supervised models (p=0.041). In the independent OSU dataset, Deep-BCR-Auto maintained strong generalizability, achieving an AUROC of 0.832, along with 82.0% accuracy, 85.0% specificity, and 67.7% sensitivity. These findings highlight the potential of computational pathology as a cost-effective alternative for recurrence risk assessment, broadening access to personalized treatment strategies. This study underscores the clinical utility of integrating deep learning-based computational pathology into routine pathological assessment for breast cancer prognosis across diverse clinical settings.
Keywords: Computational pathology, Breast cancer, Deep learning, Oncotype-DX, Image analysis
1. Introduction
Breast cancer is the most prevalent cancer and the second leading cause of cancer-related death among women in the United States [1]. Effective treatment options and prognosis for breast cancer patients depend strongly on the molecular subtype, as determined by estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 (HER2) expression. Among all subtypes, hormone receptor-positive (HR+), HER2-negative (HER2-) breast cancer is the most common, accounting for approximately 65% of all cases [2, 3].
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- North America > United States > North Carolina > Forsyth County > Winston-Salem (0.04)
- North America > United States > Maryland > Montgomery County > Bethesda (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
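The Deep-BCR-Auto abstract reports accuracy, specificity, and sensitivity on the OSU cohort alongside AUROC. Unlike AUROC, these three metrics depend on a decision threshold applied to the model's risk score. A minimal Python sketch follows. It assumes "high recurrence risk" is the positive class and uses an illustrative 0.5 threshold. The function names and threshold are hypothetical, not taken from the paper.

```python
def binarize(scores, threshold=0.5):
    """Label a case high-risk (1) when its predicted score reaches the (assumed) threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), and specificity from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

Accuracy is the prevalence-weighted average of sensitivity and specificity. When low-risk cases dominate a cohort, accuracy therefore tracks specificity more closely than sensitivity.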