OpenKBP-Opt: An international and reproducible evaluation of 76 knowledge-based planning pipelines
Babier, Aaron, Mahmood, Rafid, Zhang, Binghao, Alves, Victor G. L., Barragán-Montero, Ana Maria, Beaudry, Joel, Cardenas, Carlos E., Chang, Yankui, Chen, Zijie, Chun, Jaehee, Diaz, Kelly, Eraso, Harold David, Faustmann, Erik, Gaj, Sibaji, Gay, Skylar, Gronberg, Mary, Guo, Bingqi, He, Junjun, Heilemann, Gerd, Hira, Sanchit, Huang, Yuliang, Ji, Fuxin, Jiang, Dashan, Giraldo, Jean Carlo Jimenez, Lee, Hoyeon, Lian, Jun, Liu, Shuolin, Liu, Keng-Chi, Marrugo, José, Miki, Kentaro, Nakamura, Kunio, Netherton, Tucker, Nguyen, Dan, Nourzadeh, Hamidreza, Osman, Alexander F. I., Peng, Zhao, Muñoz, José Darío Quinto, Ramsl, Christian, Rhee, Dong Joo, Rodriguez, Juan David, Shan, Hongming, Siebers, Jeffrey V., Soomro, Mumtaz H., Sun, Kay, Hoyos, Andrés Usuga, Valderrama, Carlos, Verbeek, Rob, Wang, Enpei, Willems, Siri, Wu, Qi, Xu, Xuanang, Yang, Sen, Yuan, Lulin, Zhu, Simeng, Zimmermann, Lukas, Moore, Kevin L., Purdie, Thomas G., McNiven, Andrea L., Chan, Timothy C. Y.
We establish an open framework for developing plan optimization models for knowledge-based planning (KBP) in radiotherapy. Our framework includes reference plans for 100 patients with head-and-neck cancer and high-quality dose predictions from 19 KBP models that were developed by different research groups during the OpenKBP Grand Challenge. The dose predictions were input to four optimization models to form 76 unique KBP pipelines that generated 7600 plans. The predictions and plans were compared to the reference plans via: dose score, which is the mean absolute voxel-by-voxel difference in dose that a model achieved, averaged over patients; the deviation in dose-volume histogram (DVH) criteria; and the frequency with which clinical planning criteria were satisfied. We also performed a theoretical investigation to justify our dose mimicking models. The rank order correlation of the dose score between predictions and their KBP pipelines ranged from 0.50 to 0.62, which indicates that the quality of the predictions is generally positively correlated with the quality of the plans. Additionally, compared to the input predictions, the KBP-generated plans performed significantly better (P<0.05; one-sided Wilcoxon test) on 18 of 23 DVH criteria. Similarly, each optimization model generated plans that satisfied a higher percentage of criteria than the reference plans. Lastly, our theoretical investigation demonstrated that the dose mimicking models generated plans that are also optimal for a conventional planning model. This was the largest international effort to date for evaluating the combination of KBP prediction and optimization models. In the interest of reproducibility, our data and code are freely available at https://github.com/ababier/open-kbp-opt.
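The evaluation described above can be illustrated with a brief sketch. Below is a minimal, hypothetical Python example of the dose score (mean absolute voxel-by-voxel dose difference) and of the Spearman rank order correlation between per-patient prediction scores and plan scores; the function names, array shapes, and placeholder data are assumptions for illustration, not the API of the open-kbp-opt repository.

```python
import numpy as np
from scipy.stats import spearmanr

def dose_score(reference: np.ndarray, candidate: np.ndarray, body_mask: np.ndarray) -> float:
    """Mean absolute voxel-by-voxel dose difference inside the patient mask (Gy)."""
    return float(np.mean(np.abs(reference[body_mask] - candidate[body_mask])))

# Hypothetical per-patient scores for 100 patients: one for each KBP dose
# prediction and one for the plan its pipeline produced from that prediction.
rng = np.random.default_rng(0)
prediction_scores = rng.uniform(2.0, 4.0, size=100)               # placeholder values
plan_scores = prediction_scores + rng.normal(0.0, 0.8, size=100)  # placeholder values

# Rank order (Spearman) correlation between prediction and plan quality,
# analogous to the 0.50-0.62 range reported in the abstract.
rho, p_value = spearmanr(prediction_scores, plan_scores)
print(f"Spearman rank correlation: {rho:.2f} (P = {p_value:.3g})")
```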
DeepDoseNet: A Deep Learning model for 3D Dose Prediction in Radiation Therapy
Soomro, Mumtaz Hussain, Alves, Victor Gabriel Leandro, Nourzadeh, Hamidreza, Siebers, Jeffrey V.
DeepDoseNet, a 3D dose prediction model based on ResNet and Dilated DenseNet, is proposed. The 340 head-and-neck datasets from the 2020 AAPM OpenKBP challenge were utilized, with 200 for training, 40 for validation, and 100 for testing. Structures include the 56 Gy, 63 Gy, and 70 Gy PTVs, and the brainstem, spinal cord, right parotid, left parotid, larynx, esophagus, and mandible OARs. Mean squared error (MSE) loss, mean absolute error (MAE) loss, and MAE plus dose-volume histogram (DVH) based loss functions were investigated. Each model's performance was compared using a 3D dose score, $\bar{S_{D}}$ (the mean absolute difference between ground truth and predicted 3D dose distributions), and a DVH score, $\bar{S_{DVH}}$ (the mean absolute difference between ground truth and predicted dose-volume metrics). Furthermore, the DVH metrics Mean [Gy] and D0.1cc [Gy] for OARs and D99%, D95%, and D1% for PTVs were computed. DeepDoseNet with the MAE plus DVH-based loss function had the best dose score performance of the OpenKBP entries. The MAE+DVH model had the lowest prediction error (P<0.0001, Wilcoxon test) on the validation and test datasets (validation: $\bar{S_{D}}$=2.3 Gy, $\bar{S_{DVH}}$=1.9 Gy; test: $\bar{S_{D}}$=2.0 Gy, $\bar{S_{DVH}}$=1.6 Gy), followed by the MAE model (validation: $\bar{S_{D}}$=3.6 Gy, $\bar{S_{DVH}}$=2.4 Gy; test: $\bar{S_{D}}$=3.5 Gy, $\bar{S_{DVH}}$=2.3 Gy). The MSE model had the highest prediction error (validation: $\bar{S_{D}}$=3.7 Gy, $\bar{S_{DVH}}$=3.2 Gy; test: $\bar{S_{D}}$=3.6 Gy, $\bar{S_{DVH}}$=3.0 Gy). No significant difference was found among the models in terms of Mean [Gy], but the MAE+DVH model significantly outperformed the MAE and MSE models in terms of D0.1cc [Gy], particularly for the mandible and parotids, on both the validation (P<0.01) and test (P<0.0001) datasets. The MAE+DVH model also outperformed the other models (P<0.0001) in terms of D99%, D95%, and D1% for the targets. Overall, the MAE+DVH loss reduced $\bar{S_{D}}$ by ~60% and $\bar{S_{DVH}}$ by ~70%.
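For illustration, the following is a minimal Python sketch, under stated assumptions, of the two evaluation metrics used above (dose score $\bar{S_{D}}$ and DVH score $\bar{S_{DVH}}$) and of an MAE-plus-DVH style composite objective; the ROI masks, the specific DVH metrics, and the function names are illustrative assumptions, not DeepDoseNet's actual implementation.

```python
import numpy as np

def dose_score(gt: np.ndarray, pred: np.ndarray, body_mask: np.ndarray) -> float:
    """S_D: mean absolute difference between ground-truth and predicted 3D dose (Gy)."""
    return float(np.mean(np.abs(gt[body_mask] - pred[body_mask])))

def dvh_metrics(dose: np.ndarray, roi_mask: np.ndarray) -> dict:
    """A few example DVH metrics for one ROI; the exact metric set is an assumption."""
    voxels = dose[roi_mask]
    return {
        "Mean": float(voxels.mean()),              # mean dose, used for OARs
        "D99%": float(np.percentile(voxels, 1)),   # dose covering 99% of the volume
        "D1%": float(np.percentile(voxels, 99)),   # near-maximum dose
    }

def dvh_score(gt: np.ndarray, pred: np.ndarray, roi_masks: dict) -> float:
    """S_DVH: mean absolute difference over all ROI/metric pairs."""
    diffs = []
    for mask in roi_masks.values():
        m_gt, m_pred = dvh_metrics(gt, mask), dvh_metrics(pred, mask)
        diffs.extend(abs(m_gt[k] - m_pred[k]) for k in m_gt)
    return float(np.mean(diffs))

def mae_plus_dvh_objective(gt, pred, roi_masks, body_mask, dvh_weight=1.0):
    """Illustrative composite objective: voxel-wise MAE plus a DVH-based penalty."""
    return dose_score(gt, pred, body_mask) + dvh_weight * dvh_score(gt, pred, roi_masks)
```

Note that a loss actually used for training would require a differentiable surrogate for the DVH terms; the sketch above only mirrors the evaluation-time definitions.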