tailor-diag
Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses
Yu, Haojun, Li, Youcheng, Zhang, Nan, Niu, Zihan, Gong, Xuantong, Luo, Yanwen, Wu, Quanlin, Qin, Wangyan, Zhou, Mengyuan, Han, Jie, Tao, Jia, Zhao, Ziwei, Dai, Di, He, Di, Wang, Dong, Tang, Binghui, Huo, Ling, Zhu, Qingli, Wang, Yong, Wang, Liwei
Data-driven deep learning models have shown great capability in assisting radiologists with breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address the long-standing challenge of improving diagnostic model performance on rare cases using long-tailed data. Specifically, we introduce a pipeline, TAILOR, that builds a knowledge-driven generative model to produce tailored synthetic data. The generative model, using 3,749 lesions as source data, can generate millions of breast-US images, especially for error-prone rare cases. The generated data can then be used to build a diagnostic model for accurate and interpretable diagnoses. In the prospective external evaluation, our diagnostic model outperforms the average performance of nine radiologists by 33.5% in specificity at the same sensitivity, and it improves their performance by providing predictions with an interpretable decision-making process. Moreover, on ductal carcinoma in situ (DCIS), our diagnostic model outperforms all radiologists by a large margin, with only 34 DCIS lesions in the source data. We believe that TAILOR can potentially be extended to various diseases and imaging modalities.

1 Main

Breast cancer has become the most common cancer among women globally [1-3], and early detection

These authors carried out this work as interns at Yizhun Medical AI Co., Ltd.

The distribution of pathological subtypes is long-tailed in our training set, which has 1,387 biopsy-confirmed lesions. In benign lesions, the two most frequent subtypes together account for 49.7% of the lesions, with the remaining 13 subtypes comprising 50.3%. In malignant lesions, the most frequent subtype accounts for 81.8% of the lesions, while the remaining 15 subtypes comprise only 18.2%.
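
The head and tail shares quoted for the subtype distribution are simple proportions over per-lesion subtype labels. The following is a minimal sketch of that arithmetic, not the authors' code: the function name `head_tail_share`, the parameter `head_k`, and the toy labels are all hypothetical and are not taken from the study's data.

```python
from collections import Counter


def head_tail_share(subtype_labels, head_k):
    """Return (head_share, tail_share) as fractions of all lesions.

    head_k is the number of most frequent subtypes counted as the "head";
    every other subtype is counted as the "tail".
    """
    counts = Counter(subtype_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no lesions given")
    # Sum the lesion counts of the head_k most frequent subtypes.
    head = sum(n for _, n in counts.most_common(head_k))
    return head / total, (total - head) / total


# Hypothetical toy labels for illustration only, NOT the paper's data.
toy_labels = ["A"] * 8 + ["B", "C"]
head, tail = head_tail_share(toy_labels, head_k=1)
print(f"head: {head:.1%}, tail: {tail:.1%}")
```

Under these assumptions, `head_k=2` would correspond to the benign grouping described above (two most frequent subtypes versus the rest) and `head_k=1` to the malignant grouping.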
In breast cancer detection, ultrasound (US) is an essential imaging method widely adopted worldwide for its safety and low cost [5-7].