Lee, Joon
Unsupervised Feature Selection to Identify Important ICD-10 Codes for Machine Learning: A Case Study on a Coronary Artery Disease Patient Cohort
Ghasemi, Peyman, Lee, Joon
The use of International Classification of Diseases (ICD) codes in healthcare presents a challenge in selecting relevant codes as features for machine learning models due to this system's large number of codes. In this study, we compared several unsupervised feature selection methods for an ICD code database of 49,075 coronary artery disease patients in Alberta, Canada. Specifically, we employed Laplacian Score, Unsupervised Feature Selection for Multi-Cluster Data, Autoencoder Inspired Unsupervised Feature Selection, Principal Feature Analysis, and Concrete Autoencoders with and without ICD tree weight adjustment to select the 100 best features from over 9,000 codes. We assessed the selected features based on their ability to reconstruct the initial feature space and predict 90-day mortality following discharge. Our findings revealed that the Concrete Autoencoder methods outperformed all other methods in both tasks. Furthermore, the weight adjustment in the Concrete Autoencoder method decreased the complexity of features.
Machine Learning Capability: A standardized metric using case difficulty with applications to individualized deployment of supervised machine learning
Kline, Adrienne, Lee, Joon
Model evaluation is a critical component in supervised machine learning classification analyses. Traditional metrics do not currently incorporate case difficulty. This renders the classification results unbenchmarked for generalization. Item Response Theory (IRT) and Computer Adaptive Testing (CAT) with machine learning can benchmark datasets independent of the end-classification results. This provides high levels of case-level information regarding evaluation utility. To showcase, two datasets were used: 1) health-related and 2) physical science. For the health dataset a two-parameter IRT model, and for the physical science dataset a polytonomous IRT model, was used to analyze predictive features and place each case on a difficulty continuum. A CAT approach was used to ascertain the algorithms' performance and applicability to new data. This method provides an efficient way to benchmark data, using only a fraction of the dataset (less than 1%) and 22-60x more computationally efficient than traditional metrics. This novel metric, termed Machine Learning Capability (MLC) has additional benefits as it is unbiased to outcome classification and a standardized way to make model comparisons within and across datasets. MLC provides a metric on the limitation of supervised machine learning algorithms. In situations where the algorithm falls short, other input(s) are required for decision-making.
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Bakas, Spyridon, Reyes, Mauricio, Jakab, Andras, Bauer, Stefan, Rempfler, Markus, Crimi, Alessandro, Shinohara, Russell Takeshi, Berger, Christoph, Ha, Sung Min, Rozycki, Martin, Prastawa, Marcel, Alberts, Esther, Lipkova, Jana, Freymann, John, Kirby, Justin, Bilello, Michel, Fathallah-Shaykh, Hassan, Wiest, Roland, Kirschke, Jan, Wiestler, Benedikt, Colen, Rivka, Kotrotsou, Aikaterini, Lamontagne, Pamela, Marcus, Daniel, Milchenko, Mikhail, Nazeri, Arash, Weber, Marc-Andre, Mahajan, Abhishek, Baid, Ujjwal, Kwon, Dongjin, Agarwal, Manu, Alam, Mahbubul, Albiol, Alberto, Albiol, Antonio, Alex, Varghese, Tran, Tuan Anh, Arbel, Tal, Avery, Aaron, B., Pranjal, Banerjee, Subhashis, Batchelder, Thomas, Batmanghelich, Kayhan, Battistella, Enzo, Bendszus, Martin, Benson, Eze, Bernal, Jose, Biros, George, Cabezas, Mariano, Chandra, Siddhartha, Chang, Yi-Ju, Chazalon, Joseph, Chen, Shengcong, Chen, Wei, Chen, Jefferson, Cheng, Kun, Christoph, Meinel, Chylla, Roger, Clérigues, Albert, Costa, Anthony, Cui, Xiaomeng, Dai, Zhenzhen, Dai, Lutao, Deutsch, Eric, Ding, Changxing, Dong, Chao, Dudzik, Wojciech, Estienne, Théo, Shin, Hyung Eun, Everson, Richard, Fabrizio, Jonathan, Fang, Longwei, Feng, Xue, Fidon, Lucas, Fridman, Naomi, Fu, Huan, Fuentes, David, Gering, David G, Gao, Yaozong, Gates, Evan, Gholami, Amir, Gong, Mingming, González-Villá, Sandra, Pauloski, J. Gregory, Guan, Yuanfang, Guo, Sheng, Gupta, Sudeep, Thakur, Meenakshi H, Maier-Hein, Klaus H., Han, Woo-Sup, He, Huiguang, Hernández-Sabaté, Aura, Herrmann, Evelyn, Himthani, Naveen, Hsu, Winston, Hsu, Cheyu, Hu, Xiaojun, Hu, Xiaobin, Hu, Yan, Hu, Yifan, Hua, Rui, Huang, Teng-Yi, Huang, Weilin, Huo, Quan, HV, Vivek, Isensee, Fabian, Islam, Mobarakol, Albiol, Francisco J., Wang, Chiatse J., Jambawalikar, Sachin, Jose, V Jeya Maria, Jian, Weijian, Jin, Peter, Jungo, Alain, Nuechterlein, Nicholas K, Kao, Po-Yu, Kermi, Adel, Keutzer, Kurt, Khened, Mahendra, Kickingereder, Philipp, King, Nik, Knapp, Haley, Knecht, Urspeter, Kohli, Lisa, Kong, Deren, Kong, Xiangmao, Koppers, Simon, Kori, Avinash, Krishnamurthi, Ganapathy, Kumar, Piyush, Kushibar, Kaisar, Lachinov, Dmitrii, Lee, Joon, Lee, Chengen, Lee, Yuehchou, Lefkovits, Szidonia, Lefkovits, Laszlo, Li, Tengfei, Li, Hongwei, Li, Wenqi, Li, Hongyang, Li, Xiaochuan, Lin, Zheng-Shen, Lin, Fengming, Liu, Chang, Liu, Boqiang, Liu, Xiang, Liu, Mingyuan, Liu, Ju, Lladó, Xavier, Luo, Lin, Iftekharuddin, Khan M., Tsai, Yuhsiang M., Ma, Jun, Ma, Kai, Mackie, Thomas, Mahmoudi, Issam, Marcinkiewicz, Michal, McKinley, Richard, Mehta, Sachin, Mehta, Raghav, Meier, Raphael, Merhof, Dorit, Meyer, Craig, Mitra, Sushmita, Moiyadi, Aliasgar, Mrukwa, Grzegorz, Monteiro, Miguel A. B., Myronenko, Andriy, Carver, Eric N, Nalepa, Jakub, Ngo, Thuyen, Niu, Chen, Oermann, Eric, Oliveira, Arlindo, Oliver, Arnau, Ourselin, Sebastien, French, Andrew P., Pound, Michael P., Pridmore, Tony P., Serrano-Rubio, Juan Pablo, Paragios, Nikos, Paschke, Brad, Pei, Linmim, Peng, Suting, Pham, Bao, Piella, Gemma, Pillai, G. N., Piraud, Marie, Popli, Anmol, Prčkovska, Vesna, Puch, Santi, Puybareau, Élodie, Qiao, Xu, Suter, Yannick R, Scott, Matthew R., Rane, Swapnil, Rebsamen, Michael, Ren, Hongliang, Ren, Xuhua, Rezaei, Mina, Lorenzo, Pablo Ribalta, Rippel, Oliver, Robert, Charlotte, Choudhury, Ahana Roy, Jackson, Aaron S., Manjunath, B. S., Salem, Mostafa, Salvi, Joaquim, Sánchez, Irina, Schellingerhout, Dawid, Shboul, Zeina, Shen, Haipeng, Shen, Dinggang, Shenoy, Varun, Shi, Feng, Shu, Hai, Snyder, James, Han, Il Song, Soni, Mehul, Stawiaski, Jean, Subramanian, Shashank, Sun, Li, Sun, Roger, Sun, Jiawei, Sun, Kay, Sun, Yu, Sun, Guoxia, Sun, Shuang, Park, Moo Sung, Szilagyi, Laszlo, Talbar, Sanjay, Tao, Dacheng, Tao, Dacheng, Khadir, Mohamed Tarek, Thakur, Siddhesh, Tochon, Guillaume, Tran, Tuan, Tseng, Kuan-Lun, Turlapov, Vadim, Tustison, Nicholas, Shankar, B. Uma, Vakalopoulou, Maria, Valverde, Sergi, Vanguri, Rami, Vasiliev, Evgeny, Vercauteren, Tom, Vidyaratne, Lasitha, Vivekanandan, Ajeet, Wang, Guotai, Wang, Qian, Wang, Weichung, Wen, Ning, Wen, Xin, Weninger, Leon, Wick, Wolfgang, Wu, Shaocheng, Wu, Qiang, Xia, Yong, Xu, Yanwu, Xu, Xiaowen, Xu, Peiyuan, Yang, Tsai-Ling, Yang, Xiaoping, Yang, Hao-Yu, Yang, Junlin, Yang, Haojin, Yao, Hongdou, Young-Moxon, Brett, Yue, Xiangyu, Zhang, Songtao, Zhang, Angela, Zhang, Kun, Zhang, Xuejie, Zhang, Lichi, Zhang, Xiaoyue, Zhao, Sicheng, Zhao, Yu, Zheng, Yefeng, Zhong, Liming, Zhou, Chenhong, Zhou, Xiaobing, Zhu, Hongtu, Zong, Weiwei, Kalpathy-Cramer, Jayashree, Farahani, Keyvan, Davatzikos, Christos, van Leemput, Koen, Menze, Bjoern
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e. 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that undergone gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.