Chen, Shaohan
Cardiovascular Disease Detection By Leveraging Semi-Supervised Learning
Chen, Shaohan, Liu, Zheyan, Zheng, Huili, Zhang, Qimin, Gong, Yiru
Cardiovascular disease (CVD) remains a leading cause of death worldwide, which calls for more effective and timely detection methods. Traditional supervised learning approaches for CVD detection rely heavily on large labeled datasets, which are often difficult to obtain. This paper employs semi-supervised learning models to improve the efficiency and accuracy of CVD detection when few labeled samples are available. By leveraging both labeled data and large amounts of unlabeled data, our approach improves prediction performance while reducing the dependency on labeled data. Experimental results on a publicly available dataset show that semi-supervised models outperform traditional supervised learning techniques, offering a promising approach for the initial identification of cardiovascular disease in clinical settings.
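As an illustration of how unlabeled data can supplement a small labeled set, the following is a minimal sketch of self-training (pseudo-labeling), one common semi-supervised technique. The abstract does not say which semi-supervised models the paper uses, so the nearest-centroid classifier, the `margin` confidence rule, the function names, and the toy one-dimensional data are all assumptions made for illustration, not the authors' method.

```python
def centroid(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def class_centroids(labeled):
    """Return (c0, c1), the mean feature value of each class.

    Assumes `labeled` holds (x, y) pairs with y in {0, 1} and at least
    one example of each class.
    """
    c0 = centroid([x for x, y in labeled if y == 0])
    c1 = centroid([x for x, y in labeled if y == 1])
    return c0, c1


def predict(x, c0, c1):
    """Nearest-centroid prediction (ties go to class 0)."""
    return 0 if abs(x - c0) <= abs(x - c1) else 1


def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Self-training loop (hypothetical helper, illustration only).

    Each round fits centroids on the current labeled set, pseudo-labels
    the unlabeled points whose two centroid distances differ by at least
    `margin`, and adds them to the labeled set. Stops when no point is
    confident enough or after `max_rounds` rounds.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        c0, c1 = class_centroids(labeled)
        confident, uncertain = [], []
        for x in pool:
            gap = abs(abs(x - c0) - abs(x - c1))
            if gap >= margin:
                confident.append((x, predict(x, c0, c1)))
            else:
                uncertain.append(x)
        if not confident:
            break
        labeled.extend(confident)
        pool = uncertain
    return class_centroids(labeled)


# Toy data: two labeled points and five unlabeled ones.
labeled = [(0.0, 0), (10.0, 1)]
unlabeled = [1.0, 2.0, 8.0, 9.0, 5.2]
c0, c1 = self_train(labeled, unlabeled)
print(c0, c1, predict(5.2, c0, c1))
```

In this toy run the four clearly separated points are pseudo-labeled in the first round, while the ambiguous point 5.2 never passes the margin test and stays unlabeled. Keeping low-confidence points out of the training set is the main design choice in self-training, since wrong pseudo-labels can reinforce themselves in later rounds.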
Domain Knowledge integrated for Blast Furnace Classifier Design
Chen, Shaohan, Fan, Di, Gao, Chuanhou
Blast furnace modeling and control is an important problem in the industrial field, and the black-box model is an effective means of describing the complex blast furnace system. In practice, the learning target, such as safety or energy saving, often differs from one industrial application to another. For this reason, this paper proposes a framework for designing a classification model that integrates domain knowledge and yields a classifier suited to the industrial application. Our knowledge-incorporated learning scheme allows users to create a classifier that identifies "important samples" (those whose misclassification can lead to severe consequences) more accurately, while maintaining adequate precision on the remaining samples. The effectiveness of the proposed method has been verified on two real blast furnace datasets, and the method helps operators use their prior experience to control blast furnace systems better.
Transfer Learning in Information Criteria-based Feature Selection
Chen, Shaohan, Sahinidis, Nikolaos V., Gao, Chuanhou
This paper investigates the effectiveness of transfer learning based on Mallows' Cp. We propose a procedure that combines transfer learning with Mallows' Cp (TLCp) and prove that it outperforms the conventional Mallows' Cp criterion in terms of accuracy and stability. Our theoretical results indicate that, for any sample size in the target domain, the proposed TLCp estimator outperforms the Cp estimator in terms of mean squared error (MSE) in the case of orthogonal predictors, provided that i) the dissimilarity between the tasks from the source and target domains is small, and ii) the procedure parameters (complexity penalties) are tuned according to certain explicit rules. Moreover, we show that our transfer learning framework can be extended to other feature selection criteria, such as the Bayesian information criterion. By analyzing the solution of the orthogonalized Cp, we identify an estimator that asymptotically approximates the solution of the Cp criterion in the case of non-orthogonal predictors. Similar results are obtained for the non-orthogonal TLCp. Finally, simulation studies and applications with real data demonstrate the usefulness of the TLCp scheme.
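To make the orthogonal-predictor case concrete, the sketch below uses the standard penalized form of Mallows' Cp, which selects the subset S minimizing RSS(S) + 2σ²|S|. When the predictor columns are mutually orthogonal, including predictor j lowers the RSS by (x_j·y)²/(x_j·x_j) regardless of which other predictors are included, so the criterion reduces to a per-feature threshold at 2σ². This covers the conventional Cp criterion only, not the TLCp procedure, whose penalties follow tuning rules the abstract does not state. The function name, the assumption that σ² is known, and the toy data are illustrative.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def cp_select_orthogonal(columns, y, sigma2):
    """Mallows' Cp subset selection for mutually orthogonal predictors.

    columns: list of predictor columns (assumed pairwise orthogonal).
    y:       response vector.
    sigma2:  noise variance, assumed known here for simplicity.

    Returns (selected indices, {index: least-squares coefficient}).
    """
    penalty = 2.0 * sigma2          # Cp charges 2 * sigma^2 per predictor
    selected, coefs = [], {}
    for j, x in enumerate(columns):
        norm2 = dot(x, x)
        proj = dot(x, y)
        gain = proj * proj / norm2  # RSS reduction from including x_j
        if gain > penalty:          # keep x_j only if it pays for itself
            selected.append(j)
            coefs[j] = proj / norm2
    return selected, coefs


# Toy orthogonal design: y = 3*x0 + 0*x1 + 0.1*x2 (no noise added).
columns = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
]
y = [3.1, 3.1, 2.9, 2.9]
selected, coefs = cp_select_orthogonal(columns, y, sigma2=0.25)
print(selected, coefs)
```

Here only the first predictor is kept: its RSS reduction (36) far exceeds the penalty (0.5), while the weak third predictor (reduction about 0.04) is dropped even though its true coefficient is nonzero. Trading a small bias for lower variance in this way is what a complexity penalty does.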
Knowledge Integrated Classifier Design Based on Utility Optimization
Chen, Shaohan, Gao, Chuanhou
This paper proposes a systematic framework for designing a classification model that yields a classifier optimizing a utility function based on prior knowledge. Specifically, we prove that as the data size grows, the produced classifier asymptotically converges to the optimal classifier, an extended version of the Bayes rule, which maximizes the utility function. We thereby provide a meaningful theoretical interpretation of modeling with incorporated knowledge. Our knowledge incorporation method allows domain experts to guide the classifier towards correctly classifying the data that they consider more significant.
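A utility-maximizing decision rule of the kind the abstract calls an extended Bayes rule can be sketched as follows: given class posteriors P(y | x), predict the label with the highest expected utility. The utility matrix `utility[pred][true]`, the function name, and the numbers are assumptions for illustration; the abstract does not specify the form of the paper's utility function.

```python
def max_utility_label(posterior, utility):
    """Return the predicted label that maximizes expected utility.

    posterior: list of P(y = k | x) for each true class k.
    utility:   utility[pred][true] is the (hypothetical) payoff of
               predicting `pred` when the true class is `true`.
    """
    expected = [
        sum(p * u for p, u in zip(posterior, row))
        for row in utility
    ]
    return max(range(len(expected)), key=expected.__getitem__)


posterior = [0.7, 0.3]

# Identity utility: every correct prediction is worth 1, every error 0.
zero_one = [[1, 0],
            [0, 1]]

# Expert knowledge: missing a true class-1 sample is heavily penalized.
costly_miss = [[1, -5],
               [0, 1]]

print(max_utility_label(posterior, zero_one))     # 0
print(max_utility_label(posterior, costly_miss))  # 1
```

With the identity utility the rule reduces to the ordinary Bayes rule and predicts the most probable class. With the asymmetric utility, the same posterior leads to predicting class 1, because the expected utility of predicting class 0 (0.7 - 1.5 = -0.8) falls below that of predicting class 1 (0.3).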
Asymptotic performance of regularized multi-task learning
Chen, Shaohan, Gao, Chuanhou
This paper analyzes the asymptotic performance of a regularized multi-task learning model in which task parameters are optimized jointly. Empirical work suggests that, when tasks are closely related, multi-task learning models outperform single-task ones in finite-sample cases. We show that, as the data size grows indefinitely, the learned multi-classifier optimizes an average misclassification error function, which describes the risk of applying the multi-task learning algorithm to decision making. This technical result demonstrates that the regularized multi-task learning model produces a reliable decision rule for each task, in the sense that it asymptotically converges to the corresponding Bayes rule. We also find that the interaction effect between tasks vanishes as the data size grows indefinitely, which is quite different from the behavior in finite-sample cases.
Enhancing Transparency of Black-box Soft-margin SVM by Integrating Data-based Prior Information
Chen, Shaohan, Gao, Chuanhou, Zhang, Ping
Lack of transparency often makes black-box models difficult to apply in many practical domains. For this reason, the current work, starting from the input side of the black-box model, proposes to incorporate data-based prior information into the black-box soft-margin SVM model to enhance its transparency. The concept of data-based prior information and the mechanism for incorporating it are developed in turn. On this basis, a transparent or partly transparent SVM optimization model is designed and then solved by rewriting the optimization problem as a nonlinear quadratic programming problem. An algorithm for mining data-based linear prior information from a data set is also proposed, which generates a linear expression in two appropriate inputs identified from all inputs of the system. Finally, the proposed transparency strategy is applied to eight benchmark examples and two real blast furnace examples to demonstrate its effectiveness.