supervised learning model
Fair Sequential Selection Using Supervised Learning Models
We consider a selection problem where sequentially arrived applicants apply for a limited number of positions/jobs. At each time step, a decision maker accepts or rejects the given applicant using a pre-trained supervised learning model until all the vacant positions are filled. In this paper, we discuss whether the fairness notions (e.g., equal opportunity, statistical parity, etc.) that are commonly used in classification problems are suitable for the sequential selection problems. In particular, we show that even with a pre-trained model that satisfies the common fairness notions, the selection outcomes may still be biased against certain demographic groups. This observation implies that the fairness notions used in classification problems are not suitable for a selection problem where the applicants compete for a limited number of positions. We introduce a new fairness notion, ``Equal Selection (ES),'' suitable for sequential selection problems and propose a post-processing approach to satisfy the ES fairness notion. We also consider a setting where the applicants have privacy concerns, and the decision maker only has access to the noisy version of sensitive attributes. In this setting, we can show that the \textit{perfect} ES fairness can still be attained under certain conditions.
Bootstrapped Control Limits for Score-Based Concept Drift Control Charts
Wu, Jiezhong, Apley, Daniel W.
Monitoring for changes in a predictive relationship represented by a fitted supervised learning model (aka concept drift detection) is a widespread problem, e.g., for retrospective analysis to determine whether the predictive relationship was stable over the training data, for prospective analysis to determine when it is time to update the predictive model, for quality control of processes whose behavior can be characterized by a predictive relationship, etc. A general and powerful Fisher score-based concept drift approach has recently been proposed, in which concept drift detection reduces to detecting changes in the mean of the model's score vector using a multivariate exponentially weighted moving average (MEWMA). To implement the approach, the initial data must be split into two subsets. The first subset serves as the training sample to which the model is fit, and the second subset serves as an out-of-sample test set from which the MEWMA control limit (CL) is determined. In this paper, we develop a novel bootstrap procedure for computing the CL. Our bootstrap CL provides much more accurate control of false-alarm rate, especially when the sample size and/or false-alarm rate is small. It also allows the entire initial sample to be used for training, resulting in a more accurate fitted supervised learning model. We show that a standard nested bootstrap (inner loop accounting for future data variability and outer loop accounting for training sample variability) substantially underestimates variability and develop a 632-like correction that appropriately accounts for this. We demonstrate the advantages with numerical examples.
A Framework for Supervised and Unsupervised Segmentation and Classification of Materials Microstructure Images
Zhang, Kungang, Apley, Daniel W., Chen, Wei, Liu, Wing K., Brinson, L. Catherine
Microstructure of materials is often characterized through image analysis to understand processing-structure-properties linkages. We propose a largely automated framework that integrates unsupervised and supervised learning methods to classify micrographs according to microstructure phase/class and, for multiphase microstructures, segments them into different homogeneous regions. With the advance of manufacturing and imaging techniques, the ultra-high resolution of imaging that reveals the complexity of microstructures and the rapidly increasing quantity of images (i.e., micrographs) enables and necessitates a more powerful and automated framework to extract materials characteristics and knowledge. The framework we propose can be used to gradually build a database of microstructure classes relevant to a particular process or group of materials, which can help in analyzing and discovering/identifying new materials. The framework has three steps: (1) segmentation of multiphase micrographs through a recently developed score-based method so that different microstructure homogeneous regions can be identified in an unsupervised manner; (2) {identification and classification of} homogeneous regions of micrographs through an uncertainty-aware supervised classification network trained using the segmented micrographs from Step $1$ with their identified labels verified via the built-in uncertainty quantification and minimal human inspection; (3) supervised segmentation (more powerful than the segmentation in Step $1$) of multiphase microstructures through a segmentation network trained with micrographs and the results from Steps $1$-$2$ using a form of data augmentation. This framework can iteratively characterize/segment new homogeneous or multiphase materials while expanding the database to enhance performance. The framework is demonstrated on various sets of materials and texture images.
- Asia > Middle East > Jordan (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Research Report (0.82)
- Workflow (0.52)
- Materials (0.88)
- Health & Medicine (0.67)
Fair Sequential Selection Using Supervised Learning Models
We consider a selection problem where sequentially arrived applicants apply for a limited number of positions/jobs. At each time step, a decision maker accepts or rejects the given applicant using a pre-trained supervised learning model until all the vacant positions are filled. In this paper, we discuss whether the fairness notions (e.g., equal opportunity, statistical parity, etc.) that are commonly used in classification problems are suitable for the sequential selection problems. In particular, we show that even with a pre-trained model that satisfies the common fairness notions, the selection outcomes may still be biased against certain demographic groups. This observation implies that the fairness notions used in classification problems are not suitable for a selection problem where the applicants compete for a limited number of positions.
Supervised Learning Models for Early Detection of Albuminuria Risk in Type-2 Diabetes Mellitus Patients
Muharram, Arief Purnama, Tahapary, Dicky Levenus, Lestari, Yeni Dwi, Sarayar, Randy, Dirjayanto, Valerie Josephine
Diabetes, especially T2DM, continues to be a significant health problem. One of the major concerns associated with diabetes is the development of its complications. Diabetic nephropathy, one of the chronic complication of diabetes, adversely affects the kidneys, leading to kidney damage. Diagnosing diabetic nephropathy involves considering various criteria, one of which is the presence of a pathologically significant quantity of albumin in urine, known as albuminuria. Thus, early prediction of albuminuria in diabetic patients holds the potential for timely preventive measures. This study aimed to develop a supervised learning model to predict the risk of developing albuminuria in T2DM patients. The selected supervised learning algorithms included Na\"ive Bayes, Support Vector Machine (SVM), decision tree, random forest, AdaBoost, XGBoost, and Multi-Layer Perceptron (MLP). Our private dataset, comprising 184 entries of diabetes complications risk factors, was used to train the algorithms. It consisted of 10 attributes as features and 1 attribute as the target (albuminuria). Upon conducting the experiments, the MLP demonstrated superior performance compared to the other algorithms. It achieved accuracy and f1-score values as high as 0.74 and 0.75, respectively, making it suitable for screening purposes in predicting albuminuria in T2DM. Nonetheless, further studies are warranted to enhance the model's performance.
- Asia > Indonesia > Java > Jakarta > Jakarta (0.06)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Asia > Indonesia > Java > West Java > Bandung (0.04)
Physics-informed PointNet: On how many irregular geometries can it solve an inverse problem simultaneously? Application to linear elasticity
Kashefi, Ali, Guibas, Leonidas J., Mukerji, Tapan
Regular physics-informed neural networks (PINNs) predict the solution of partial differential equations using sparse labeled data but only over a single domain. On the other hand, fully supervised learning models are first trained usually over a few thousand domains with known solutions (i.e., labeled data) and then predict the solution over a few hundred unseen domains. Physics-informed PointNet (PIPN) is primarily designed to fill this gap between PINNs (as weakly supervised learning models) and fully supervised learning models. In this article, we demonstrate that PIPN predicts the solution of desired partial differential equations over a few hundred domains simultaneously, while it only uses sparse labeled data. This framework benefits fast geometric designs in the industry when only sparse labeled data are available. Particularly, we show that PIPN predicts the solution of a plane stress problem over more than 500 domains with different geometries, simultaneously. Moreover, we pioneer implementing the concept of remarkable batch size (i.e., the number of geometries fed into PIPN at each sub-epoch) into PIPN. Specifically, we try batch sizes of 7, 14, 19, 38, 76, and 133. Additionally, the effect of the PIPN size, symmetric function in the PIPN architecture, and static and dynamic weights for the component of the sparse labeled data in the loss function are investigated.
RLBoost: Boosting Supervised Models using Deep Reinforcement Learning
Batanero, Eloy Anguiano, Pascual, Ángela Fernández, Jiménez, Álvaro Barbero
Data quality or data evaluation is sometimes a task as important as collecting a large volume of data when it comes to generating accurate artificial intelligence models. In fact, being able to evaluate the data can lead to a larger database that is better suited to a particular problem because we have the ability to filter out data obtained automatically of dubious quality. In this paper we present RLBoost, an algorithm that uses deep reinforcement learning strategies to evaluate a particular dataset and obtain a model capable of estimating the quality of any new data in order to improve the final predictive quality of a supervised learning model. This solution has the advantage that of being agnostic regarding the supervised model used and, through multi-attention strategies, takes into account the data in its context and not only individually. The results of the article show that this model obtains better and more stable results than other state-of-the-art algorithms such as LOO, DataShapley or DVRL.
- Europe > Spain > Galicia > Madrid (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Application of supervised learning models in the Chinese futures market
Global financial systems have seen considerable growth in size, concentration, and complexity over the past few decades, the complexity of financial systems exceeds the modelling capabilities of traditional quantitative methods. In addition, some very useful data sets, such as satellite images, voice recordings or news sentiment, are beyond the reach of econometrics[2]. In recent years, many hedge funds have started experimenting with machine learning (ML) methods. ML is a subset of artificial intelligence, where machines are used to learn from previous experience[3]. Unlike traditional programming, where developers need to predict every potential condition to program, ML's solution can effectively tailor the output to the data.
- Asia > China > Shanghai > Shanghai (0.06)
- South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)
- North America > United States (0.04)
- (2 more...)
- Banking & Finance > Trading (1.00)
- Materials > Chemicals > Commodity Chemicals > Petrochemicals > Polymers & Plastics (0.68)
Segmentation of Parotid Gland Tumors Using Multimodal MRI and Contrastive Learning
Xu, Zi'an, Dai, Yin, Liu, Fayu, Wu, Boyuan, Chen, Weibing, Shi, Lifu
Parotid gland tumor is a common type of head and neck tumor. Segmentation of the parotid glands and tumors by MR images is important for the treatment of parotid gland tumors. However, segmentation of the parotid glands is particularly challenging due to their variable shape and low contrast with surrounding structures. Recently deep learning has developed rapidly, which can handle complex problems. However, most of the current deep learning methods for processing medical images are still based on supervised learning. Compared with natural images, medical images are difficult to acquire and costly to label. Contrastive learning, as an unsupervised learning method, can more effectively utilize unlabeled medical images. In this paper, we used a Transformer-based contrastive learning method and innovatively trained the contrastive learning network with transfer learning. Then, the output model was transferred to the downstream parotid segmentation task, which improved the performance of the parotid segmentation model on the test set. The improved DSC was 89.60%, MPA was 99.36%, MIoU was 85.11%, and HD was 2.98. All four metrics showed significant improvement compared to the results of using a supervised learning model as a pre-trained model for the parotid segmentation network. In addition, we found that the improvement of the segmentation network by the contrastive learning model was mainly in the encoder part, so this paper also tried to build a contrastive learning network for the decoder part and discussed the problems encountered in the process of building.
- North America > United States > New York (0.04)
- Asia > China > Liaoning Province > Shenyang (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Education > Educational Setting > Higher Education (0.93)
PiRL: Participant-Invariant Representation Learning for Healthcare
Cao, Zhaoyang, Yu, Han, Yang, Huiyuan, Sano, Akane
Due to individual heterogeneity, performance gaps are observed between generic (one-size-fits-all) models and person-specific models in data-driven health applications. However, in real-world applications, generic models are usually more favorable due to new-user-adaptation issues and system complexities, etc. To improve the performance of the generic model, we propose a representation learning framework that learns participant-invariant representations, named PiRL. The proposed framework utilizes maximum mean discrepancy (MMD) loss and domain-adversarial training to encourage the model to learn participant-invariant representations. Further, a triplet loss, which constrains the model for inter-class alignment of the representations, is utilized to optimize the learned representations for downstream health applications. We evaluated our frameworks on two public datasets related to physical and mental health, for detecting sleep apnea and stress, respectively. As preliminary results, we found the proposed approach shows around a 5% increase in accuracy compared to the baseline.
- North America > United States > Texas > Harris County > Houston (0.04)
- Asia > India (0.04)
- Research Report > Experimental Study (0.47)
- Research Report > New Finding (0.47)