prediction value
Explainable AI in Deep Learning-Based Prediction of Solar Storms
Rawashdeh, Adam O., Wang, Jason T. L., Herbert, Katherine G.
A deep learning model is often considered a black-box model, as its internal workings tend to be opaque to the user. Because of the lack of transparency, it is challenging to understand the reasoning behind the model's predictions. Here, we present an approach to making a deep learning-based solar storm prediction model interpretable, where solar storms include solar flares and coronal mass ejections (CMEs). This deep learning model, built based on a long short-term memory (LSTM) network with an attention mechanism, aims to predict whether an active region (AR) on the Sun's surface that produces a flare within 24 hours will also produce a CME associated with the flare. The crux of our approach is to model data samples in an AR as time series and use the LSTM network to capture the temporal dynamics of the data samples. To make the model's predictions accountable and reliable, we leverage post hoc model-agnostic techniques, which help elucidate the factors contributing to the predicted output for an input sequence and provide insights into the model's behavior across multiple sequences within an AR. To our knowledge, this is the first time that interpretability has been added to an LSTM-based solar storm prediction model.
Ranking-Based At-Risk Student Prediction Using Federated Learning and Differential Features
Yoneda, Shunsuke, Švábenský, Valdemar, Li, Gen, Deguchi, Daisuke, Shimada, Atsushi
Digital textbooks are widely used in various educational contexts, such as university courses and online lectures. Such textbooks yield learning log data that have been used in numerous educational data mining (EDM) studies for student behavior analysis and performance prediction. However, these studies have faced challenges in integrating confidential data, such as academic records and learning logs, across schools due to privacy concerns. Consequently, analyses are often conducted with data limited to a single school, which makes developing high-performing and generalizable models difficult. This study proposes a method that combines federated learning and differential features to address these issues. Federated learning enables model training without centralizing data, thereby preserving student privacy. Differential features, which utilize relative values instead of absolute values, enhance model performance and generalizability. To evaluate the proposed method, a model for predicting at-risk students was trained using data from 1,136 students across 12 courses conducted over 4 years, and validated on hold-out test data from 5 other courses. Experimental results demonstrated that the proposed method addresses privacy concerns while achieving performance comparable to that of models trained via centralized learning in terms of Top-n precision, nDCG, and PR-AUC. Furthermore, using differential features improved prediction performance across all evaluation datasets compared to non-differential approaches. The trained models were also applicable for early prediction, achieving high performance in detecting at-risk students in earlier stages of the semester within the validation datasets.
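The ranking metrics and the idea of relative features mentioned above can be illustrated with a minimal sketch. It assumes that a "differential feature" is a student's value minus the course mean; the abstract only says that relative values replace absolute ones, so this definition and all function names are hypothetical. Labels are 1 for at-risk students and 0 otherwise.

```python
import math
from statistics import mean


def differential_features(values):
    """Assumed definition: each value expressed relative to the course mean."""
    course_mean = mean(values)
    return [v - course_mean for v in values]


def _ranking(scores):
    """Indices of students sorted from highest to lowest predicted risk."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def top_n_precision(scores, labels, n):
    """Fraction of the n highest-ranked students who are truly at risk."""
    top = _ranking(scores)[:n]
    return sum(labels[i] for i in top) / n


def ndcg(scores, labels, n):
    """Normalized discounted cumulative gain over the top n ranks."""
    top = _ranking(scores)[:n]
    dcg = sum(labels[i] / math.log2(rank + 2) for rank, i in enumerate(top))
    ideal = sorted(labels, reverse=True)[:n]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, with scores `[0.9, 0.1, 0.8, 0.3]` and labels `[1, 0, 0, 1]`, only one of the two top-ranked students is at risk, so `top_n_precision(scores, labels, 2)` is 0.5.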
An Interpretable Machine Learning Approach to Understanding the Relationships between Solar Flares and Source Active Regions
Cavus, Huseyin, Wang, Jason T. L., Singampalli, Teja P. S., Coban, Gani Caglar, Zhang, Hongyang, Raheem, Abd-ur, Wang, Haimin
Solar flares are defined as outbursts on the surface of the Sun. They occur when energy accumulated in magnetic fields enclosing solar active regions (ARs) is abruptly expelled. Solar flares and associated coronal mass ejections are sources of space weather that adversely affect systems at or near Earth, for example by obstructing high-frequency radio waves used for communication and degrading power grid operations. Tracking and delivering early and precise predictions of solar flares is essential for readiness and catastrophe risk mitigation. This paper employs the random forest (RF) model to address the binary classification task, analyzing the links between solar flares and their originating ARs with observational data gathered from 2011 to 2021 by SolarMonitor.org and the XRT flare database. We seek to identify the physical features of a source AR that significantly influence its potential to trigger >=C-class flares. We found that AR_Type_Today and Hale_Class_Yesterday are the most and the least influential features, respectively. NoS_Difference has a notable effect on decision-making in both global and local interpretations.
Peter Parker or Spiderman? Disambiguating Multiple Class Labels
Mummani, Nuthan, Ketha, Simran, Ramaswamy, Venkatakrishnan
In the supervised classification setting, during inference, deep networks typically make multiple predictions. For a pair of such predictions (that are in the top-k predictions), two distinct possibilities might occur. On the one hand, each of the two predictions might be primarily driven by a distinct set of entities in the input. On the other hand, it is possible that there is a single entity or set of entities that is driving the prediction for both the classes in question. This latter case, in effect, corresponds to the network making two separate guesses about the identity of a single entity type. Clearly, the two guesses cannot both be true, i.e., the two labels cannot both be present in the input. Current techniques in interpretability research do not readily disambiguate these two cases, since they typically consider input attributions for one class label at a time. Here, we present a framework and method to do so, leveraging modern segmentation and input attribution techniques. Notably, our framework also provides a simple counterfactual "proof" of each case, which can be verified for the input on the model (i.e., without running the method again). We demonstrate that the method performs well for a number of samples from the ImageNet validation set and on multiple models.
Unveiling Transformer Perception by Exploring Input Manifolds
Benfenati, Alessandro, Ferrara, Alfio, Marta, Alessio, Riva, Davide, Rocchetti, Elisabetta
This paper introduces a general method for the exploration of equivalence classes in the input space of Transformer models. The proposed approach is based on sound mathematical theory which describes the internal layers of a Transformer architecture as sequential deformations of the input manifold. Using eigendecomposition of the pullback of the distance metric defined on the output space through the Jacobian of the model, we are able to reconstruct equivalence classes in the input space and navigate across them. We illustrate how this method can be used as a powerful tool for investigating how a Transformer sees the input space, facilitating local and task-agnostic explainability in Computer Vision and Natural Language Processing tasks.
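The pullback construction described above can be sketched on a toy function. This is an illustrative example, not the authors' implementation: `model` is a hypothetical stand-in for a network (it maps a 2-D input to its squared norm), the output metric is assumed to be Euclidean, and the Jacobian is estimated by finite differences. The eigenvector of the pullback metric with the smallest eigenvalue gives a direction in input space along which the output barely changes, so repeated small steps along it trace an approximate equivalence class.

```python
import numpy as np


def model(x):
    """Hypothetical stand-in for a network: maps R^2 to R^1."""
    return np.array([x[0] ** 2 + x[1] ** 2])


def jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian with shape (out_dim, in_dim)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)


def pullback_metric(f, x):
    """Pull the (assumed Euclidean) output metric back to input space."""
    J = jacobian(f, x)
    return J.T @ J


def walk_equivalence_class(f, x0, steps=100, step_size=0.01):
    """Take small steps along the direction of least output change."""
    x = np.asarray(x0, dtype=float)
    prev = None
    for _ in range(steps):
        _, eigvecs = np.linalg.eigh(pullback_metric(f, x))
        direction = eigvecs[:, 0]  # eigh returns eigenvalues in ascending order
        # Eigenvectors have arbitrary sign; keep walking the same way.
        if prev is not None and direction @ prev < 0:
            direction = -direction
        x = x + step_size * direction
        prev = direction
    return x
```

Starting from `[1.0, 0.0]`, the walk moves along the unit circle, which is the level set of the toy model, so the input changes substantially while the output stays almost constant. The small drift that remains comes from taking finite steps along the tangent.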
Latent SHAP: Toward Practical Human-Interpretable Explanations
Bitton, Ron, Malach, Alon, Meiseles, Amiel, Momiyama, Satoru, Araki, Toshinori, Furukawa, Jun, Elovici, Yuval, Shabtai, Asaf
Model agnostic feature attribution algorithms (such as SHAP and LIME) are ubiquitous techniques for explaining the decisions of complex classification models, such as deep neural networks. However, since complex classification models produce superior performance when trained on low-level (or encoded) features, in many cases, the explanations generated by these algorithms are neither interpretable nor usable by humans. Methods proposed in recent studies that support the generation of human-interpretable explanations are impractical, because they require a fully invertible transformation function that maps the model's input features to the human-interpretable features. In this work, we introduce Latent SHAP, a black-box feature attribution framework that provides human-interpretable explanations, without the requirement for a fully invertible transformation function. We demonstrate Latent SHAP's effectiveness using (1) a controlled experiment where invertible transformation functions are available, which enables robust quantitative evaluation of our method, and (2) celebrity attractiveness classification (using the CelebA dataset) where invertible transformation functions are not available, which enables thorough qualitative evaluation of our method.
Adaptive Neural Network Ensemble Using Frequency Distribution
Neural network (NN) ensembles can reduce the large prediction variance of NNs and improve prediction accuracy. For highly nonlinear problems with insufficient data, the prediction accuracy of NN models becomes unstable, resulting in a decrease in the accuracy of ensembles. Therefore, this study proposes a frequency distribution-based ensemble that identifies core prediction values, which are expected to be concentrated near the true prediction value. The frequency distribution-based ensemble identifies core prediction values supported by multiple prediction values by conducting statistical analysis with a frequency distribution, which is built from the various prediction values obtained at a given prediction point. The frequency distribution-based ensemble can improve predictive performance by excluding prediction values with low accuracy and coping with the uncertainty of the most frequent value. An adaptive sampling strategy is proposed to improve the predictive performance of the frequency distribution-based ensemble efficiently; it sequentially adds samples based on the core prediction variance, calculated as the variance of the core prediction values. Results of various case studies show that the prediction accuracy of the frequency distribution-based ensemble is higher than that of Kriging and other existing ensemble methods. In addition, the proposed adaptive sampling strategy effectively improves the predictive performance of the frequency distribution-based ensemble compared with the previously developed space-filling and prediction variance-based strategies.
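A simplified sketch of the idea follows. The abstract does not give the exact procedure, so this example makes its own assumptions: the frequency distribution is an equal-width histogram, the core prediction values are those in the most populated bin, and the ensemble output is their mean. The function names and the `n_bins` parameter are hypothetical.

```python
from statistics import mean, pvariance


def core_predictions(predictions, n_bins=5):
    """Return the predictions that fall in the most populated histogram bin."""
    lo, hi = min(predictions), max(predictions)
    if hi == lo:
        return list(predictions)  # all models agree exactly
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for p in predictions:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((p - lo) / width), n_bins - 1)
        bins[idx].append(p)
    return max(bins, key=len)


def ensemble_predict(predictions, n_bins=5):
    """Mean of the core predictions and their variance at one prediction point."""
    core = core_predictions(predictions, n_bins)
    return mean(core), pvariance(core)
```

Under these assumptions, the returned variance plays the role of the core prediction variance: an adaptive sampling loop could evaluate it at candidate points and add the next training sample where it is largest.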
Applying convolutional neural networks to extremely sparse image datasets using an image subdivision approach
Purpose: The aim of this work is to demonstrate that convolutional neural networks (CNN) can be applied to extremely sparse image libraries by subdivision of the original image datasets. Methods: Image datasets from a conventional digital camera were created, and scanning electron microscopy (SEM) measurements were obtained from the literature. The image datasets were subdivided and CNN models were trained on parts of the subdivided datasets. Results: The CNN models were capable of analyzing extremely sparse image datasets by utilizing the proposed method of image subdivision. It was furthermore possible to provide a direct assessment of the various regions where a given API or appearance was predominant.
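The subdivision step can be sketched as follows. This is a minimal illustration that assumes non-overlapping rectangular tiles and an image stored as a nested list of pixel values; the abstract does not state the tile size or whether tiles overlap, and the function name is hypothetical.

```python
def subdivide(image, tile_h, tile_w):
    """Split an image (list of pixel rows) into non-overlapping tiles.

    Returns (origin, tile) pairs, where origin is the (row, column) of the
    tile's top-left pixel. Partial tiles at the edges are dropped.
    """
    rows, cols = len(image), len(image[0])
    tiles = []
    for top in range(0, rows - tile_h + 1, tile_h):
        for left in range(0, cols - tile_w + 1, tile_w):
            tile = [row[left:left + tile_w] for row in image[top:top + tile_h]]
            tiles.append(((top, left), tile))
    return tiles
```

Each tile inherits the label of its source image and becomes a separate training sample, so a handful of large images yields many samples. Keeping the tile origin makes it possible to map per-tile predictions back to regions of the original image.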
Boosting Machine Learning Models with Explainable AI (XAI)
With a typical machine learning model, traditional correlation-based feature importance analysis often has limited value. In a data scientist's toolkit, are there reliable, systematic, model-agnostic methods that accurately measure each feature's impact on a prediction? As AI gains traction with more applications, Explainable AI (XAI) is an increasingly critical component to explain with clarity and deploy with confidence. XAI technologies are becoming more mature for both machine learning and deep learning. SHAP (SHapley Additive exPlanations) was developed by Scott Lundberg at the University of Washington.
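To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a small model by enumerating every feature subset. It is not the SHAP library's API: the function name is hypothetical, and "removing" a feature is assumed to mean replacing it with a fixed baseline value. The cost grows exponentially with the number of features, which is why practical tools rely on approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) relative to predict(baseline)."""
    n = len(x)

    def value(subset):
        # Features outside the subset are set to their baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For the toy model `lambda z: 2 * z[0] + 3 * z[1] + z[0] * z[2]` with input `[1, 1, 1]` and baseline `[0, 0, 0]`, the attributions are 2.5, 3.0, and 0.5: the interaction term is split evenly between the first and third features, and the values sum to the difference between the prediction and the baseline prediction.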
A Loss-Function for Causal Machine-Learning
Causal machine-learning is about predicting the net-effect (true-lift) of treatments. Given the data of a treatment group and a control group, it is similar to a standard supervised-learning problem. Unfortunately, there is no similarly well-defined loss function due to the lack of point-wise true values in the data. Many advances in modern machine-learning are not directly applicable due to the absence of such a loss function. We propose a novel method to define a loss function in this context, which is equal to mean-square-error (MSE) in a standard regression problem. Our loss function is universally applicable, thus providing a general standard to evaluate the quality of any model/strategy that predicts the true-lift. We demonstrate that despite its novel definition, one can still perform gradient descent directly on this loss function to find the best fit. This leads to a new way to train any parameter-based model, such as deep neural networks, to solve causal machine-learning problems without going through the meta-learner strategy.