Goto

Collaborating Authors

 prediction probability







Gaussian Process Regression for Active Sensing Probabilistic Structural Health Monitoring: Experimental Assessment Across Multiple Damage and Loading Scenarios

arXiv.org Machine Learning

In the near future, Structural Health Monitoring (SHM) technologies will be capable of overcoming the drawbacks in the current maintenance and life-cycle management paradigms, namely: cost, increased downtime, less-than-optimal safety management paradigm and the limited applicability of fully-autonomous operations. In the context of SHM, one of the most challenging tasks is damage quantification. Current methods face accuracy and/or robustness issues when it comes to varying operating and environmental conditions. In addition, the damage/no-damage paradigm of current frameworks does not offer much information to maintainers on the ground for proper decision-making. In this study, a novel structural damage quantification framework is proposed based on widely-used Damage Indices (DIs) and Gaussian Process Regression Models (GPRMs). The novelty lies in calculating the probability of an incoming test DI point originating from a specific state, which allows for probability-educated decision-making. This framework is applied to three test cases: a Carbon Fiber-Reinforced Plastic (CFRP) coupon with attached weights as simulated damage, an aluminum coupon with a notch, and an aluminum coupon with attached weights as simulated damage under varying loading states. The state prediction method presented herein is applied to single-state quantification in the first two test cases, as well as the third one assuming the loading state is known. Finally, the proposed method is applied to the third test case assuming neither the damage size nor the load is known in order to predict both simultaneously from incoming DI test points. In applying this framework, two forms of GPRMs (standard and variational heteroscedastic) are used in order to critically assess their performances with respect to the three test cases.


On the calibration of Just-in-time Defect Prediction

arXiv.org Artificial Intelligence

--Just-in-time defect prediction (JIT DP) leverages machine learning to identify defect-prone code commits, enabling quality assurance (QA) teams to allocate resources more efficiently by focusing on commits that are most likely to contain defects. Although JIT defect prediction techniques have introduced notable improvements in terms of predictive accuracy, they are still susceptible to misclassification errors such as false positives and false negatives. This can lead to wasted resources or undetected defects, a particularly critical concern when QA resources are limited. T o mitigate these challenges and preserve the practical utility of JIT defect prediction tools, it becomes essential to estimate the reliability of the predictions, i.e., computing confidence scores. Such scores can help practitioners determine the trustworthiness of predictions and and thus prioritize them efficiently. A simple approach to computing confidence scores is to extract, alongside each prediction, the corresponding prediction probabilities and use them as indicators of confidence. However, for these probabilities to reliably serve as confidence scores, the predictive model must be well-calibrated. This means that the prediction probabilities must accurately represent the true likelihood of each prediction being correct. Miscalibration, common in modern machine learning models, distorts probability scores such that the model's prediction probabilities do not align with the actual probability of those predictions being correct. Despite its importance, model calibration has been largely overlooked in JIT defect prediction. In this study, we evaluate the calibration of several state-of-the-art JIT defect prediction techniques to determine whether and to what extent they exhibit poor calibration. Furthermore, we assess whether post-calibration methods can improve the calibration of existing JIT defect prediction models. Our experimental analysis reveals that all evaluated JIT DP models exhibit some level of miscalibration, with Expected Calibration Error (ECE) ranging from 2% to 35%. Furthermore, post-calibration methods do not consistently improve the calibration of these JIT DP models. In recent years, just-in-time defect prediction (JIT DP) has emerged as a valuable machine learning (ML)-based technique, designed to predict whether a code commit is defect-prone or clean. By identifying code commits that are more likely to contain defects, JIT defect prediction helps quality assurance (QA) practitioners decide whether to perform targeted inspections and code reviews, as well as where and how to allocate testing efforts and resources [3], [4]. By supporting the prioritization of the code commits for further investigation and testing, JIT defect prediction models enable the timely identification of defects in the codebase. JIT defect prediction thus provides a means to optimize QA workflows.


Long-Tail Crisis in Nearest Neighbor Language Models

arXiv.org Artificial Intelligence

The $k$-nearest-neighbor language model ($k$NN-LM), one of the retrieval-augmented language models, improves the perplexity for given text by directly accessing a large datastore built from any text data during inference. A widely held hypothesis for the success of $k$NN-LM is that its explicit memory, i.e., the datastore, enhances predictions for long-tail phenomena. However, prior works have primarily shown its ability to retrieve long-tail contexts, leaving the model's performance remain underexplored in estimating the probabilities of long-tail target tokens during inference. In this paper, we investigate the behavior of $k$NN-LM on low-frequency tokens, examining prediction probability, retrieval accuracy, token distribution in the datastore, and approximation error of the product quantization. Our experimental results reveal that $k$NN-LM does not improve prediction performance for low-frequency tokens but mainly benefits high-frequency tokens regardless of long-tail contexts in the datastore.


GIN-Graph: A Generative Interpretation Network for Model-Level Explanation of Graph Neural Networks

arXiv.org Artificial Intelligence

One significant challenge of exploiting Graph neural networks (GNNs) in real-life scenarios is that they are always treated as black boxes, therefore leading to the requirement of interpretability. Model-level interpretations explain what patterns maximize probability of predicting to a certain class. However, existing model-level interpretation methods pose several limitations such as generating invalid explanation graphs and requiring extreme fine-tuning on hyperparameters manually. In this paper, we propose a new Generative Interpretation Network for Model-Level Explanation of Graph Neural Networks (GIN-Graph), to generate reliable model-level explanation graphs. The implicit and likelihood-free generative adversarial networks are exploited to construct explanation graphs similar to original graphs, meanwhile maximizing the prediction probability for a certain class by adopting a novel objective function. Experimental results indicate that GIN-Graph can be easily applied to GNN models trained on a variety of graph datasets to create meaningful explanation graphs without requiring extensive fine-tuning on hyperparameters.


Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution

arXiv.org Artificial Intelligence

Interpreting and controlling the internal mechanisms of large language models (LLMs) is crucial for improving their trustworthiness and utility. Recent efforts have primarily focused on identifying and manipulating neurons by establishing discrete mappings between neurons and semantic concepts. However, such mappings struggle to handle the inherent polysemanticity in LLMs, where individual neurons encode multiple, distinct concepts. This makes precise control challenging and complicates downstream interventions. Through an in-depth analysis of both encoder and decoder-based LLMs across multiple text classification datasets, we uncover that while individual neurons encode multiple concepts, their activation magnitudes vary across concepts in distinct, Gaussian-like patterns. Building on this insight, we introduce NeuronLens, a novel range-based interpretation and manipulation framework that provides a finer view of neuron activation distributions to localize concept attribution within a neuron. Extensive empirical evaluations demonstrate that NeuronLens significantly reduces unintended interference, while maintaining precise control for manipulation of targeted concepts, outperforming existing methods.