stability measure
- North America > United States (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Germany (0.04)
- Asia > China (0.04)
Quantifying consistency and accuracy of Latent Dirichlet Allocation
Magsarjav, Saranzaya, Humphries, Melissa, Tuke, Jonathan, Mitchell, Lewis
Topic modelling in Natural Language Processing uncovers hidden topics in large, unlabelled text datasets. It is widely applied in fields such as information retrieval, content summarisation, and trend analysis across various disciplines. However, probabilistic topic models can produce different results when rerun due to their stochastic nature, leading to inconsistencies in latent topics. Factors like corpus shuffling, rare text removal, and document elimination contribute to these variations. This instability affects replicability, reliability, and interpretation, raising concerns about whether topic models capture meaningful topics or just noise. To address these problems, we defined a new stability measure that incorporates accuracy and consistency and uses the generative properties of LDA to generate a new corpus with ground truth. These generated corpora are run through LDA 50 times to determine the variability in the output. We show that LDA can correctly determine the underlying number of topics in the documents. We also find that LDA is more internally consistent, as the multiple reruns return similar topics; however, these topics are not the true topics.
- Oceania > Australia > South Australia > Adelaide (0.04)
- Europe > Middle East > Malta > Port Region > Southern Harbour District > Floriana (0.04)
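The evaluation idea in the LDA abstract above (generate a corpus with known topics, then compare recovered topics against the truth and against each other) can be sketched as follows. This is an illustration only: the function names, the Dirichlet hyperparameters, and the cosine-based matching score are assumptions, not the authors' stability measure, and the LDA fitting step is left out.

```python
import numpy as np

def generate_lda_corpus(n_docs, doc_len, n_topics, vocab_size,
                        alpha=0.5, eta=0.5, seed=0):
    """Draw a bag-of-words corpus from LDA's generative process.

    Returns the document-term count matrix and the ground-truth
    topic-word distributions (hypothetical helper, for illustration).
    """
    rng = np.random.default_rng(seed)
    # Ground-truth topics: one distribution over the vocabulary per topic.
    topics = rng.dirichlet(np.full(vocab_size, eta), size=n_topics)
    counts = np.zeros((n_docs, vocab_size), dtype=int)
    for d in range(n_docs):
        theta = rng.dirichlet(np.full(n_topics, alpha))  # topic mixture of doc d
        # Marginalising the per-word topic assignment gives a single
        # multinomial over words with probabilities theta @ topics.
        p = theta @ topics
        counts[d] = rng.multinomial(doc_len, p / p.sum())
    return counts, topics

def matched_similarity(topics_a, topics_b):
    """Mean cosine similarity after greedy one-to-one topic matching.

    Topic order is arbitrary between runs, so topics are paired before
    comparison. Comparing a run with the ground truth gives an accuracy-like
    score; comparing two runs gives a consistency-like score.
    """
    a = topics_a / np.linalg.norm(topics_a, axis=1, keepdims=True)
    b = topics_b / np.linalg.norm(topics_b, axis=1, keepdims=True)
    sim = a @ b.T
    n_pairs = min(sim.shape)
    total = 0.0
    for _ in range(n_pairs):
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        total += sim[i, j]
        sim[i, :] = -np.inf  # each topic may be matched only once
        sim[:, j] = -np.inf
    return total / n_pairs

counts, true_topics = generate_lda_corpus(
    n_docs=20, doc_len=100, n_topics=3, vocab_size=30)
```

In a full experiment, each of the repeated LDA fits on `counts` would yield its own topic-word matrix, which could then be scored against `true_topics` and against the other runs.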
Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models
Dai, Runpeng, Yang, Run, Zhou, Fan, Zhu, Hongtu
Large Language Models (LLMs) and Vision-Language Models (VLMs) have become essential to general artificial intelligence, exhibiting remarkable capabilities in task understanding and problem-solving. However, the real-world reliability of these models critically depends on their stability, which remains an underexplored area. Despite their widespread use, rigorous studies examining the stability of LLMs under various perturbations are still lacking. In this paper, we address this gap by proposing a novel stability measure for LLMs, inspired by statistical methods rooted in information geometry. Our measure possesses desirable invariance properties, making it well-suited for analyzing model sensitivity to both parameter and input perturbations. To assess the effectiveness of our approach, we conduct extensive experiments on models ranging in size from 1.5B to 13B parameters. Our results demonstrate the utility of our measure in identifying salient parameters and detecting vulnerable regions in input images or critical dimensions in token embeddings. Furthermore, leveraging our stability framework, we enhance model robustness during model merging, leading to improved performance.
- North America > United States > District of Columbia > Washington (0.05)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- North America > United States > New York > New York County > New York City (0.04)
On the Selection Stability of Stability Selection and Its Applications
Nouraie, Mahdi, Muller, Samuel
Stability selection is a widely adopted resampling-based framework for high-dimensional structure estimation and variable selection. However, the concept of 'stability' is often narrowly addressed, primarily through examining selection frequencies, or 'stability paths'. This paper seeks to broaden the use of an established stability estimator to evaluate the overall stability of the stability selection framework, moving beyond single-variable analysis. We suggest that the stability estimator offers two advantages: it can serve as a reference to reflect the robustness of the outcomes obtained and help identify an optimal regularization value to improve stability. By determining this value, we aim to calibrate key stability selection parameters, namely, the decision threshold and the expected number of falsely selected variables, within established theoretical bounds. Furthermore, we explore a novel selection criterion based on this regularization value. With the asymptotic distribution of the stability estimator previously established, convergence to true stability is ensured, allowing us to observe stability trends over successive sub-samples. This approach sheds light on the required number of sub-samples addressing a notable gap in prior studies. The 'stabplot' package is developed to facilitate the use of the plots featured in this manuscript, supporting their integration into further statistical analysis and research workflows.
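The abstract above does not name the stability estimator it uses. As a minimal sketch, assume a frequency-based estimator of the kind proposed by Nogueira, Sechidis and Brown (2018), computed from a binary matrix `Z` with one row per sub-sample and one column per variable. The function name and the example matrices are hypothetical.

```python
import numpy as np

def selection_stability(Z):
    """Overall stability of M selection results over p variables.

    Z[i, f] = 1 if variable f was selected on sub-sample i, else 0.
    Assumes M >= 2 and that the average number of selected variables
    is neither 0 nor p (otherwise the denominator is zero).
    """
    Z = np.asarray(Z, dtype=float)
    M, p = Z.shape
    p_hat = Z.mean(axis=0)                   # selection frequency per variable
    s2 = M / (M - 1) * p_hat * (1 - p_hat)   # unbiased variance of each column
    k_bar = Z.sum(axis=1).mean()             # average number selected
    denom = (k_bar / p) * (1 - k_bar / p)
    return 1.0 - s2.mean() / denom

# Identical selections on every sub-sample: maximal stability.
stable = selection_stability([[1, 1, 0, 0]] * 3)
# Two sub-samples that never agree: minimal stability for M = 2.
unstable = selection_stability([[1, 0], [0, 1]])
```

Tracking this value as sub-samples accumulate is one way to see the stability trend the abstract describes. Computing it separately for each regularization value would allow those values to be compared by stability.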
Multi-Wheeled Passive Sliding with Fully-Actuated Aerial Robots: Tip-Over Recovery and Avoidance
Hui, Tong, Cuniato, Eugenio, Pantic, Michael, Ghielmini, Jefferson, Lanegger, Christian, Papageorgiou, Dimitrios, Tognon, Marco, Siegwart, Roland, Fumagalli, Matteo
Push-and-slide tasks carried out by fully-actuated aerial robots can be used for inspection and simple maintenance tasks at height, such as non-destructive testing and painting. Often, an end-effector based on multiple non-actuated contact wheels is used to contact the surface. This approach entails challenges in ensuring consistent wheel contact with a surface whose exact orientation and location might be uncertain due to sensor aliasing and drift. Using a standard full-pose controller dependent on the inaccurate surface position and orientation may cause wheels to lose contact during sliding, and subsequently lead to robot tip-over. To address the tip-over issue, we present two approaches: (1) tip-over avoidance guidelines for hardware design, and (2) control for tip-over recovery and avoidance. Physical experiments with a fully-actuated aerial vehicle were executed for a push-and-slide task on a flat surface. The resulting data is used in deriving tip-over avoidance guidelines and designing a simulator that closely captures real-world conditions. We then use the simulator to test the effectiveness and robustness of the proposed approaches in risky scenarios against uncertainties.
- Energy (0.67)
- Aerospace & Defense (0.46)
Stability-Based Model Selection
Model selection is linked to model assessment, which is the problem of comparing different models, or model parameters, for a specific learning task. For supervised learning, the standard practical technique is cross-validation, which is not applicable for semi-supervised and unsupervised settings. In this paper, a new model assessment scheme is introduced which is based on a notion of stability. The stability measure yields an upper bound to cross-validation in the supervised case, but extends to semi-supervised and unsupervised problems. In the experimental part, the performance of the stability measure is studied for model order selection in comparison to standard techniques in this area.
Minimax Optimal Estimation of Stability Under Distribution Shift
Namkoong, Hongseok, Ma, Yuanzhe, Glynn, Peter W.
The performance of decision policies and prediction models often deteriorates when applied to environments different from the ones seen during training. To ensure reliable operation, we propose and analyze the stability of a system under distribution shift, which is defined as the smallest change in the underlying environment that causes the system's performance to deteriorate beyond a permissible threshold. In contrast to standard tail risk measures and distributionally robust losses that require the specification of a plausible magnitude of distribution shift, the stability measure is defined in terms of a more intuitive quantity: the level of acceptable performance degradation. We develop a minimax optimal estimator of stability and analyze its convergence rate, which exhibits a fundamental phase shift behavior. Our characterization of the minimax convergence rate shows that evaluating stability against large performance degradation incurs a statistical cost. Empirically, we demonstrate the practical utility of our stability framework by using it to compare system designs on problems where robustness to distribution shift is critical.
- North America > United States > New York (0.04)
- South America (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Health Care Providers & Services (1.00)
- Banking & Finance (0.92)
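The distribution-shift abstract above defines stability as the smallest change in the environment that pushes performance past a threshold. A minimal sketch follows, assuming the change is measured by KL divergence (the abstract does not say) and using a plain plug-in estimate on observed losses. This is not the minimax optimal estimator the paper develops, and the function name and the cap `lam_max` are hypothetical. Under this assumption the quantity is inf { KL(Q || P) : E_Q[R] >= y }, which for y above the mean loss equals the dual sup over lam >= 0 of lam * y - log E_P[exp(lam * R)].

```python
import numpy as np

def kl_stability(losses, threshold, lam_max=50.0, iters=200):
    """Plug-in estimate of the smallest KL shift that raises the mean
    loss to `threshold`, via the one-dimensional concave dual."""
    losses = np.asarray(losses, dtype=float)
    if threshold <= losses.mean():
        return 0.0        # performance is already at or beyond the threshold
    if threshold > losses.max():
        return np.inf     # no reweighting of the sample can reach it

    def dual(lam):
        # lam * y - log mean(exp(lam * R)), with a stable log-sum-exp.
        m = lam * losses
        log_mgf = m.max() + np.log(np.mean(np.exp(m - m.max())))
        return lam * threshold - log_mgf

    # Ternary search for the maximiser of the concave dual on [0, lam_max].
    # lam_max caps the search; thresholds very close to the largest
    # observed loss may need a larger cap.
    lo, hi = 0.0, lam_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dual(m1) < dual(m2):
            lo = m1
        else:
            hi = m2
    return dual((lo + hi) / 2)

# Losses that are 0 or 1 with equal frequency; how far must the
# environment shift before the mean loss reaches 0.75?
example = kl_stability([0.0, 1.0] * 50, threshold=0.75)
```

For this two-valued example the answer is the KL divergence between a Bernoulli(0.75) and a Bernoulli(0.5) distribution, which gives a quick sanity check on the search.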
Employing an Adjusted Stability Measure for Multi-Criteria Model Fitting on Data Sets with Similar Features
Bommert, Andrea, Rahnenführer, Jörg, Lang, Michel
Fitting models with high predictive accuracy that include all relevant but no irrelevant or redundant features is a challenging task on data sets with similar (e.g. highly correlated) features. We propose the approach of tuning the hyperparameters of a predictive model in a multi-criteria fashion with respect to predictive accuracy and feature selection stability. We evaluate this approach based on both simulated and real data sets and we compare it to the standard approach of single-criteria tuning of the hyperparameters as well as to the state-of-the-art technique "stability selection". We conclude that our approach achieves the same or better predictive performance compared to the two established approaches. Considering the stability during tuning does not decrease the predictive accuracy of the resulting models. Our approach succeeds at selecting the relevant features while avoiding irrelevant or redundant features. The single-criteria approach fails at avoiding irrelevant or redundant features and the stability selection approach fails at selecting enough relevant features for achieving acceptable predictive accuracy. For our approach, for data sets with many similar features, the feature selection stability must be evaluated with an adjusted stability measure, that is, a measure that considers similarities between features. For data sets with only few similar features, an unadjusted stability measure suffices and is faster to compute.
- Europe (0.28)
- Oceania > New Zealand (0.28)
Adjusted Measures for Feature Selection Stability for Data Sets with Similar Features
Bommert, Andrea, Rahnenführer, Jörg
For data sets with similar features, for example highly correlated features, most existing stability measures behave in an undesired way: They consider features that are almost identical but have different identifiers as different features. Existing adjusted stability measures, that is, stability measures that take into account the similarities between features, have major theoretical drawbacks. We introduce new adjusted stability measures that overcome these drawbacks. We compare them to each other and to existing stability measures based on both artificial and real sets of selected features. Based on the results, we suggest using one new stability measure that considers highly similar features as exchangeable.
- Europe > Austria > Vienna (0.14)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
- Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)
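The difference between unadjusted and adjusted feature selection stability described in the two abstracts above can be illustrated with a deliberately simplified sketch. It is not one of the measures proposed by the authors: it uses mean pairwise Jaccard similarity as the unadjusted measure, and for the adjusted variant it treats features whose similarity reaches a threshold as exchangeable by collapsing them into groups. All names and the 0.9 threshold are assumptions.

```python
import numpy as np
from itertools import combinations

def jaccard_stability(feature_sets):
    """Mean pairwise Jaccard similarity of non-empty selected-feature sets."""
    pairs = list(combinations(feature_sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

def group_similar(sim, threshold=0.9):
    """Assign each feature a group label; features linked by a chain of
    similarities >= threshold share a label (union-find)."""
    p = sim.shape[0]
    parent = list(range(p))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(p), 2):
        if sim[i, j] >= threshold:
            parent[find(j)] = find(i)
    return [find(i) for i in range(p)]

def adjusted_jaccard_stability(feature_sets, sim, threshold=0.9):
    """Jaccard stability after replacing each feature by its group label."""
    group = group_similar(sim, threshold)
    return jaccard_stability([{group[f] for f in s} for s in feature_sets])

# Features 0 and 1 are almost identical; feature 2 is unrelated.
sim = np.array([[1.00, 0.95, 0.10],
                [0.95, 1.00, 0.10],
                [0.10, 0.10, 1.00]])
runs = [{0, 2}, {1, 2}]
unadjusted = jaccard_stability(runs)
adjusted = adjusted_jaccard_stability(runs, sim)
```

Here the two runs differ only in which of two near-duplicate features they picked. The unadjusted score penalises that difference, while the adjusted score treats the runs as identical.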