An Interpretable and Stable Framework for Sparse Principal Component Analysis

Hu, Ying, Yang, Hu

arXiv.org Machine Learning

Sparse principal component analysis (SPCA) addresses the poor interpretability and variable redundancy often encountered by principal component analysis (PCA) in high-dimensional data. However, SPCA typically imposes uniform penalties on variables and does not account for differences in variable importance, which may lead to unstable performance in highly noisy or structurally complex settings. We propose SP-SPCA, a method that introduces a single equilibrium parameter into the regularization framework to adaptively adjust variable penalties. This modification of the L2 penalty provides flexible control over the trade-off between sparsity and explained variance while maintaining computational efficiency. Simulation studies show that the proposed method consistently outperforms standard sparse principal component methods in identifying sparse loading patterns, filtering noise variables, and preserving cumulative variance, especially in high-dimensional and noisy settings. Empirical applications to crime and financial market data further demonstrate its practical utility. In real data analyses, the method selects fewer but more relevant variables, thereby reducing model complexity while maintaining explanatory power. Overall, the proposed approach offers a robust and efficient alternative for sparse modeling in complex high-dimensional data, with clear advantages in stability, feature selection, and interpretability.
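The abstract describes the method only at a high level. As a rough illustration of the general idea of per-variable, adaptively weighted penalties inside a sparse-PCA iteration, the Python sketch below uses a thresholded power iteration whose per-variable shrinkage is modulated by a single parameter gamma; the weighting scheme, the thresholding rule, and all names are assumptions for illustration, not the SP-SPCA algorithm (which modifies the L2 term of the regularization framework).

import numpy as np

def adaptive_sparse_pc(X, lam=0.1, gamma=1.0, n_iter=200, tol=1e-8):
    # Illustrative sketch: one sparse loading vector via thresholded
    # power iteration with per-variable penalty weights; not SP-SPCA.
    X = X - X.mean(axis=0)
    S = X.T @ X / (X.shape[0] - 1)              # sample covariance
    _, vecs = np.linalg.eigh(S)
    v0 = vecs[:, -1]                            # dense PCA loading as a pilot
    # Variables with small pilot loadings are penalized more; a single
    # parameter gamma controls how aggressive the adaptation is.
    w = 1.0 / (np.abs(v0) + 1e-6) ** gamma
    v = v0.copy()
    for _ in range(n_iter):
        z = S @ v
        # per-variable soft-thresholding (adaptive shrinkage)
        z = np.sign(z) * np.maximum(np.abs(z) - lam * w, 0.0)
        if np.allclose(z, 0):
            break
        v_new = z / np.linalg.norm(z)
        if np.linalg.norm(v_new - v) < tol:
            v = v_new
            break
        v = v_new
    return v

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
X[:, :3] += 3 * rng.normal(size=(200, 1))       # three correlated signal variables
print(np.round(adaptive_sparse_pc(X, lam=0.5, gamma=1.0), 2))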


Probabilistic Joint and Individual Variation Explained (ProJIVE) for Data Integration

Murden, Raphiel J., Tian, Ganzhong, Qiu, Deqiang, Risk, Benjamin B.

arXiv.org Machine Learning

Collecting multiple types of data on the same set of subjects is common in modern scientific applications including genomics, metabolomics, and neuroimaging. Joint and Individual Variation Explained (JIVE) seeks a low-rank approximation of the joint variation between two or more sets of features captured on common subjects and isolates this variation from that unique to each set of features. We develop an expectation-maximization (EM) algorithm to estimate a probabilistic model for the JIVE framework. The model extends probabilistic principal components analysis to multiple data sets. Our maximum likelihood approach simultaneously estimates joint and individual components, which can lead to greater accuracy compared to other methods. We apply ProJIVE to measures of brain morphometry and cognition in Alzheimer's disease. ProJIVE learns biologically meaningful courses of variation, and the joint morphometry and cognition subject scores are strongly related to more expensive existing biomarkers. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Code to reproduce the analysis is available on our GitHub page.
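Since the abstract stops at the level of the model description, the Python sketch below shows only the generative structure a probabilistic JIVE-style method would fit: joint latent scores shared across two data blocks, block-specific individual scores, and Gaussian noise, as in probabilistic PCA. The ranks, noise levels, and variable names are illustrative assumptions, and the EM estimation itself is omitted.

import numpy as np

rng = np.random.default_rng(1)
n, p1, p2 = 300, 40, 25        # subjects, features in block 1 and block 2
r_joint, r1, r2 = 2, 3, 2      # joint and individual ranks (assumed)

# Latent scores: joint scores are shared across blocks, individual
# scores are block-specific; all are standard normal, as in PPCA.
z_joint = rng.normal(size=(n, r_joint))
z1 = rng.normal(size=(n, r1))
z2 = rng.normal(size=(n, r2))

# Loadings map latent scores to observed features for each block.
W1_joint = rng.normal(size=(r_joint, p1))
W2_joint = rng.normal(size=(r_joint, p2))
W1_ind = rng.normal(size=(r1, p1))
W2_ind = rng.normal(size=(r2, p2))

sigma1, sigma2 = 0.5, 0.7      # block-specific noise levels
X1 = z_joint @ W1_joint + z1 @ W1_ind + sigma1 * rng.normal(size=(n, p1))
X2 = z_joint @ W2_joint + z2 @ W2_ind + sigma2 * rng.normal(size=(n, p2))

# An EM fit (as in ProJIVE) would alternate between E-step posterior
# moments of (z_joint, z1, z2) given X1, X2 and M-step updates of the
# loadings and noise variances; that machinery is omitted here.
print(X1.shape, X2.shape)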


704cddc91e28d1a5517518b2f12bc321-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their feedback. We will first respond to shared comments and then to individual comments. Additionally, reviewers 2 and 3 requested clarification regarding the advantages of DCA over other methods; for instance, one could attempt to correlate each neuron's contribution to the DCA subspace with single-neuron [...]. Studying the behavior of Kernel DCA is a direction for future studies. We also found and corrected a minor bug in Figure 1A: the SFA and DCA lines are now blue and red, respectively.


An Approach to Variable Clustering: K-means in Transposed Data and its Relationship with Principal Component Analysis

Saquicela, Victor, Palacio-Baus, Kenneth, Chifla, Mario

arXiv.org Machine Learning

Principal Component Analysis (PCA) and K-means constitute fundamental techniques in multivariate analysis. Although they are frequently applied independently or sequentially to cluster observations, the relationship between them, especially when K-means is used to cluster variables rather than observations, has been scarcely explored. This study seeks to address this gap by proposing an innovative method that analyzes the relationship between clusters of variables obtained by applying K-means on transposed data and the principal components of PCA. Our approach involves applying PCA to the original data and K-means to the transposed data set, where the original variables are converted into observations. The contribution of each variable cluster to each principal component is then quantified using measures based on variable loadings. This process provides a tool to explore and understand the clustering of variables and how such clusters contribute to the principal dimensions of variation identified by PCA. We analyze multiple data sets with varying variability structures (USArrests, Iris, Decathlon2) to show that the correspondence between clusters of variables and principal components depends on the data's inherent structure.
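A minimal sketch of this pipeline on one of the cited data sets (Iris) might look as follows; the number of clusters, the choice of two components, and the squared-loading share used to quantify cluster contributions are illustrative assumptions rather than the paper's exact measures.

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)   # rows = observations, columns = variables

# PCA on the original data gives the loading of each variable on each component.
pca = PCA(n_components=2).fit(X)
loadings = pca.components_                      # shape (n_components, n_variables)

# K-means on the transposed data: each original variable becomes an
# observation, so clusters group variables with similar profiles.
k = 2                                           # illustrative cluster count
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X.T)

# Quantify each variable cluster's contribution to each principal component
# as its share of squared loadings (one plausible loading-based measure).
for comp in range(loadings.shape[0]):
    sq = loadings[comp] ** 2
    for c in range(k):
        members = [v for v, lab in zip(iris.feature_names, labels) if lab == c]
        share = sq[labels == c].sum() / sq.sum()
        print(f"PC{comp + 1} | cluster {c} {members}: {share:.2f}")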


Benchmarking Resilience and Sensitivity of Polyurethane-Based Vision-Based Tactile Sensors

Davis, Benjamin, Stuart, Hannah

arXiv.org Artificial Intelligence

Vision-based tactile sensors (VBTSs) are a promising technology for robots, providing them with dense signals that can be translated into an understanding of normal and shear load, contact region, texture classification, and more. However, existing VBTS tactile surfaces make use of silicone gels, which provide high sensitivity but easily deteriorate from loading and surface wear. We propose that polyurethane rubber, used for high-load applications like shoe soles, rubber wheels, and industrial gaskets, may provide improved physical gel resilience, potentially at the cost of sensitivity. To compare the resilience and sensitivity of silicone and polyurethane VBTS gels, we propose a series of standard evaluation benchmarking protocols. Our resilience tests assess sensor durability across normal loading, shear loading, and abrasion. For sensitivity, we introduce model-free assessments of force and spatial sensitivity to directly measure the physical capabilities of each gel without effects introduced from data and model quality. Finally, we include a bottle cap loosening and tightening demonstration as an example where polyurethane gels provide an advantage over their silicone counterparts.


In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading

Lin, Shuning, He, Yifan, Chen, Yitong

arXiv.org Artificial Intelligence

In today's landscape, Mixture of Experts (MoE) is a crucial architecture used by many of the most advanced models. One of the major challenges of MoE models is that they usually require much more memory than their dense counterparts due to their unique architecture, and hence are harder to deploy in environments with limited GPU memory, such as edge devices. MoE offloading is a promising technique proposed to overcome this challenge, especially when enhanced with caching and pre-fetching, but prior work stopped at suboptimal caching algorithms and offered limited insights. In this work, we study MoE offloading in depth and make the following contributions: 1. We analyze expert activation and LRU caching behavior in detail and provide traces. 2. We propose an LFU caching optimization based on our analysis and obtain strong improvements over LRU. 3. We implement and experiment with speculative expert pre-fetching, providing detailed traces showing its large potential. 4. In addition, our study extensively covers the behavior of the MoE architecture itself, offering information on the characteristics of the gating network and experts. This can inspire future work on the interpretation of MoE models and the development of pruning techniques for MoE architectures with minimal performance loss.
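As a rough illustration of the LFU idea in this setting, the sketch below caches opaque expert-weight handles and evicts the least frequently used expert on a miss; the cache size, the toy activation trace, and the absence of per-layer structure are simplifying assumptions, not the paper's implementation.

from collections import defaultdict

class LFUExpertCache:
    # Minimal LFU cache for offloaded MoE expert weights: weights are
    # opaque handles and "loading" from host memory is simulated.

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}                      # expert_id -> weights handle
        self.freq = defaultdict(int)         # expert_id -> access count

    def get(self, expert_id, load_fn):
        self.freq[expert_id] += 1
        if expert_id in self.cache:
            return self.cache[expert_id]     # cache hit: no PCIe transfer
        weights = load_fn(expert_id)         # cache miss: load from host memory
        if len(self.cache) >= self.capacity:
            # Evict the cached expert with the lowest access frequency.
            victim = min(self.cache, key=lambda e: self.freq[e])
            del self.cache[victim]
        self.cache[expert_id] = weights
        return weights

# Toy usage: a skewed activation trace favours a "hot" expert (id 0),
# which the frequency-based policy keeps resident for the whole trace.
cache = LFUExpertCache(capacity=2)
trace = [0, 1, 0, 2, 0, 3, 0, 1, 0, 2]
hits = 0
for eid in trace:
    hits += eid in cache.cache
    cache.get(eid, load_fn=lambda e: f"weights_{e}")
print(f"hit rate: {hits / len(trace):.2f}")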


SARIMAX-Based Power Outage Prediction During Extreme Weather Events

Ye, Haoran, Sun, Qiuzhuang, Yang, Yang

arXiv.org Artificial Intelligence

This study develops a SARIMAX-based prediction system for short-term power outage forecasting during extreme weather events. Using hourly data from Michigan counties with outage counts and comprehensive weather features, we implement a systematic two-stage feature engineering pipeline: data cleaning to remove zero-variance and unknown features, followed by correlation-based filtering to eliminate highly correlated predictors. The selected features are augmented with temporal embeddings, multi-scale lag features, and weather variables with their corresponding lags as exogenous inputs to the SARIMAX model. To address data irregularity and numerical instability, we apply standardization and implement a hierarchical fitting strategy with sequential optimization methods, automatic downgrading to ARIMA when convergence fails, and historical mean-based fallback predictions as a final safeguard. The model is optimized separately for short-term (24 hours) and medium-term (48 hours) forecast horizons using RMSE as the evaluation metric. Our approach achieves an RMSE of 177.2, representing an 8.4% improvement over the baseline method (RMSE = 193.4), thereby validating the effectiveness of our feature engineering and robust optimization strategy for extreme weather-related outage prediction.
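The hierarchical fitting strategy lends itself to a compact sketch. The Python snippet below (statsmodels) fits SARIMAX with lagged weather covariates as exogenous inputs on synthetic hourly data, falls back to a plain model without exogenous terms if fitting fails, and uses the historical mean as a final safeguard; the synthetic data, model orders, and feature names are placeholders, not the paper's configuration.

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic stand-in for hourly outage counts driven by one weather variable.
rng = np.random.default_rng(0)
n = 500
wind = pd.Series(rng.gamma(2.0, 5.0, size=n))
outages = 20 + 3 * wind.shift(1).fillna(0) + rng.normal(0, 5, size=n)

train, horizon = n - 24, 24
exog = pd.DataFrame({"wind": wind, "wind_lag1": wind.shift(1).fillna(0)})

def fit_with_fallback(y, X):
    # Try SARIMAX with exogenous inputs; fall back to a model without
    # exogenous terms, then (via None) to the historical mean, if fitting fails.
    try:
        return SARIMAX(y, exog=X, order=(1, 0, 1),
                       seasonal_order=(1, 0, 0, 24)).fit(disp=False)
    except Exception:
        try:
            return SARIMAX(y, order=(1, 0, 1)).fit(disp=False)
        except Exception:
            return None

model = fit_with_fallback(outages.iloc[:train], exog.iloc[:train])
if model is not None and model.model.k_exog > 0:
    forecast = model.forecast(steps=horizon, exog=exog.iloc[train:])
elif model is not None:
    forecast = model.forecast(steps=horizon)
else:
    forecast = np.full(horizon, outages.iloc[:train].mean())   # mean fallback

rmse = np.sqrt(np.mean((forecast - outages.iloc[train:].to_numpy()) ** 2))
print(f"24-hour RMSE: {rmse:.1f}")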



PreScope: Unleashing the Power of Prefetching for Resource-Constrained MoE Inference

Yu, Enda, Zhang, Zhaoning, Dong, Dezun, Wu, Yongwei, Liao, Xiangke

arXiv.org Artificial Intelligence

Mixture-of-Experts (MoE) models face memory and PCIe latency bottlenecks when deployed on commodity hardware. Offloading expert weights to CPU memory results in PCIe transfer latency that exceeds GPU computation severalfold. We present PreScope, a prediction-driven expert scheduling system that addresses three key challenges: inaccurate activation prediction, PCIe bandwidth competition, and cross-device scheduling complexity. Our solution includes: 1) Learnable Layer-Aware Predictor (LLaPor) that captures layer-specific expert activation patterns; 2) Prefetch-Aware Cross-Layer Scheduling (PreSched) that generates globally optimal plans balancing prefetching costs and loading overhead; 3) Asynchronous I/O Optimizer (AsyncIO) that decouples I/O from computation, eliminating waiting bubbles. PreScope achieves 141% higher throughput and 74.6% lower latency than state-of-the-art solutions.
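The benefit of decoupling I/O from computation can be seen in toy form below: a background thread transfers the expert predicted for the next layer while the current layer computes, so the transfer latency is hidden rather than serialized. The predictor, the timings, and the single-expert-per-layer assumption are placeholders; this is not PreScope's AsyncIO or PreSched implementation.

import threading
import time

def load_expert(expert_id):
    # Stand-in for a PCIe transfer of expert weights from CPU to GPU.
    time.sleep(0.05)
    return f"weights_{expert_id}"

def compute_layer(layer, weights):
    # Stand-in for GPU computation of one MoE layer.
    time.sleep(0.05)
    return f"out_{layer}"

def predict_next_expert(layer):
    # Assumed predictor: guesses which expert the next layer will activate.
    return (layer + 1) % 8

prefetched = {}

def prefetch(expert_id):
    prefetched[expert_id] = load_expert(expert_id)

start = time.time()
current = load_expert(predict_next_expert(-1))
for layer in range(6):
    # Kick off the next layer's transfer before computing this layer,
    # so I/O overlaps with computation instead of serializing with it.
    nxt = predict_next_expert(layer)
    t = threading.Thread(target=prefetch, args=(nxt,))
    t.start()
    compute_layer(layer, current)
    t.join()
    current = prefetched.pop(nxt)
print(f"overlapped pipeline: {time.time() - start:.2f}s")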


Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash

Jia, Fucheng, Wu, Zewen, Jiang, Shiqi, Jiang, Huiqiang, Zhang, Qianxi, Yang, Yuqing, Liu, Yunxin, Ren, Ju, Zhang, Deyu, Cao, Ting

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly being deployed on mobile devices, but the limited DRAM capacity constrains the deployable model size. This paper introduces ActiveFlow, the first LLM inference framework that can achieve adaptive DRAM usage for modern LLMs (not ReLU-based), enabling the scaling up of deployable model sizes. The framework is based on the novel concept of active weight DRAM-flash swapping and incorporates three novel techniques: (1) Cross-layer active weights preloading. It uses the activations from the current layer to predict the active weights of several subsequent layers, enabling computation and data loading to overlap, as well as facilitating large I/O transfers. (2) Sparsity-aware self-distillation. It adjusts the active weights to align with the dense-model output distribution, compensating for approximations introduced by contextual sparsity. (3) Active weight DRAM-flash swapping pipeline. It orchestrates the DRAM space allocation among the hot weight cache, preloaded active weights, and computation-involved weights based on available memory. Results show ActiveFlow achieves the performance-cost Pareto frontier compared to existing efficiency optimization methods.
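As a toy illustration of the third technique, the sketch below splits a fixed DRAM budget among computation-involved weights, preloaded active weights, and a hot-weight cache; the allocation rule, the ratio parameter, and the numbers are assumptions for illustration, not ActiveFlow's actual policy.

def plan_dram_budget(total_mb, active_mb, compute_mb, hot_cache_ratio=0.5):
    # Illustrative allocation: reserve what computation needs, then split the
    # remainder between preloaded active weights and the hot-weight cache.
    if total_mb < compute_mb:
        raise ValueError("budget cannot hold the weights needed for computation")
    remaining = total_mb - compute_mb
    preload = min(active_mb, remaining * (1 - hot_cache_ratio))
    hot_cache = remaining - preload
    return {"compute": compute_mb, "preloaded_active": preload, "hot_cache": hot_cache}

print(plan_dram_budget(total_mb=4096, active_mb=1200, compute_mb=1536))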