Goto

Collaborating Authors

 Energy


Divide, Specialize, and Route: A New Approach to Efficient Ensemble Learning

arXiv.org Artificial Intelligence

Ensemble learning has proven effective in boosting predictive performance, but traditional methods such as bagging, boosting, and dynamic ensemble selection (DES) suffer from high computational cost and limited adaptability to heterogeneous data distributions. To address these limitations, we propose Hellsemble, a novel and interpretable ensemble framework for binary classification that leverages dataset complexity during both training and inference. Hellsemble incrementally partitions the dataset into circles of difficulty by iteratively passing misclassified instances from simpler models to subsequent ones, forming a committee of specialised base learners. Each model is trained on increasingly challenging subsets, while a separate router model learns to assign new instances to the most suitable base model based on inferred difficulty. Hellsemble achieves strong classification accuracy while maintaining computational efficiency and interpretability. Experimental results on OpenML-CC18 and Tabzilla benchmarks demonstrate that Hellsemble often outperforms classical ensemble methods. Our findings suggest that embracing instance-level difficulty offers a promising direction for constructing efficient and robust ensemble systems.


Distributed Cross-Channel Hierarchical Aggregation for Foundation Models

arXiv.org Artificial Intelligence

Vision-based scientific foundation models hold significant promise for advancing scientific discovery and innovation. This potential stems from their ability to aggregate images from diverse sources such as varying physical groundings or data acquisition systems and to learn spatio-temporal correlations using transformer architectures. However, tokenizing and aggregating images can be compute-intensive, a challenge not fully addressed by current distributed methods. In this work, we introduce the Distributed Cross-Channel Hierarchical Aggregation (D-CHAG) approach designed for datasets with a large number of channels across image modalities. Our method is compatible with any model-parallel strategy and any type of vision transformer architecture, significantly improving computational efficiency. We evaluated D-CHAG on hyperspectral imaging and weather forecasting tasks. When integrated with tensor parallelism and model sharding, our approach achieved up to a 75% reduction in memory usage and more than doubled sustained throughput on up to 1,024 AMD GPUs on the Frontier Supercomputer.


Model-Based Real-Time Pose and Sag Estimation of Overhead Power Lines Using LiDAR for Drone Inspection

arXiv.org Artificial Intelligence

Drones can inspect overhead power lines while they remain energized, significantly simplifying the inspection process. However, localizing a drone relative to all conductors using an onboard LiDAR sensor presents several challenges: (1) conductors provide minimal surface for LiDAR beams limiting the number of conductor points in a scan, (2) not all conductors are consistently detected, and (3) distinguishing LiDAR points corresponding to conductors from other objects, such as trees and pylons, is difficult. This paper proposes an estimation approach that minimizes the error between LiDAR measurements and a single geometric model representing the entire conductor array, rather than tracking individual conductors separately. Experimental results, using data from a power line drone inspection, demonstrate that this method achieves accurate tracking, with a solver converging under 50 ms per frame, even in the presence of partial observations, noise, and outliers. A sensitivity analysis shows that the estimation approach can tolerate up to twice as many outlier points as valid conductors measurements.


mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale

arXiv.org Artificial Intelligence

Multivariate time series anomaly detection (MTS-AD) is critical in domains like healthcare, cybersecurity, and industrial monitoring, yet remains challenging due to complex inter-variable dependencies, temporal dynamics, and sparse anomaly labels. We introduce mTSBench, the largest benchmark to date for MTS-AD and unsupervised model selection, spanning 344 labeled time series across 19 datasets and 12 diverse application domains. mTSBench evaluates 24 anomaly detection methods, including large language model (LLM)-based detectors for multivariate time series, and systematically benchmarks unsupervised model selection techniques under standardized conditions. Consistent with prior findings, our results confirm that no single detector excels across datasets, underscoring the importance of model selection. However, even state-of-the-art selection methods remain far from optimal, revealing critical gaps. mTSBench provides a unified evaluation suite to enable rigorous, reproducible comparisons and catalyze future advances in adaptive anomaly detection and robust model selection.


Latent-space Field Tension for Astrophysical Component Detection An application to X-ray imaging

arXiv.org Machine Learning

Modern observatories are designed to deliver increasingly detailed views of astrophysical signals. To fully realize the potential of these observations, principled data-analysis methods are required to effectively separate and reconstruct the underlying astrophysical components from data corrupted by noise and instrumental effects. In this work, we introduce a novel multi-frequency Bayesian model of the sky emission field that leverages latent-space tension as an indicator of model misspecification, enabling automated separation of diffuse, point-like, and extended astrophysical emission components across wavelength bands. Deviations from latent-space prior expectations are used as diagnostics for model misspecification, thus systematically guiding the introduction of new sky components, such as point-like and extended sources. We demonstrate the effectiveness of this method on synthetic multi-frequency imaging data and apply it to observational X-ray data from the eROSITA Early Data Release (EDR) of the SN1987A region in the Large Magellanic Cloud (LMC). Our results highlight the method's capability to reconstruct astrophysical components with high accuracy, achieving sub-pixel localization of point sources, robust separation of extended emission, and detailed uncertainty quantification. The developed methodology offers a general and well-founded framework applicable to a wide variety of astronomical datasets, and is therefore well suited to support the analysis needs of next-generation multi-wavelength and multi-messenger surveys.


Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional

arXiv.org Artificial Intelligence

Transition path sampling (TPS), which involves finding probable paths connecting two points on an energy landscape, remains a challenge due to the complexity of real-world atomistic systems. Current machine learning approaches use expensive, task-specific, and data-free training procedures, limiting their ability to benefit from high-quality datasets and large-scale pre-trained models. In this work, we address TPS by interpreting candidate paths as trajectories sampled from stochastic dynamics induced by the learned score function of pre-trained generative models, specifically denoising diffusion and flow matching. Under these dynamics, finding high-likelihood transition paths becomes equivalent to minimizing the Onsager-Machlup (OM) action functional. This enables us to repurpose pre-trained generative models for TPS in a zero-shot manner, in contrast with bespoke, task-specific approaches in previous work. We demonstrate our approach on varied molecular systems, obtaining diverse, physically realistic transition pathways and generalizing beyond the pre-trained model's original training dataset. Our method can be easily incorporated into new generative models, making it practically relevant as models continue to scale and improve with increased data availability. Code is available at github.com/ASK-Berkeley/OM-TPS.


Adaptive Anomaly Detection for Identifying Attacks in Cyber-Physical Systems: A Systematic Literature Review

arXiv.org Artificial Intelligence

Modern cyberattacks in cyber-physical systems (CPS) rapidly evolve and cannot be deterred effectively with most current methods which focused on characterizing past threats. Adaptive anomaly detection (AAD) is among the most promising techniques to detect evolving cyberattacks focused on fast data processing and model adaptation. AAD has been researched in the literature extensively; however, to the best of our knowledge, our work is the first systematic literature review (SLR) on the current research within this field. We present a comprehensive SLR, gathering 397 relevant papers and systematically analyzing 65 of them (47 research and 18 survey papers) on AAD in CPS studies from 2013 to 2023 (November). We introduce a novel taxonomy considering attack types, CPS application, learning paradigm, data management, and algorithms. Our analysis indicates, among other findings, that reviewed works focused on a single aspect of adaptation (either data processing or model adaptation) but rarely in both at the same time. We aim to help researchers to advance the state of the art and help practitioners to become familiar with recent progress in this field. We identify the limitations of the state of the art and provide recommendations for future research directions.


Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts

arXiv.org Artificial Intelligence

Mixture-of-Experts (MoE) architectures have emerged as a key strategy for scaling large language models (LLMs) efficiently. However, current MoE systems suffer from severe load imbalance, where only a small subset of experts is consistently activated during training and inference, leading to significant underutilization of model capacity and computational resources. In this work, we revisit expert routing through a clustering perspective and propose Latent Prototype Routing (LPR), a novel routing framework that generalizes existing approaches while promoting balanced expert utilization without compromising downstream performance. Extensive experiments across multiple open-source MoE models -- including DeepSeek-V3, Qwen3-MoE, and Mixtral -- demonstrate that LPR reduces the Gini coefficient of expert load from 0.70 to 0.035 on average, improves the min-max expert load ratio from 1e-6 to 0.70, achieving near-perfect load balancing.


Spiking Neural Networks for SAR Interferometric Phase Unwrapping: A Theoretical Framework for Energy-Efficient Processing

arXiv.org Artificial Intelligence

We present the first theoretical framework for applying spiking neural networks (SNNs) to synthetic aperture radar (SAR) interferometric phase unwrapping. Despite extensive research in both domains, our comprehensive literature review confirms that SNNs have never been applied to phase unwrapping, representing a significant gap in current methodologies. As Earth observation data volumes continue to grow exponentially (with missions like NISAR expected to generate 100PB in two years) energy-efficient processing becomes critical for sustainable data center operations. SNNs, with their event-driven computation model, offer potential energy savings of 30-100x compared to conventional approaches while maintaining comparable accuracy. We develop spike encoding schemes specifically designed for wrapped phase data, propose SNN architectures that leverage the spatial propagation nature of phase unwrapping, and provide theoretical analysis of computational complexity and convergence properties. Our framework demonstrates how the temporal dynamics inherent in SNNs can naturally model the spatial continuity constraints fundamental to phase unwrapping. This work opens a new research direction at the intersection of neuromorphic computing and SAR interferometry, offering a complementary approach to existing algorithms that could enable more sustainable large-scale InSAR processing.


NFISiS: New Perspectives on Fuzzy Inference Systems for Renewable Energy Forecasting

arXiv.org Artificial Intelligence

Deep learning models, despite their popularity, face challenges such as long training times and a lack of interpretability. In contrast, fuzzy inference systems offer a balance of accuracy and transparency. This paper addresses the limitations of traditional Takagi-Sugeno-Kang fuzzy models by extending the recently proposed New Takagi-Sugeno-Kang model to a new Mamdani-based regressor. These models are data-driven, allowing users to define the number of rules to balance accuracy and interpretability. To handle the complexity of large datasets, this research integrates wrapper and ensemble techniques. A Genetic Algorithm is used as a wrapper for feature selection, creating genetic versions of the models. Furthermore, ensemble models, including the Random New Mamdani Regressor, Random New Takagi-Sugeno-Kang, and Random Forest New Takagi-Sugeno-Kang, are introduced to improve robustness. The proposed models are validated on photovoltaic energy forecasting datasets, a critical application due to the intermittent nature of solar power. Results demonstrate that the genetic and ensemble fuzzy models, particularly the Genetic New Takagi-Sugeno-Kang and Random Forest New Takagi-Sugeno-Kang, achieve superior performance. They often outperform both traditional machine learning and deep learning models while providing a simpler and more interpretable rule-based structure. The models are available online in a library called nfisis (https://pypi.org/project/nfisis/).