 Energy


Unifying Mixture of Experts and Multi-Head Latent Attention for Efficient Language Models

arXiv.org Artificial Intelligence

We present MoE-MLA-RoPE, a novel architecture that combines Mixture of Experts (MoE) with Multi-head Latent Attention (MLA) and Rotary Position Embeddings (RoPE) for efficient language modeling. Our approach addresses the fundamental trade-off between model capacity and computational efficiency through three key innovations: (1) fine-grained expert routing with 64 micro-experts and top-k selection, enabling flexible specialization through 3.6 × 10^7 possible expert combinations; (2) shared expert isolation, which dedicates 2 always-active experts to common patterns while routing each token to 6 of 62 specialized experts; and (3) gradient-conflict-free load balancing, which maintains expert utilization without interfering with optimization of the primary loss. Extensive experiments on models ranging from 17M to 202M parameters demonstrate that MoE-MLA-RoPE with compression ratio r = d/2 achieves a 68% KV cache memory reduction and a 3.2x inference speedup while maintaining competitive perplexity (0.8% degradation). At 53.9M parameters, MoE-MLA-RoPE improves validation loss by 6.9% over a vanilla transformer while using 42% fewer active parameters per forward pass. FLOP-matched experiments reveal even larger gains: an 11.1% improvement with 3.2x inference acceleration. Automated evaluation using GPT-4 as a judge confirms quality improvements in generation, with higher scores on coherence (8.1/10), creativity (7.9/10), and grammatical correctness (8.2/10). Our results establish that architectural novelty, not parameter scaling, defines the efficiency frontier for resource-constrained language model deployment.
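The shared-plus-routed expert layout in points (1) and (2) can be sketched per token as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: `moe_forward` and `softmax` are hypothetical helpers, experts are arbitrary vector functions, shared experts are added with unit weight, and gate weights are a softmax renormalized over the selected top-k experts (one common choice). The load-balancing mechanism from point (3) and the MLA attention block are not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    shared: Sequence[Expert],
    routed: Sequence[Expert],
    router_logits: Sequence[float],
    k: int = 6,
) -> Tuple[Vector, List[int]]:
    """Shared experts always fire; only the top-k routed experts are evaluated."""
    out = [0.0] * len(x)
    # Always-active shared experts, added with unit weight (an assumption).
    for expert in shared:
        out = [o + y for o, y in zip(out, expert(x))]
    # Indices of the k routed experts with the largest router logits.
    top = sorted(range(len(routed)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Gate weights: softmax renormalized over the selected experts only.
    gates = softmax([router_logits[i] for i in top])
    for g, i in zip(gates, top):
        out = [o + g * y for o, y in zip(out, routed[i](x))]
    return out, top


# Usage: 2 shared + 62 routed experts, matching the 2 + 6-of-62 split above.
identity: Expert = lambda v: list(v)
shared_experts = [identity, identity]
routed_experts = [identity for _ in range(62)]
logits = [float(i) for i in range(62)]
y, chosen = moe_forward([1.0, 2.0], shared_experts, routed_experts, logits, k=6)
print(chosen)  # [61, 60, 59, 58, 57, 56]
```

Only the 6 selected routed experts are ever called, which is where the reduction in active parameters per forward pass comes from in this kind of design.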


Deep Learning for Pavement Condition Evaluation Using Satellite Imagery

arXiv.org Artificial Intelligence

Civil infrastructure systems cover large land areas and need frequent inspections to maintain their public service capabilities. Conventional approaches to assessing infrastructure condition, such as manual surveys or vehicle-based automated surveys, are often labor-intensive and time-consuming. It is therefore worthwhile to explore more cost-effective methods for monitoring and maintaining this infrastructure. Fortunately, recent advances in satellite systems and image processing algorithms have opened up new possibilities. Numerous satellite systems have been employed to monitor infrastructure conditions and identify damage, and improvements in ground sample distance (GSD) have significantly increased the level of detail that can be captured. Taking advantage of these technological advances, this research investigated the use of deep learning models to evaluate pavement conditions from satellite images. We gathered over 3,000 satellite images of pavement sections, together with pavement evaluation ratings from TxDOT's PMIS database. The results of our study show an accuracy exceeding 90%. This research paves the way for a rapid and cost-effective approach to evaluating pavement networks in the future.


Adaptive Content Restriction for Large Language Models via Suffix Optimization

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated significant success across diverse applications. However, enforcing content restrictions remains a significant challenge due to their expansive output space. One aspect of content restriction is preventing LLMs from generating harmful content via model alignment approaches such as supervised fine-tuning (SFT). Yet the need for content restriction may vary significantly across user groups, change rapidly over time, and not always align with general definitions of harmfulness. Applying SFT to each of these specific use cases is impractical due to the high computational, data, and storage demands. Motivated by this need, we propose a new task called Adaptive Content Restriction (AdaCoRe), which focuses on lightweight strategies (methods that do not fine-tune the model) to prevent deployed LLMs from generating restricted terms for specific use cases. We propose the first method for AdaCoRe, named Suffix Optimization (SOP), which appends a short, optimized suffix to any prompt to (a) prevent a target LLM from generating a set of restricted terms while (b) preserving output quality. To evaluate AdaCoRe approaches, including SOP, we create a new Content Restriction Benchmark (CoReBench), which contains 400 prompts for 80 restricted terms across 8 carefully selected categories. On CoReBench, SOP outperforms system-level baselines such as the system suffix by 15%, 17%, 10%, 9%, and 6% in average restriction rate for Gemma2-2B, Mistral-7B, Vicuna-7B, Llama3-8B, and Llama3.1-8B, respectively. We also demonstrate that SOP is effective on POE, an online platform hosting various commercial LLMs, highlighting its practicality in real-world scenarios.
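The evaluation side of SOP can be sketched as below: append a suffix to every prompt, query a model, and score the fraction of responses that avoid all restricted terms. This is a hedged sketch only. The suffix optimization itself is not shown, `toy_generate` is a hypothetical stand-in for a real LLM, and defining "restriction rate" via case-insensitive substring matching is an assumption; CoReBench's actual scoring may differ (substring matching would, for example, also flag "pineapple" for the term "apple").

```python
from typing import Callable, Iterable, List


def apply_suffix(prompt: str, suffix: str) -> str:
    """Append an already-optimized suffix to a user prompt (optimization not shown)."""
    return f"{prompt} {suffix}" if suffix else prompt


def contains_restricted(response: str, restricted_terms: Iterable[str]) -> bool:
    """Case-insensitive substring check; a simplification of real term matching."""
    lowered = response.lower()
    return any(term.lower() in lowered for term in restricted_terms)


def restriction_rate(
    prompts: List[str],
    suffix: str,
    generate: Callable[[str], str],
    restricted_terms: List[str],
) -> float:
    """Fraction of responses that avoid every restricted term."""
    outputs = [generate(apply_suffix(p, suffix)) for p in prompts]
    clean = sum(not contains_restricted(o, restricted_terms) for o in outputs)
    return clean / len(outputs)


# Usage with a toy stand-in for an LLM (hypothetical behavior).
def toy_generate(prompt: str) -> str:
    return "A pear tart." if "[no-apple]" in prompt else "An apple pie."


prompts = ["Suggest a dessert.", "What should I bake today?"]
print(restriction_rate(prompts, "", toy_generate, ["apple"]))            # 0.0
print(restriction_rate(prompts, "[no-apple]", toy_generate, ["apple"]))  # 1.0
```

Because the suffix is plain text appended to the prompt, the same wrapper works for any model reachable through a text interface, which is consistent with the abstract's point that SOP needs no fine-tuning.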


Compression-Induced Communication-Efficient Large Model Training and Inferencing

arXiv.org Artificial Intelligence

Energy efficiency in training and inference with large neural network models is a critical challenge for the future of sustainable large-scale machine learning workloads. This paper introduces an alternative strategy, called phantom parallelism, to minimize the net energy consumption of traditional tensor (model) parallelism, the most energy-inefficient component of large neural network training. The approach is presented in the context of feed-forward network (FFN) architectures as a preliminary but comprehensive proof-of-principle study of the proposed methodology. We derive new forward and backward propagation operators for phantom parallelism, implement them as custom autograd operations within an end-to-end phantom parallel training pipeline, and compare its parallel performance and energy efficiency against those of conventional tensor parallel training pipelines. Formal analyses predicting lower bandwidth and FLOP counts are presented, together with empirical results on up to 256 GPUs that corroborate these gains. In experiments, the proposed phantom parallel approach reduces the energy consumed to train FFNs by ~50% compared with conventional tensor parallel methods. Additionally, the proposed approach can train smaller phantom models on fewer GPUs to the same loss that larger tensor parallel models reach on more GPUs, offering the possibility of even greater energy savings.
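The abstract does not spell out the phantom parallel operators, so the sketch below only illustrates the conventional tensor-parallel baseline it is compared against: a linear layer whose weight rows are sharded across p workers, each producing a partial output that must then be summed across workers. The helper names are hypothetical, workers are simulated in a single process, and the final summation stands in for the all-reduce collective that real tensor parallel training performs over the interconnect.

```python
from typing import List

Matrix = List[List[float]]


def matvec(x: List[float], W: Matrix) -> List[float]:
    """y = x W, with x of length d_in and W of shape d_in x d_out."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def row_parallel_linear(x: List[float], W: Matrix, p: int) -> List[float]:
    """Simulate a row-sharded linear layer on p workers within a single process."""
    d_in = len(x)
    if d_in % p != 0:
        raise ValueError("d_in must be divisible by the number of workers")
    shard = d_in // p
    partials = []
    for rank in range(p):
        lo, hi = rank * shard, (rank + 1) * shard
        # Each worker holds only rows lo:hi of W and the matching slice of x.
        partials.append(matvec(x[lo:hi], W[lo:hi]))
    # Summing the partial outputs stands in for the all-reduce collective,
    # which on real hardware requires communication between GPUs.
    return [sum(col) for col in zip(*partials)]


x = [1.0, 2.0, 3.0, 4.0]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
print(row_parallel_linear(x, W, 2))  # [12.0, 1.0]
print(matvec(x, W))                  # [12.0, 1.0]
```

The sharded result matches the unsharded one; the cost of tensor parallelism lies in that cross-worker summation, which is the communication the abstract's bandwidth analysis targets.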


Reproducibility of Machine Learning-Based Fault Detection and Diagnosis for HVAC Systems in Buildings: An Empirical Study

arXiv.org Artificial Intelligence

Reproducibility is a cornerstone of scientific research, enabling independent verification and validation of empirical findings. The topic has gained prominence in fields such as psychology and medicine, where concerns about non-replicable results sparked ongoing discussions about research practices. In recent years, the fast-growing field of Machine Learning (ML) has become part of this discourse, as it faces similar concerns about transparency and reliability. Some reproducibility issues in ML research are shared with other fields, such as limited access to data and missing methodological details. In addition, ML introduces specific challenges, including inherent nondeterminism and computational constraints. While reproducibility issues are increasingly recognized by the ML community and its major conferences, less is known about how these challenges manifest in applied disciplines. This paper contributes to closing this gap by analyzing the transparency and reproducibility standards of ML applications in building energy systems. The results indicate that nearly all of the analyzed articles are not reproducible because of insufficient disclosure across key dimensions of reproducibility. Of the articles, 72% do not specify whether the dataset used is public, proprietary, or commercially available. Only two papers share a link to their code, and one of those links was broken. Two-thirds of the publications were authored exclusively by academic researchers, yet no significant differences in reproducibility were observed compared with publications that have industry-affiliated authors. These findings highlight the need for targeted interventions, including reproducibility guidelines, training for researchers, and policies by journals and conferences that promote transparency and reproducibility.


GNN-ASE: Graph-Based Anomaly Detection and Severity Estimation in Three-Phase Induction Machines

arXiv.org Artificial Intelligence

The diagnosis of induction machines has traditionally relied on model-based methods that require the development of complex dynamic models, making them difficult to implement and computationally expensive. To overcome these limitations, this paper proposes a model-free approach using Graph Neural Networks (GNNs) for fault diagnosis in induction machines. The focus is on detecting multiple fault types, including eccentricity, bearing defects, and broken rotor bars, under varying severity levels and load conditions. Unlike traditional approaches, the proposed method uses raw current and vibration signals as direct inputs, eliminating the need for signal preprocessing or manual feature extraction. The proposed GNN-ASE model automatically learns and extracts relevant features from raw inputs, leveraging the graph structure to capture complex relationships between signal types and fault patterns. It is evaluated for both individual fault detection and multi-class classification of combined fault conditions. Experimental results demonstrate the effectiveness of the proposed model, which achieves 92.5% accuracy for eccentricity defects, 91.2% for bearing faults, and 93.1% for broken rotor bar detection. These findings highlight the model's robustness and generalization capability across different operational scenarios. The proposed GNN-based framework offers a lightweight yet powerful solution that simplifies implementation while maintaining high diagnostic performance. It stands as a promising alternative to conventional model-based diagnostic techniques for real-world induction machine monitoring and predictive maintenance.
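The idea of letting a graph capture relationships between signal types can be illustrated with one round of message passing. This is a generic mean-aggregation sketch, not the GNN-ASE architecture: treating each phase current and the vibration signal as a node, the star-shaped connectivity, the 2-dimensional node features, and the identity weights are all assumptions chosen to keep the arithmetic easy to check. A trained model would learn `W`, stack several layers, and pool node states into a graph-level fault prediction.

```python
from typing import Dict, List

Features = Dict[str, List[float]]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def linear(W: List[List[float]], v: List[float]) -> List[float]:
    """Multiply a weight matrix (rows = output dims) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def mean_aggregate(h: Features, neighbors: Dict[str, List[str]], node: str) -> List[float]:
    """Average the node's own features with those of its neighbors."""
    group = [node] + neighbors[node]
    dim = len(h[node])
    return [sum(h[n][d] for n in group) / len(group) for d in range(dim)]


def message_passing_layer(
    h: Features, neighbors: Dict[str, List[str]], W: List[List[float]]
) -> Features:
    """One round of mean-aggregation message passing followed by ReLU."""
    return {n: relu(linear(W, mean_aggregate(h, neighbors, n))) for n in h}


# Hypothetical graph: three phase-current nodes, each linked to a vibration node.
h = {"i_a": [1.0, 0.0], "i_b": [0.0, 1.0], "i_c": [-1.0, 0.0], "vib": [4.0, 3.0]}
neighbors = {"i_a": ["vib"], "i_b": ["vib"], "i_c": ["vib"], "vib": ["i_a", "i_b", "i_c"]}
W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights so the aggregation is easy to follow
out = message_passing_layer(h, neighbors, W)
print(out["i_a"], out["vib"])  # [2.5, 1.5] [1.0, 1.0]
```

After one layer, each current node already mixes in vibration information and vice versa, which is the kind of cross-signal interaction the abstract attributes to the graph structure.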


A Data-Driven Machine Learning Approach for Predicting Axial Load Capacity in Steel Storage Rack Columns

arXiv.org Artificial Intelligence

In this study, we present a machine learning (ML) framework to predict the axial load-bearing capacity (in kN) of cold-formed steel structural members. The methodology emphasizes robust model selection and interpretability, addressing the limitations of traditional analytical approaches in capturing the nonlinearities and geometric complexities inherent to buckling behavior. The dataset, comprising key geometric and mechanical parameters of steel columns, was curated with appropriate preprocessing steps, including removal of non-informative identifiers and imputation of missing values. A comprehensive suite of regression algorithms, ranging from linear models to kernel-based regressors and ensemble tree methods, was evaluated. Among these, Gradient Boosting Regression exhibited superior predictive performance across multiple metrics, including the coefficient of determination (R2), root mean squared error (RMSE), and mean absolute error (MAE), and was consequently selected as the final model. Model interpretability was addressed using SHapley Additive exPlanations (SHAP), providing insight into the relative importance and interactions of the input features that influence the predicted axial capacity. To facilitate practical deployment, the model was integrated into an interactive, Python-based web interface built with Streamlit. This tool allows end users, such as structural engineers and designers, to input design parameters manually or through CSV upload and to obtain real-time predictions of axial load capacity without the need for programming expertise. Applied to steel storage rack columns, the framework demonstrates how data-driven tools can enhance design safety, streamline validation workflows, and inform decision-making in structural applications where buckling is a critical failure mode.
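The three metrics used for model selection can be computed directly from their standard definitions, as in the sketch below. The function name and the capacity values are hypothetical and serve only to illustrate the formulas; they are not data from the study. Libraries such as scikit-learn provide equivalent functions.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R^2, RMSE, and MAE using their standard definitions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Made-up axial capacities in kN, for illustration only.
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 205.0, 245.0]
print(regression_metrics(y_true, y_pred))  # r2 ≈ 0.98, rmse ≈ 7.91, mae = 7.5
```

RMSE and MAE are in the same unit as the target (kN here), while R2 is unitless, which is why reporting all three gives a fuller picture when comparing regressors.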


Deploying Geospatial Foundation Models in the Real World: Lessons from WorldCereal

arXiv.org Artificial Intelligence

The increasing availability of geospatial foundation models has the potential to transform remote sensing applications such as land cover classification, environmental monitoring, and change detection. Despite promising benchmark results, deploying these models in operational settings remains challenging and rare. Standardized evaluation tasks often fail to capture real-world complexities relevant for end-user adoption, such as data heterogeneity, resource constraints, and application-specific requirements. This paper presents a structured approach to integrate geospatial foundation models into operational mapping systems. Our protocol has three key steps: defining application requirements, adapting the model to domain-specific data, and conducting rigorous empirical testing. Using the Presto model in a case study for crop mapping, we demonstrate that fine-tuning a pre-trained model significantly improves performance over conventional supervised methods. Our results highlight the model's strong spatial and temporal generalization capabilities. Our protocol provides a replicable blueprint for practitioners and lays the groundwork for future research to operationalize foundation models in diverse remote sensing applications. Application of the protocol to the WorldCereal global crop-mapping system showcases the framework's scalability.


Prompting Large Language Models for Training-Free Non-Intrusive Load Monitoring

arXiv.org Artificial Intelligence

Non-intrusive load monitoring (NILM) aims to disaggregate total electricity consumption into individual appliance usage, thus enabling more effective energy management. While deep learning has advanced NILM, it remains limited by its dependence on labeled data, restricted generalization, and lack of explainability. This paper introduces the first prompt-based NILM framework that leverages large language models (LLMs) with in-context learning. We design and evaluate, through case studies, prompt strategies that integrate appliance features, contextual information, and representative time-series examples. Extensive experiments on the REDD and UK-DALE datasets show that LLMs guided solely by prompts deliver only basic NILM capabilities, and their performance lags behind that of traditional deep-learning models in complex scenarios. However, the experiments also demonstrate strong generalization across different houses, and even regions, achieved simply by adapting the injected appliance features. The approach also provides clear, human-readable explanations for the inferred appliance states. Our findings define the capability boundaries of using prompt-only LLMs for NILM tasks, and their strengths in generalization and explainability present a promising new direction for the field.
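A prompt that combines appliance features, contextual information, and labeled time-series examples might be assembled as follows. The template, the helper names, and the kettle feature values are hypothetical illustrations, not the prompts used in the paper, and the call to an LLM is omitted. Adapting the prompt to a new house or region would amount to changing the `features` dictionary, which mirrors the generalization mechanism the abstract describes.

```python
from typing import Dict, List, Sequence, Tuple


def fmt(window: Sequence[float]) -> str:
    """Render a window of aggregate power readings (watts) as text."""
    return ", ".join(f"{w:.0f}" for w in window)


def build_nilm_prompt(
    appliance: str,
    features: Dict[str, str],
    context: str,
    examples: Sequence[Tuple[Sequence[float], str]],
    query: Sequence[float],
) -> str:
    """Assemble an in-context-learning prompt for one appliance (hypothetical template)."""
    lines: List[str] = [
        f"Task: decide whether the {appliance} is ON or OFF from aggregate power readings (W).",
        "Appliance features:",
    ]
    lines += [f"- {name}: {value}" for name, value in features.items()]
    lines.append(f"Context: {context}")
    lines.append("Labeled examples:")
    lines += [f"Readings: {fmt(window)} -> {label}" for window, label in examples]
    # The unlabeled query window the model must classify.
    lines.append(f"Readings: {fmt(query)} -> ?")
    # Asking for reasoning is what yields the human-readable explanations.
    lines.append("Answer ON or OFF and briefly explain your reasoning.")
    return "\n".join(lines)


prompt = build_nilm_prompt(
    appliance="kettle",
    features={"rated power": "roughly 2-3 kW", "typical run time": "a few minutes"},
    context="Weekday morning in a UK household.",
    examples=[([150, 2350, 2400], "ON"), ([140, 160, 155], "OFF")],
    query=[180, 2290, 2310],
)
print(prompt)
```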


Between Linear and Sinusoidal: Rethinking the Time Encoder in Dynamic Graph Learning

arXiv.org Artificial Intelligence

Dynamic graph learning is essential for applications involving temporal networks and requires effective modeling of temporal relationships. Seminal attention-based models like TGAT and DyGFormer rely on sinusoidal time encoders to capture temporal dependencies between edge events. Prior work justified sinusoidal encodings because their inner products depend on the time spans between events, which are crucial features for modeling inter-event relations. However, sinusoidal encodings inherently lose temporal information due to their many-to-one nature and therefore require high dimensions. In this paper, we rigorously study a simpler alternative: the linear time encoder, which avoids temporal information loss caused by sinusoidal functions and reduces the need for high-dimensional time encoders. We show that the self-attention mechanism can effectively learn to compute time spans between events from linear time encodings and extract relevant temporal patterns. Through extensive experiments on six dynamic graph datasets, we demonstrate that the linear time encoder improves the performance of TGAT and DyGFormer in most cases. Moreover, the linear time encoder can lead to significant savings in model parameters with minimal performance loss. For example, compared to a 100-dimensional sinusoidal time encoder, TGAT with a 2-dimensional linear time encoder saves 43% of parameters and achieves higher average precision on five datasets. While both encoders can be used simultaneously, our study highlights the often-overlooked advantages of linear time features in modern dynamic graph models. These findings can positively impact the design choices of various dynamic graph learning architectures and eventually benefit temporal network applications such as recommender systems, communication networks, and traffic forecasting.
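The two properties discussed above, span-dependent inner products for sinusoidal encoders and their many-to-one behavior, can be checked numerically. In the sketch below, the cos/sin pair form, the chosen frequencies, and the linear weights are assumptions for illustration; TGAT and DyGFormer parameterize and learn their encoders in their own ways.

```python
import math
from typing import List, Sequence


def sinusoidal_encode(t: float, omegas: Sequence[float]) -> List[float]:
    """Cos/sin pair per frequency: one common form of a sinusoidal time encoder."""
    enc: List[float] = []
    for w in omegas:
        enc += [math.cos(w * t), math.sin(w * t)]
    return enc


def linear_encode(t: float, weights: Sequence[float], biases: Sequence[float]) -> List[float]:
    """Linear time encoder: each dimension is an affine function of t."""
    return [w * t + b for w, b in zip(weights, biases)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


omegas = [1.0, 0.1]
t1, t2 = 5.0, 2.0
# Inner product depends only on the span t1 - t2: the sum of cos(w * (t1 - t2)).
print(dot(sinusoidal_encode(t1, omegas), sinusoidal_encode(t2, omegas)))
print(math.cos(3.0) + math.cos(0.3))  # same value up to rounding

# Many-to-one: with w = 1, times 0 and 2*pi get encodings that agree up to rounding.
print(sinusoidal_encode(0.0, [1.0]), sinusoidal_encode(2 * math.pi, [1.0]))

# A linear encoder keeps them apart, and a difference recovers the span directly.
w, b = [0.5], [1.0]
print(linear_encode(t1, w, b)[0] - linear_encode(t2, w, b)[0])  # 1.5
```

The last line hints at why attention can work with linear encodings: a span is just a scaled difference of two encodings, which linear projections inside attention can in principle represent without high-dimensional features.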