Goto

Collaborating Authors

 Energy


DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators

arXiv.org Artificial Intelligence

In the hardware design space exploration process, it is critical to optimize both hardware parameters and algorithm-to-hardware mappings. Previous work has largely approached this simultaneous optimization problem by separately exploring the hardware design space and the mapspace - both individually large and highly nonconvex spaces - independently. The resulting combinatorial explosion has created significant difficulties for optimizers. In this paper, we introduce DOSA, which consists of differentiable performance models and a gradient descent-based optimization technique to simultaneously explore both spaces and identify high-performing design points. Experimental results demonstrate that DOSA outperforms random search and Bayesian optimization by 2.80x and 12.59x, respectively, in improving DNN model energy-delay product, given a similar number of samples. We also demonstrate the modularity and flexibility of DOSA by augmenting our analytical model with a learned model, allowing us to optimize buffer sizes and mappings of a real DNN accelerator and attain a 1.82x improvement in energy-delay product.


STL-Based Motion Planning and Uncertainty-Aware Risk Analysis for Human-Robot Collaboration with a Multi-Rotor Aerial Vehicle

arXiv.org Artificial Intelligence

This paper presents a novel approach to motion planning and risk analysis for enhancing human-robot collaboration using a Multi-Rotor Aerial Vehicle (MRAV). The proposed method uses Signal Temporal Logic (STL) to encode key mission objectives, such as safety, timing, and human preferences, with a strong focus on ergonomics and comfort. An optimization framework generates dynamically feasible trajectories while considering the MRAV's physical constraints. Given the nonlinear and non-convex nature of the problem, smooth approximations and gradient-based techniques assist in handling the problem's computational complexity. Additionally, an uncertainty-aware risk analysis is incorporated to assess potential deviations from the mission specifications, providing insights into the likelihood of mission success under uncertain conditions. Further, an event-triggered replanning strategy is implemented to respond to unforeseen events and external disturbances. The approach is validated through MATLAB and Gazebo simulations, using an object handover task in a mock-up environment inspired by power line maintenance scenarios. The results highlight the method's effectiveness in achieving safe, efficient, and resilient human-robot collaboration.


GTS_Forecaster: a novel deep learning based geodetic time series forecasting toolbox with python

arXiv.org Artificial Intelligence

Geodetic time series -- such as Global Navigation Satellite System (GNSS) positions, satellite altimetry-derived sea surface height (SSH), and tide gauge (TG) records -- is essential for monitoring surface deformation and sea level change. Accurate forecasts of these variables can enhance early warning systems and support hazard mitigation for earthquakes, landslides, coastal storm surge, and long-term sea level. However, the nonlinear, non-stationary, and incomplete nature of such variables presents significant challenges for classic models, which often fail to capture long-term dependencies and complex spatiotemporal dynamics. We introduce GTS Forecaster, an open-source Python package for geodetic time series forecasting. It integrates advanced deep learning models -- including kernel attention networks (KAN), graph neural network-based gated recurrent units (GNNGRU), and time-aware graph neural networks (TimeGNN) -- to effectively model nonlinear spatial-temporal patterns. The package also provides robust preprocessing tools, including outlier detection and a reinforcement learning-based gap-filling algorithm, the Kalman-TransFusion Interpolation Framework (KTIF). GTS Forecaster currently supports forecasting, visualization, and evaluation of GNSS, SSH, and TG datasets, and is adaptable to general time series applications. By combining cutting-edge models with an accessible interface, it facilitates the application of deep learning in geodetic forecasting tasks.


Crystal Systems Classification of Phosphate-Based Cathode Materials Using Machine Learning for Lithium-Ion Battery

arXiv.org Artificial Intelligence

The physical and chemical characteristics of cathodes used in batteries are derived from the lithium-ion phosphate cathodes crystalline arrangement, which is pivotal to the overall battery performance. Therefore, the correct prediction of the crystal system is essential to estimate the properties of cathodes. This study applies machine learning classification algorithms for predicting the crystal systems, namely monoclinic, orthorhombic, and triclinic, related to Li P (Mn, Fe, Co, Ni, V) O based Phosphate cathodes. The data used in this work is extracted from the Materials Project. Feature evaluation showed that cathode properties depend on the crystal structure, and optimized classification strategies lead to better predictability. Ensemble machine learning algorithms such as Random Forest, Extremely Randomized Trees, and Gradient Boosting Machines have demonstrated the best predictive capabilities for crystal systems in the Monte Carlo cross-validation test. Additionally, sequential forward selection (SFS) is performed to identify the most critical features influencing the prediction accuracy for different machine learning models, with Volume, Band gap, and Sites as input features ensemble machine learning algorithms such as Random Forest (80.69%), Extremely Randomized Tree (78.96%), and Gradient Boosting Machine (80.40%) approaches lead to the maximum accuracy towards crystallographic classification with stability and the predicted materials can be the potential cathode materials for lithium ion batteries.


An Internet of Intelligent Things Framework for Decentralized Heterogeneous Platforms

arXiv.org Artificial Intelligence

--Internet of Intelligent Things (IoIT), an emerging field, combines the utility of Internet of Things (IoT) devices with the innovation of embedded AI algorithms. However, it does not come without challenges, and struggles regarding available computing resources, energy supply, and storage limitations. In particular, many impediments to IoIT are linked to the energy-efficient deployment of machine learning (ML)/deep learning (DL) models in embedded devices. Research has been conducted to design energy-efficient IoIT platforms, but these papers often focus on centralized systems, in which some central entity processes all the data and coordinates actions. This can be problematic, e.g., serve as bottleneck or lead to security concerns. In a decentralized system, nodes/devices would self-organize and make their own decisions. Therefore, to address such issues, we propose a heterogeneous, decentralized sensing and monitoring IoIT peer-to-peer mesh network system model. Nodes in the network will coordinate towards several optimization goals: reliability, energy efficiency, and latency. The system employs federated learning to train nodes in a distributed manner, metaheuristics to optimize task allocation and routing paths, and multi-objective optimization to balance conflicting performance goals. Internet of Intelligent Things (IoIT), an emerging field, combines the utility of Internet of Things (IoT) devices with the innovation of embedded AI algorithms. It provides predictive and faster data analytics in IoT platforms thanks to machine learning algorithms which enable intelligent processing of huge amounts of sensor-generated data. However, it does not come without challenges, and struggles regarding available computing resources, energy supply, and storage limitations. In particular, many impediments to IoIT are linked to the energy-efficient deployment of machine learning (ML)/deep learning (DL) models in embedded devices.


SOH-KLSTM: A Hybrid Kolmogorov-Arnold Network and LSTM Model for Enhanced Lithium-Ion Battery Health Monitoring

arXiv.org Artificial Intelligence

Lithium (Li) batteries have emerged as a dominant energy storage solution due to their exceptional energy density, prolonged cycle life, fast charging capability, and adaptability across diverse applications, including electric vehicles, renewable energy systems, and portable electronics [1, 2, 3]. However, their performance inevitably degrades with time driven by repeated charge and discharge cycles, temperature fluctuations, and ageing effects [4, 5]. This degradation not only reduces battery efficiency and reliability but also poses significant safety risks, particularly in high-demand applications where performance consistency is critical [6], [7]. As a result, accurate estimation of the State of Health (SOH) is essential to ensure the longevity and safe operation of Li batteries. SOH is a key indicator of the remaining capacity and functional integrity of a battery relative to its initial state. It encompasses key variables such as voltage, current, temperature, and other factors that influence battery performance.


PySensors 2.0: A Python Package for Sparse Sensor Placement

arXiv.org Artificial Intelligence

PySensors is a Python package for selecting and placing a sparse set of sensors for reconstruction and classification tasks. In this major update to PySensors, we introduce spatially constrained sensor placement capabilities, allowing users to enforce constraints such as maximum or exact sensor counts in specific regions, incorporate predetermined sensor locations, and maintain minimum distances between sensors. We extend functionality to support custom basis inputs, enabling integration of any data-driven or spectral basis. We also propose a thermodynamic approach that goes beyond a single "optimal" sensor configuration and maps the complete landscape of sensor interactions induced by the training data. This comprehensive view facilitates integration with external selection criteria and enables assessment of sensor replacement impacts. The new optimization technique also accounts for over- and under-sampling of sensors, utilizing a regularized least squares approach for robust reconstruction. Additionally, we incorporate noise-induced uncertainty quantification of the estimation error and provide visual uncertainty heat maps to guide deployment decisions. To highlight these additions, we provide a brief description of the mathematical algorithms and theory underlying these new capabilities. We demonstrate the usage of new features with illustrative code examples and include practical advice for implementation across various application domains. Finally, we outline a roadmap of potential extensions to further enhance the package's functionality and applicability to emerging sensing challenges.


VoltanaLLM: Feedback-Driven Frequency Control and State-Space Routing for Energy-Efficient LLM Serving

arXiv.org Artificial Intelligence

Modern Large Language Model (LLM) serving systems increasingly support interactive applications, like real-time chat assistants, code generation tools, and agentic workflows. However, the soaring energy cost of LLM inference presents a growing challenge for sustainable and cost-effective deployment. This paper introduces VoltanaLLM, a system for SLO-aware, energy-efficient LLM serving, built from a control theory perspective. VoltanaLLM co-designs frequency scaling and request routing in emerging prefill/decode disaggregated architectures, leveraging their decoupled execution to enable fine-grained phase-specific control. It consists of a feedback-driven frequency controller that dynamically adapts GPU frequency for prefill and decode phases, and a state-space router that explores routing decisions across frequency-scaled instances to minimize energy under latency constraints. We implement VoltanaLLM in SGLang and evaluate its performance over multiple state-of-the-art LLMs and real-world datasets. The results demonstrate that VoltanaLLM achieves up to 36.3% energy savings while maintaining near-perfect SLO attainment rate, paving the way for sustainable and intelligent LLM serving. Code of VoltanaLLM is open-sourced on GitHub: https://github.com/Supercomputing-System-AI-Lab/VoltanaLLM.


Principled Approximation Methods for Efficient and Scalable Deep Learning

arXiv.org Artificial Intelligence

Recent progress in deep learning has been driven by increasingly larger models. However, their computational and energy demands have grown proportionally, creating significant barriers to their deployment and to a wider adoption of deep learning technologies. This thesis investigates principled approximation methods for improving the efficiency of deep learning systems, with a particular focus on settings that involve discrete constraints and non-differentiability. We study three main approaches toward improved efficiency: architecture design, model compression, and optimization. For model compression, we propose novel approximations for pruning and quantization that frame the underlying discrete problem as continuous and differentiable, enabling gradient-based training of compression schemes alongside the model's parameters. These approximations allow for fine-grained sparsity and precision configurations, leading to highly compact models without significant fine-tuning. In the context of architecture design, we design an algorithm for neural architecture search that leverages parameter sharing across layers to efficiently explore implicitly recurrent architectures. Finally, we study adaptive optimization, revisiting theoretical properties of widely used methods and proposing an adaptive optimizer that allows for quick hyperparameter tuning. Our contributions center on tackling computationally hard problems via scalable and principled approximations. Experimental results on image classification, language modeling, and generative modeling tasks show that the proposed methods provide significant improvements in terms of training and inference efficiency while maintaining, or even improving, the model's performance.


Mechanistic Interpretability of LoRA-Adapted Language Models for Nuclear Reactor Safety Applications

arXiv.org Artificial Intelligence

The integration of Large Language Models (LLMs) into safety-critical domains, such as nuclear engineering, necessitates a deep understanding of their internal reasoning processes. This paper presents a novel methodology for interpreting how an LLM encodes and utilizes domain-specific knowledge, using a Boiling Water Reactor system as a case study. We adapted a general-purpose LLM (Gemma-3-1b-it) to the nuclear domain using a parameter-efficient fine-tuning technique known as Low-Rank Adaptation. By comparing the neuron activation patterns of the base model to those of the fine-tuned model, we identified a sparse set of neurons whose behavior was significantly altered during the adaptation process. To probe the causal role of these specialized neurons, we employed a neuron silencing technique. Our results demonstrate that while silencing most of these specialized neurons individually did not produce a statistically significant effect, deactivating the entire group collectively led to a statistically significant degradation in task performance. Qualitative analysis further revealed that silencing these neurons impaired the model's ability to generate detailed, contextually accurate technical information. This paper provides a concrete methodology for enhancing the transparency of an opaque black-box model, allowing domain expertise to be traced to verifiable neural circuits. This offers a pathway towards achieving nuclear-grade artificial intelligence (AI) assurance, addressing the verification and validation challenges mandated by nuclear regulatory frameworks (e.g., 10 CFR 50 Appendix B), which have limited AI deployment in safety-critical nuclear operations.