Goto

Collaborating Authors

 Energy


Spectra 1.1: Scaling Laws and Efficient Inference for Ternary Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly used across research and industry applications, yet their inference efficiency remains a significant challenge. As the computational power of modern GPU architectures continuously improves, their memory bandwidth and capacity have not scaled proportionally, creating a critical bottleneck during inference. To address this, we investigate ternary language models (TriLMs) that employ quantization-aware training to significantly reduce memory requirements. We first analyze the scalability of TriLMs by conducting a scaling law analysis, revealing that TriLMs benefit more from increasing training data than from scaling model parameters. Based on this observation, we introduce Spectra-1.1, an open suite of TriLMs trained on up to 1.2 trillion tokens, demonstrating sustained performance gains at scale. Furthermore, to improve inference efficiency, we propose novel 2-bit and 1.6-bit packing schemes for ternary weights, which demonstrate accelerated inference across various CPU architectures. Also, building on the 2-bit packing, we develop a GPU kernel called TriRun that accelerates end-to-end model inference by up to 5 times compared to floating-point baselines. To encourage further exploration and development of TriLMs, we will release the Spectra-1.1 suite and TriRun inference kernels. Overall, our work lays the foundation for building and deploying efficient LLMs, providing a valuable resource for the research community.


Momentum-based Accelerated Algorithm for Distributed Optimization under Sector-Bound Nonlinearity

arXiv.org Artificial Intelligence

Distributed optimization advances centralized machine learning methods by enabling parallel and decentralized learning processes over a network of computing nodes. This work provides an accelerated consensus-based distributed algorithm for locally non-convex optimization using the gradient-tracking technique. The proposed algorithm (i) improves the convergence rate by adding momentum towards the optimal state using the heavy-ball method, while (ii) addressing general sector-bound nonlinearities over the information-sharing network. The link nonlinearity includes any sign-preserving odd sector-bound mapping, for example, log-scale data quantization or clipping in practical applications. For admissible momentum and gradient-tracking parameters, using perturbation theory and eigen-spectrum analysis, we prove convergence even in the presence of sector-bound nonlinearity and for locally non-convex cost functions. Further, in contrast to most existing weight-stochastic algorithms, we adopt weight-balanced (WB) network design. This WB design and perturbation-based analysis allow to handle dynamic directed network of agents to address possible time-varying setups due to link failures or packet drops.


Quantum Neural Networks for Wind Energy Forecasting: A Comparative Study of Performance and Scalability with Classical Models

arXiv.org Artificial Intelligence

Quantum Neural Networks (QNNs), a prominent approach in Quantum Machine Learning (QML), are emerging as a powerful alternative to classical machine learning methods. Recent studies have focused on the applicability of QNNs to various tasks, such as time-series forecasting, prediction, and classification, across a wide range of applications, including cybersecurity and medical imaging. With the increased use of smart grids driven by the integration of renewable energy systems, machine learning plays an important role in predicting power demand and detecting system disturbances. This study provides an in-depth investigation of QNNs for predicting the power output of a wind turbine. We assess the predictive performance and simulation time of six QNN configurations that are based on the Z Feature Map for data encoding and varying ansatz structures. Through detailed cross-validation experiments and tests on an unseen hold-out dataset, we experimentally demonstrate that QNNs can achieve predictive performance that is competitive with, and in some cases marginally better than, the benchmarked classical approaches. Our results also reveal the effects of dataset size and circuit complexity on predictive performance and simulation time. We believe our findings will offer valuable insights for researchers in the energy domain who wish to incorporate quantum machine learning into their work.


Stabilization of industrial processes with time series machine learning

arXiv.org Artificial Intelligence

An application of machine learning to the industrial processes stabilization is an open problem which promises a huge potential benefit to the such critical industries as metals and energy development if solved. Classical optimization methods, such as finite-horizon markov decision processes [1], non-linear programming reformulation of control [2] and point-wise optimization [3] are frequently employed in order to achieve better stability of time series process, successfully improving production quality, minimizing expenses and manufacturing devices deficiency with near-future planing or real-time optimization. Machine learning, known for its prominent results in solution of enterprise problems [4], became widely applied to the time series prediction and generation after recent advances in such fields as natural language processing, due to the similarity aforementioned tasks in their time dependent recurrent nature [5]. Thus, contemporary time series modeling is performed with long short-term memory (LSTM) models [6] and Transformers [7], incorporating different attention strategies. Currently, state-of-the-art approaches to ML-driven optimization include an application of reinforcement learning, but for time series problems, the usual focus stays on approximation of the industrial process as a dynamic system on the basis of recurrent neural network (RNN), with such methods as recurrent stabilization control [8, 9].


Leveraging In-Context Learning for Political Bias Testing of LLMs

arXiv.org Artificial Intelligence

A growing body of work has been querying LLMs with political questions to evaluate their potential biases. However, this probing method has limited stability, making comparisons between models unreliable. In this paper, we argue that LLMs need more context. We propose a new probing task, Questionnaire Modeling (QM), that uses human survey data as in-context examples. We show that QM improves the stability of question-based bias evaluation, and demonstrate that it may be used to compare instruction-tuned models to their base versions. Experiments with LLMs of various sizes indicate that instruction tuning can indeed change the direction of bias. Furthermore, we observe a trend that larger models are able to leverage in-context examples more effectively, and generally exhibit smaller bias scores in QM. Data and code are publicly available.


Exploring the Capabilities of the Frontier Large Language Models for Nuclear Energy Research

arXiv.org Artificial Intelligence

The AI for Nuclear Energy workshop at Oak Ridge National Laboratory evaluated the potential of Large Language Models (LLMs) to accelerate fusion and fission research. Fourteen interdisciplinary teams explored diverse nuclear science challenges using ChatGPT, Gemini, Claude, and other AI models over a single day. Applications ranged from developing foundation models for fusion reactor control to automating Monte Carlo simulations, predicting material degradation, and designing experimental programs for advanced reactors. Teams employed structured workflows combining prompt engineering, deep research capabilities, and iterative refinement to generate hypotheses, prototype code, and research strategies. Key findings demonstrate that LLMs excel at early-stage exploration, literature synthesis, and workflow design, successfully identifying research gaps and generating plausible experimental frameworks. However, significant limitations emerged, including difficulties with novel materials designs, advanced code generation for modeling and simulation, and domain-specific details requiring expert validation. The successful outcomes resulted from expert-driven prompt engineering and treating AI as a complementary tool rather than a replacement for physics-based methods. The workshop validated AI's potential to accelerate nuclear energy research through rapid iteration and cross-disciplinary synthesis while highlighting the need for curated nuclear-specific datasets, workflow automation, and specialized model development. These results provide a roadmap for integrating AI tools into nuclear science workflows, potentially reducing development cycles for safer, more efficient nuclear energy systems while maintaining rigorous scientific standards.


QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization

arXiv.org Artificial Intelligence

Inference accounts for the majority of latency and energy consumption in large language model (LLM) deployments, often exceeding 90% of total cost. While training-time efficiency has seen extensive progress, runtime optimization remains a key bottleneck, particularly under autoregressive decoding. Existing approaches -- such as pruning, quantization, early exits, and speculative decoding -- often require retraining, architectural changes, or disrupt decoding compatibility. We introduce QuickSilver, a modular, token-level framework that enables semantic adaptivity at inference time without altering model weights or structure. QuickSilver integrates four synergistic mechanisms: (i) Dynamic Token Halting, which halts computation for tokens with converged representations; (ii) KV Cache Skipping, which selectively suppresses memory writes to reduce attention overhead; and (iii) Contextual Token Fusion, which collapses redundant tokens into shared paths to shrink sequence length. Unlike speculative decoding or MoE routing, QuickSilver operates entirely on frozen, dense models and requires no auxiliary networks. Applied to GPT-2 and Llama-2 across WikiText-103 and C4, QuickSilver achieves up to 39.6% FLOP reduction with negligible perplexity degradation (<=0.2).


Thompson Sampling-Based Learning and Control for Unknown Dynamic Systems

arXiv.org Artificial Intelligence

Thompson sampling (TS) is an effective method to explore parametric uncertainties and can therefore be used for active learning-based controller design. However, TS relies on finite parametric representations, which limits its applicability to more general spaces, which are more commonly encountered in control system design. To address this issue, this work pro poses a parameterization method for control law learning using reproducing kernel Hilbert spaces and designs a data-driven active learning control approach. Specifically, the proposed method treats the control law as an element in a function space, allowing the design of control laws without imposing restrictions on the system structure or the form of the controller. A TS framework is proposed in this work to explore potential optimal control laws, and the convergence guarantees are further provided for the learning process. Theoretical analysis shows that the proposed method learns the relationship between control laws and closed-loop performance metrics at an exponential rate, and the upper bound of control regret is also derived. Numerical experiments on controlling unknown nonlinear systems validate the effectiveness of the proposed method.


TOAST: Task-Oriented Adaptive Semantic Transmission over Dynamic Wireless Environments

arXiv.org Artificial Intelligence

--The evolution toward 6G networks demands a fundamental shift from bit-centric transmission to semantic-aware communication that emphasizes task-relevant information. This work introduces TOAST (T ask-Oriented Adaptive Semantic Transmission), a unified framework designed to address the core challenge of multi-task optimization in dynamic wireless environments through three complementary components. First, we formulate adaptive task balancing as a Markov decision process, employing deep reinforcement learning to dynamically adjust the trade-off between image reconstruction fidelity and semantic classification accuracy based on real-time channel conditions. Second, we integrate module-specific Low-Rank Adaptation (LoRA) mechanisms throughout our Swin Transformer-based joint source-channel coding architecture, enabling parameter-efficient fine-tuning that dramatically reduces adaptation overhead while maintaining full performance across diverse channel impairments including Additive White Gaussian Noise (A WGN), fading, phase noise, and impulse interference. Third, we incorporate an Elucidating diffusion model that operates in the latent space to restore features corrupted by channel noises, providing substantial quality improvements compared to baseline approaches. Extensive experiments across multiple datasets demonstrate that TOAST achieves superior performance compared to baseline approaches, with significant improvements in both classification accuracy and reconstruction quality at low Signal-to-Noise Ratio (SNR) conditions while maintaining robust performance across all tested scenarios. By seamlessly orchestrating reinforcement learning, diffusion-based enhancement, and parameter-efficient adaptation within a single coherent framework, TOAST represents a significant advancement toward adaptive semantic communication systems capable of thriving in the rigorous conditions of next-generation wireless networks. HE emergence of sixth-generation (6G) wireless networks marks a fundamental change in how communication is understood, shifting from Shannon's classical model of reliable bit transmission to a semantic-oriented approach that focuses on meaning and task relevance [1]. This development in Semantic Communication (SemCom) acknowledges that, in many practical scenarios, reconstructing every bit perfectly is neither required nor efficient. Instead, the key is to retain the information necessary for completing specific tasks, such as interpreting a scene, making a decision, or initiating an action [2]. Wang are with the Department of Electrical Engineering and Computer Science, Lassonde School of Engineering, Y ork University, Toronto, ON, Canada (e-mails: ys97@yorku.ca; J. Pei is with the School of Electrical and Electronic Engineering, Huazhong University of Science and Technology, Wuhan, China (e-mail: jianhuapei@hust.edu.cn).


Storm Surge in Color: RGB-Encoded Physics-Aware Deep Learning for Storm Surge Forecasting

arXiv.org Artificial Intelligence

Storm surge forecasting plays a crucial role in coastal disaster preparedness, yet existing machine learning approaches often suffer from limited spatial resolution, reliance on coastal station data, and poor generalization. Moreover, many prior models operate directly on unstructured spatial data, making them incompatible with modern deep learning architectures. In this work, we introduce a novel approach that projects unstructured water elevation fields onto structured Red Green Blue (RGB)-encoded image representations, enabling the application of Convolutional Long Short Term Memory (ConvLSTM) networks for end-to-end spatiotemporal surge forecasting. Our model further integrates ground-truth wind fields as dynamic conditioning signals and topo-bathymetry as a static input, capturing physically meaningful drivers of surge evolution. Evaluated on a large-scale dataset of synthetic storms in the Gulf of Mexico, our method demonstrates robust 48-hour forecasting performance across multiple regions along the Texas coast and exhibits strong spatial extensibility to other coastal areas. By combining structured representation, physically grounded forcings, and scalable deep learning, this study advances the frontier of storm surge forecasting in usability, adaptability, and interpretability.