Goto

Collaborating Authors

 ano


Pay Attention to Small Weights

arXiv.org Artificial Intelligence

Finetuning large pretrained neural networks is known to be resource-intensive, both in terms of memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights. This correlation is more pronounced in finetuning settings than in training from scratch. Motivated by this observation, we propose NANOADAM, which dynamically updates only the small-magnitude weights during finetuning and offers several practical advantages: first, this criterion is gradient-free -- the parameter subset can be determined without gradient computation; second, it preserves large-magnitude weights, which are likely to encode critical features learned during pretraining, thereby reducing the risk of catastrophic forgetting; thirdly, it permits the use of larger learning rates and consistently leads to better generalization performance in experiments. We demonstrate this for both NLP and vision tasks.


Quantum Reinforcement Learning by Adaptive Non-local Observables

arXiv.org Artificial Intelligence

Hybrid quantum-classical frameworks leverage quantum computing for machine learning; however, variational quantum circuits (VQCs) are limited by the need for local measurements. We introduce an adaptive non-local observable (ANO) paradigm within VQCs for quantum reinforcement learning (QRL), jointly optimizing circuit parameters and multi-qubit measurements. The ANO-VQC architecture serves as the function approximator in Deep Q-Network (DQN) and Asynchronous Advantage Actor-Critic (A3C) algorithms. On multiple benchmark tasks, ANO-VQC agents outperform baseline VQCs. Ablation studies reveal that adaptive measurements enhance the function space without increasing circuit depth. Our results demonstrate that adaptive multi-qubit observables can enable practical quantum advantages in reinforcement learning.


Adaptive Non-local Observable on Quantum Neural Networks

arXiv.org Artificial Intelligence

--Conventional V ariational Quantum Circuits (VQCs) for Quantum Machine Learning typically rely on a fixed Her-mitian observable, often built from Pauli operators. Inspired by the Heisenberg picture, we propose an adaptive non-local measurement framework that substantially increases the model complexity of the quantum circuits. Our introduction of dynamical Hermitian observables with evolving parameters shows that optimizing VQC rotations corresponds to tracing a trajectory in the observable space. This viewpoint reveals that standard VQCs are merely a special case of the Heisenberg representation. Furthermore, we show that properly incorporating variational rotations with non-local observables enhances qubit interaction and information mixture, admitting flexible circuit designs. Two non-local measurement schemes are introduced, and numerical simulations on classification tasks confirm that our approach outperforms conventional VQCs, yielding a more powerful and resource-efficient approach as a Quantum Neural Network. Quantum Machine Learning (QML) is a developing field that leverages the principles of quantum mechanics to advance machine learning (ML) models. With the rapid advancement of quantum computing hardware, QML aims to exploit quantum phenomena--such as superposition, entanglement, and quantum interference--to provide computational advantages over classical approaches. Despite the current limitations of quantum hardware, hybrid quantum-classical algorithms have been developed to harness the strengths of both computing paradigms, allowing near-term quantum devices to contribute meaningfully to real-world ML tasks.


Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control

arXiv.org Artificial Intelligence

Pretrained language models have demonstrated extraordinary capabilities in language generation. However, real-world tasks often require controlling the distribution of generated text in order to mitigate bias, promote fairness, and achieve personalization. Existing techniques for controlling the distribution of generated text only work with quantified distributions, which require pre-defined categories, proportions of the distribution, or an existing corpus following the desired distributions. However, many important distributions, such as personal preferences, are unquantified. In this work, we tackle the problem of generating text following arbitrary distributions (quantified and unquantified) by proposing Nano, a few-shot human-in-the-loop training algorithm that continuously learns from human feedback. Nano achieves state-of-the-art results on single topic/attribute as well as quantified distribution control compared to previous works. We also show that Nano is able to learn unquantified distributions, achieves personalization, and captures differences between different individuals' personal preferences with high sample efficiency.


Rockmate: an Efficient, Fast, Automatic and Generic Tool for Re-materialization in PyTorch

arXiv.org Artificial Intelligence

We propose Rockmate to control the memory requirements when training PyTorch DNN models. Rockmate is an automatic tool that starts from the model code and generates an equivalent model, using a predefined amount of memory for activations, at the cost of a few re-computations. Rockmate automatically detects the structure of computational and data dependencies and rewrites the initial model as a sequence of complex blocks. We show that such a structure is widespread and can be found in many models in the literature (Transformer based models, ResNet, RegNets,...). This structure allows us to solve the problem in a fast and efficient way, using an adaptation of Checkmate (too slow on the whole model but general) at the level of individual blocks and an adaptation of Rotor (fast but limited to sequential models) at the level of the sequence itself. We show through experiments on many models that Rockmate is as fast as Rotor and as efficient as Checkmate, and that it allows in many cases to obtain a significantly lower memory consumption for activations (by a factor of 2 to 5) for a rather negligible overhead (of the order of 10% to 20%). Rockmate is open source and available at https://github.com/topal-team/rockmate.


Root-cause Analysis for Time-series Anomalies via Spatiotemporal Graphical Modeling in Distributed Complex Systems

arXiv.org Machine Learning

Performance monitoring, anomaly detection, and root-cause analysis in complex cyber-physical systems (CPSs) are often highly intractable due to widely diverse operational modes, disparate data types, and complex fault propagation mechanisms. This paper presents a new data-driven framework for root-cause analysis, based on a spatiotemporal graphical modeling approach built on the concept of symbolic dynamics for discovering and representing causal interactions among sub-systems of complex CPSs. We formulate the root-cause analysis problem as a minimization problem via the proposed inference based metric and present two approximate approaches for root-cause analysis, namely the sequential state switching ($S^3$, based on free energy concept of a restricted Boltzmann machine, RBM) and artificial anomaly association ($A^3$, a classification framework using deep neural networks, DNN). Synthetic data from cases with failed pattern(s) and anomalous node(s) are simulated to validate the proposed approaches. Real dataset based on Tennessee Eastman process (TEP) is also used for comparison with other approaches. The results show that: (1) $S^3$ and $A^3$ approaches can obtain high accuracy in root-cause analysis under both pattern-based and node-based fault scenarios, in addition to successfully handling multiple nominal operating modes, (2) the proposed tool-chain is shown to be scalable while maintaining high accuracy, and (3) the proposed framework is robust and adaptive in different fault conditions and performs better in comparison with the state-of-the-art methods.