hdm
Gradient Methods with Online Scaling Part II. Practical Aspects
Chu, Ya-Chi, Gao, Wenzhi, Ye, Yinyu, Udell, Madeleine
Part I of this work [Gao25] establishes online scaled gradient methods (OSGM), a framework that utilizes online convex optimization to adapt stepsizes in gradient methods. This paper focuses on the practical aspects of OSGM. We leverage the OSGM framework to design new adaptive first-order methods and provide insights into their empirical behavior. The resulting method, OSGM-Best, matches the performance of quasi-Newton variants while requiring less memory and cheaper iterations. We also extend OSGM to nonconvex optimization and outline directions that connect OSGM to existing branches of optimization theory and practice.
Heat Diffusion Models -- Interpixel Attention Mechanism
Since adjacent pixels are highly likely to belong to the same object, we propose the Heat Diffusion Model (HDM) to further preserve image details and generate more realistic images. HDM essentially is a DDPM that incorporates an attention mechanism between pixels. In HDM, the discrete form of the two - dimensional heat equation is integrated into the diffusion and generation formulas of DDPM, enabling the model to compute relationships between neighboring pixels during image processing. Our experiments demonstrate that HDM can generate higher - quality samples compared to models such as DDPM, Consistency Diffusion Models (CDM), Latent Diffusion Models (LDM), and V ector Quantized Generativ e Adversarial Networks (VQGAN).
Adapting A Vector-Symbolic Memory for Lisp ACT-R
Ray, Meera, Dancy, Christopher L.
Holographic Declarative Memory (HDM) is a vector-symbolic alternative to ACT-R's Declarative Memory (DM) system that can bring advantages such as scalability and architecturally defined similarity between DM chunks. We adapted HDM to work with the most comprehensive and widely-used implementation of ACT-R (Lisp ACT-R) so extant ACT-R models designed with DM can be run with HDM without major changes. With this adaptation of HDM, we have developed vector-based versions of common ACT-R functions, set up a text processing pipeline to add the contents of large documents to ACT-R memory, and most significantly created a useful and novel mechanism to retrieve an entire chunk of memory based on a request using only vector representations of tokens. Preliminary results indicate that we can maintain vector-symbolic advantages of HDM (e.g., chunk recall without storing the actual chunk and other advantages with scaling) while also extending it so that previous ACT-R models may work with the system with little (or potentially no) modifications within the actual procedural and declarative memory portions of a model. As a part of iterative improvement of this newly translated holographic declarative memory module, we will continue to explore better time-context representations for vectors to improve the module's ability to reconstruct chunks during recall. To more fully test this translated HDM module, we also plan to develop decision-making models that use instance-based learning (IBL) theory, which is a useful application of HDM given the advantages of the system.
Psychology-Driven Enhancement of Humour Translation
Su, Yuchen, Zhu, Yonghua, Chen, Yang, Benavides-Prado, Diana, Witbrock, Michael
Humour translation plays a vital role as a bridge between different cultures, fostering understanding and communication. Although most existing Large Language Models (LLMs) are capable of general translation tasks, these models still struggle with humour translation, which is especially reflected through linguistic interference and lacking humour in translated text. In this paper, we propose a psychology-inspired Humour Decomposition Mechanism (HDM) that utilises Chain-of-Thought (CoT) to imitate the ability of the human thought process, stimulating LLMs to optimise the readability of translated humorous texts. Moreover, we integrate humour theory in HDM to further enhance the humorous elements in the translated text. Our automatic evaluation experiments on open-source humour datasets demonstrate that our method significantly improves the quality of humour translation, yielding average gains of 7.75\% in humour, 2.81\% in fluency, and 6.13\% in coherence of the generated text.
Provable and Practical Online Learning Rate Adaptation with Hypergradient Descent
Chu, Ya-Chi, Gao, Wenzhi, Ye, Yinyu, Udell, Madeleine
This paper investigates the convergence properties of the hypergradient descent method (HDM), a 25-year-old heuristic originally proposed for adaptive stepsize selection in stochastic first-order methods. We provide the first rigorous convergence analysis of HDM using the online learning framework of [Gao24] and apply this analysis to develop new state-of-the-art adaptive gradient methods with empirical and theoretical support. Notably, HDM automatically identifies the optimal stepsize for the local optimization landscape and achieves local superlinear convergence. Our analysis explains the instability of HDM reported in the literature and proposes efficient strategies to address it. We also develop two HDM variants with heavy-ball and Nesterov momentum. Experiments on deterministic convex problems show HDM with heavy-ball momentum (HDM-HB) exhibits robust performance and significantly outperforms other adaptive first-order methods. Moreover, HDM-HB often matches the performance of L-BFGS, an efficient and practical quasi-Newton method, using less memory and cheaper iterations.
A Q-learning Approach for Adherence-Aware Recommendations
Faros, Ioannis, Dave, Aditya, Malikopoulos, Andreas A.
In many real-world scenarios involving high-stakes and safety implications, a human decision-maker (HDM) may receive recommendations from an artificial intelligence while holding the ultimate responsibility of making decisions. In this letter, we develop an "adherence-aware Q-learning" algorithm to address this problem. The algorithm learns the "adherence level" that captures the frequency with which an HDM follows the recommended actions and derives the best recommendation policy in real time. We prove the convergence of the proposed Q-learning algorithm to the optimal value and evaluate its performance across various scenarios.
Hypoelliptic Diffusion Maps I: Tangent Bundles
We introduce the concept of Hypoelliptic Diffusion Maps (HDM), a framework generalizing Diffusion Maps in the context of manifold learning and dimensionality reduction. Standard non-linear dimensionality reduction methods (e.g., LLE, ISOMAP, Laplacian Eigenmaps, Diffusion Maps) focus on mining massive data sets using weighted affinity graphs; Orientable Diffusion Maps and Vector Diffusion Maps enrich these graphs by attaching to each node also some local geometry. HDM likewise considers a scenario where each node possesses additional structure, which is now itself of interest to investigate. Virtually, HDM augments the original data set with attached structures, and provides tools for studying and organizing the augmented ensemble. The goal is to obtain information on individual structures attached to the nodes and on the relationship between structures attached to nearby nodes, so as to study the underlying manifold from which the nodes are sampled. In this paper, we analyze HDM on tangent bundles, revealing its intimate connection with sub-Riemannian geometry and a family of hypoelliptic differential operators. In a later paper, we shall consider more general fibre bundles.