Tuscany
Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control
Skifstad, Julian, Yang, Xinyue Annie, Chou, Glen
Inference-time LLM alignment methods, particularly activation steering, offer an alternative to fine-tuning by directly modifying activations during generation. Existing methods, however, often rely on non-anticipative interventions that ignore how perturbations propagate through transformer layers and lack online error feedback, resulting in suboptimal, open-loop control. To address this, we show empirically that, despite the nonlinear structure of transformer blocks, layer-wise dynamics across multiple LLM architectures and scales are well-approximated by locally linear models. Exploiting this property, we model LLM inference as a linear time-varying dynamical system and adapt the classical linear quadratic regulator to compute feedback controllers using layer-wise Jacobians, steering activations toward desired semantic setpoints in closed loop with minimal computational overhead and no offline training. We also derive theoretical bounds on setpoint tracking error, enabling formal guarantees on steering performance. Using a novel adaptive semantic feature setpoint signal, our method yields robust, fine-grained behavior control across models, scales, and tasks, including state-of-the-art modulation of toxicity, truthfulness, refusal, and arbitrary concepts, surpassing baseline steering methods. Our code is available at: https://github.com/trustworthyrobotics/lqr-activation-steering
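The abstract's core ingredient, a linear quadratic regulator for a linear time-varying system, can be illustrated with the textbook backward Riccati recursion. The sketch below is not the authors' implementation: the random matrices stand in for layer-wise Jacobians, the identity input matrices assume the control is added directly to the activation, the setpoint is taken to be the origin (that is, the state is the deviation from the setpoint), and all names (`ltv_lqr_gains`, `rollout`, `Q_final`) are hypothetical.

```python
import numpy as np


def ltv_lqr_gains(A_list, B_list, Q, R, Q_final):
    """Finite-horizon discrete-time LQR for x_{k+1} = A_k x_k + B_k u_k.

    Runs the backward Riccati recursion and returns gains K_k such that
    the feedback law is u_k = -K_k x_k.
    """
    P = Q_final
    gains = []
    for A, B in zip(reversed(A_list), reversed(B_list)):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)   # K = S^{-1} B^T P A
        P = Q + A.T @ P @ (A - B @ K)         # Riccati update
        gains.append(K)
    gains.reverse()
    return gains


def rollout(A_list, B_list, x0, gains=None):
    """Propagate the state through all layers, with or without feedback."""
    x = x0.copy()
    for k, (A, B) in enumerate(zip(A_list, B_list)):
        u = -gains[k] @ x if gains is not None else np.zeros(B.shape[1])
        x = A @ x + B @ u
    return x


# Toy stand-ins (assumptions): random "Jacobians", additive control (B = I).
rng = np.random.default_rng(0)
n, layers = 4, 6
A_list = [np.eye(n) + 0.3 * rng.standard_normal((n, n)) for _ in range(layers)]
B_list = [np.eye(n) for _ in range(layers)]
Q, R, Q_final = np.eye(n), 0.1 * np.eye(n), 10.0 * np.eye(n)

gains = ltv_lqr_gains(A_list, B_list, Q, R, Q_final)
x0 = rng.standard_normal(n)
open_loop_norm = np.linalg.norm(rollout(A_list, B_list, x0))
closed_loop_norm = np.linalg.norm(rollout(A_list, B_list, x0, gains))
```

Comparing `closed_loop_norm` with `open_loop_norm` shows how feedback computed from the layer-wise linear models drives the deviation toward zero. The gains are computed once in a backward pass and applied layer by layer in the forward pass.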
- Oceania > Australia (0.04)
- Europe > United Kingdom > England (0.04)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control. Complete version
Leão, Dorival, Ohashi, Alberto, Scotti, Simone, da Silva, Adolfo M. D
This paper studies continuous-time stochastic control problems whose controlled states are fully non-Markovian and depend on unknown model parameters. Such problems arise naturally in path-dependent stochastic differential equations, rough-volatility hedging, and systems driven by fractional Brownian motion. Building on the discrete skeleton approach developed in earlier work, we propose a Monte Carlo learning methodology for the associated embedded backward dynamic programming equation. Our main contribution is twofold. First, we construct explicit dominating training laws and Radon-Nikodym weights for several representative classes of non-Markovian controlled systems. This yields an off-model training architecture in which a fixed synthetic dataset is generated under a reference law, while the dynamic programming operators associated with a target model are recovered by importance sampling. Second, we use this structure to design an adaptive update mechanism under parametric model uncertainty, so that repeated recalibration can be performed by reweighting the same training sample rather than regenerating new trajectories. For fixed parameters, we establish non-asymptotic error bounds for the approximation of the embedded dynamic programming equation via deep neural networks. For adaptive learning, we derive quantitative estimates that separate Monte Carlo approximation error from model-risk error. Numerical experiments illustrate both the off-model training mechanism and the adaptive importance-sampling update in structured linear-quadratic examples.
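The reweighting idea in the abstract, one fixed sample drawn under a dominating reference law and reused for several target parameters, can be illustrated with a scalar importance-sampling toy. This is only a sketch under strong assumptions: the paper treats path-dependent, non-Markovian systems, whereas here the reference law is N(0, 2^2), the target law is N(theta, 1), and the function names (`gaussian_logpdf`, `reweighted_mean`) are hypothetical.

```python
import numpy as np


def gaussian_logpdf(x, mean, std):
    """Log-density of N(mean, std^2) evaluated at x."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)


def reweighted_mean(samples, f, theta, ref_mean=0.0, ref_std=2.0):
    """Estimate E[f(X)] for X ~ N(theta, 1) from samples drawn under the
    reference law N(ref_mean, ref_std^2), using density-ratio weights."""
    log_w = gaussian_logpdf(samples, theta, 1.0) - gaussian_logpdf(
        samples, ref_mean, ref_std
    )
    return np.mean(np.exp(log_w) * f(samples))


# One fixed synthetic dataset, generated once under the reference law.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 2.0, size=200_000)

# Recalibrating to a new parameter only changes the weights, not the data.
estimates = {
    theta: reweighted_mean(samples, np.square, theta)
    for theta in (-0.5, 0.0, 0.5)
}
```

For this toy target, the exact value of E[X^2] is theta^2 + 1, so each entry of `estimates` can be checked against it. The reference law is deliberately wider than every target law so that the weights have finite variance.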
- South America > Brazil > Federal District (0.04)
- Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
- North America > United States > California (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.92)
- Information Technology (0.67)
- Education (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Europe > Slovenia (0.04)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Europe > Germany > Saxony > Leipzig (0.04)
- Europe > United Kingdom > England > Greater London > London (0.05)
- Europe > Italy > Tuscany > Florence (0.04)
- Europe > Germany (0.14)
- Asia > China (0.14)
- North America > Canada > British Columbia (0.04)
- Law > Statutes (1.00)
- Law > Litigation (1.00)
- Law > Civil Rights & Constitutional Law (1.00)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.05)
- Europe > Russia (0.04)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)