AITopics | time-varying system

Online Adaptive Policy Selection in Time-Varying Systems: No-Regret via Contractive Perturbations

Neural Information Processing Systems, Dec-26-2025, 12:22:38 GMT

We study online adaptive policy selection in systems with time-varying costs and dynamics. We develop the Gradient-based Adaptive Policy Selection (GAPS) algorithm together with a general analytical framework for online policy selection via online optimization. Under our proposed notion of contractive policy classes, we show that GAPS approximates the behavior of an ideal online gradient descent algorithm on the policy parameters while requiring less information and computation. When convexity holds, our algorithm is the first to achieve optimal policy regret. When convexity does not hold, we provide the first local regret bound for online policy selection. Our numerical experiments show that GAPS can adapt to changing environments more quickly than existing benchmarks.

adaptive policy selection, online adaptive policy selection, time-varying system, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.80)
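The abstract above compares GAPS to an ideal online gradient descent (OGD) algorithm on the policy parameters. As a point of reference only, the following minimal sketch runs plain OGD on a toy scalar problem with a slowly drifting quadratic cost. It is not the GAPS algorithm: there is no system state or dynamics here, and the cost `f_t(theta) = (theta - c_t)^2`, the drifting target `c_t`, the step size, and all function names are assumptions made for illustration.

```python
import math


def online_gradient_descent(targets, step_size=0.2, theta0=0.0):
    """Plain OGD on the hypothetical costs f_t(theta) = (theta - c_t)^2.

    At each round the learner commits to theta, pays f_t(theta),
    then sees the gradient of f_t and takes one gradient step.
    """
    theta = theta0
    thetas, costs = [], []
    for c_t in targets:
        thetas.append(theta)
        costs.append((theta - c_t) ** 2)  # cost paid with the current parameter
        grad = 2.0 * (theta - c_t)        # gradient of f_t at theta
        theta -= step_size * grad         # OGD update
    return thetas, costs


# Assumed toy environment: the minimizer c_t drifts slowly over time.
targets = [math.sin(0.01 * t) for t in range(2000)]
thetas, costs = online_gradient_descent(targets)

print(f"worst per-round cost: {max(costs):.6f}")
print(f"average cost:         {sum(costs) / len(costs):.6f}")
```

Because the target moves by at most 0.01 per round and each update contracts the tracking error by a factor of 0.6, the iterate stays close to the moving minimizer. This only illustrates the tracking behavior that OGD-style updates can have in a time-varying setting, under the assumptions stated above.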

Online Control of Unknown Time-Varying Dynamical Systems

Neural Information Processing Systems, Dec-24-2025, 09:57:00 GMT

We study online control of time-varying linear systems with unknown dynamics in the nonstochastic control model. At a high level, we demonstrate that this setting is qualitatively harder than that of either unknown time-invariant or known time-varying dynamics, and complement our negative results with algorithmic upper bounds in regimes where sublinear regret is possible. More specifically, we study regret bounds with respect to common classes of policies: Disturbance Action (SLS), Disturbance Response (Youla), and linear feedback policies. While these three classes are essentially equivalent for LTI systems, we demonstrate that these equivalences break down for time-varying systems. We prove a lower bound that no algorithm can obtain sublinear regret with respect to the first two classes unless a certain measure of system variability also scales sublinearly in the horizon. Furthermore, we show that offline planning over the state linear feedback policies is NP-hard, suggesting hardness of the online learning problem. On the positive side, we give an efficient algorithm that attains a sublinear regret bound against the class of Disturbance Response policies up to the aforementioned system variability term. In fact, our algorithm enjoys sublinear adaptive regret bounds, which is a strictly stronger metric than standard regret and is more appropriate for time-varying systems. We sketch extensions to Disturbance Action policies and partial observation, and propose an inefficient algorithm for regret against linear state feedback policies.

name change, online control, unknown time-varying dynamical system, (8 more...)

Neural Information Processing Systems

Industry: Education (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Residual Variance Matching Recursive Least Squares Filter for Real-time UAV Terrain Following

Wu, Xiaobo, Zhang, Youmin

arXiv.org Machine Learning, Dec-8-2025

Accurate real-time waypoints estimation for the UAV-based online Terrain Following during wildfire patrol missions is critical to ensuring flight safety and enabling wildfire detection. However, existing real-time filtering algorithms struggle to maintain accurate waypoints under measurement noise in nonlinear and time-varying systems, posing risks of flight instability and missed wildfire detections during UAV-based terrain following. To address this issue, a Residual Variance Matching Recursive Least Squares (RVM-RLS) filter, guided by a Residual Variance Matching Estimation (RVME) criterion, is proposed to adaptively estimate the real-time waypoints of nonlinear, time-varying UAV-based terrain following systems. The proposed method is validated using a UAV-based online terrain following system within a simulated terrain environment. Experimental results show that the RVM-RLS filter improves waypoints estimation accuracy by approximately 88% compared with benchmark algorithms across multiple evaluation metrics. These findings demonstrate both the methodological advances in real-time filtering and the practical potential of the RVM-RLS filter for UAV-based online wildfire patrol.

estimation, rvm-rl filter, terrain, (17 more...)

arXiv.org Machine Learning

2512.05918

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > Canada > Quebec > Montreal (0.04)
Africa > Middle East > Egypt (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Energy (0.68)
Aerospace & Defense (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
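For readers unfamiliar with the baseline that the RVM-RLS abstract above builds on, the following is a minimal sketch of a textbook recursive least squares (RLS) filter with an exponential forgetting factor, tracking a single drifting parameter. It does not implement the paper's Residual Variance Matching criterion, which the abstract does not specify. The scalar model `y = slope * x`, the forgetting factor of 0.95, the synthetic data, and all names are assumptions chosen for illustration.

```python
import math


def rls_step(theta, p, x, y, forgetting=0.95):
    """One scalar RLS update with exponential forgetting.

    theta: current parameter estimate
    p:     current (scalar) inverse-information term
    x, y:  new regressor and measurement, modeled as y ~ theta * x
    """
    gain = p * x / (forgetting + x * p * x)
    theta = theta + gain * (y - x * theta)  # correct by the prediction residual
    p = (p - gain * x * p) / forgetting     # discount old information
    return theta, p


# Assumed synthetic time-varying system: the true slope drifts linearly.
true_slopes = [2.0 + 0.001 * t for t in range(1000)]
inputs = [1.0 + 0.5 * math.sin(t) for t in range(1000)]

theta, p = 0.0, 1000.0  # uninformative start: zero estimate, large p
estimates = []
for slope, x in zip(true_slopes, inputs):
    y = slope * x  # noise-free measurement, to keep the sketch deterministic
    theta, p = rls_step(theta, p, x, y)
    estimates.append(theta)

print(f"final true slope:     {true_slopes[-1]:.4f}")
print(f"final estimated slope: {estimates[-1]:.4f}")
```

The forgetting factor trades tracking speed against noise sensitivity: values closer to 1 average over more history and lag further behind a drifting parameter, while smaller values track faster but amplify measurement noise. Adaptive schemes such as the one named in the abstract aim to tune this kind of trade-off online.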

Improved Robustness of Deep Reinforcement Learning for Control of Time-Varying Systems by Bounded Extremum Seeking

Saxena, Shaifalee, Williams, Alan, Fierro, Rafael, Scheinker, Alexander

arXiv.org Artificial Intelligence, Oct-6-2025

In this paper, we study the use of robust, model-independent bounded extremum seeking (ES) feedback control to improve the robustness of deep reinforcement learning (DRL) controllers for a class of nonlinear time-varying systems. DRL has the potential to learn from large datasets to quickly control or optimize the outputs of many-parameter systems, but its performance degrades catastrophically when the system model changes rapidly over time. Bounded ES can handle time-varying systems with unknown control directions, but its convergence speed slows down as the number of tuned parameters increases and, like all local adaptive methods, it can get stuck in local minima. We demonstrate that together, DRL and bounded ES result in a hybrid controller whose performance exceeds the sum of its parts, with DRL taking advantage of historical data to learn how to quickly control a many-parameter system to a desired setpoint while bounded ES ensures its robustness to time variations. We present a numerical study of a general time-varying system and a combined ES-DRL controller for automatic tuning of the Low Energy Beam Transport section at the Los Alamos Neutron Science Center linear particle accelerator.

controller, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2510.0249

Country: North America > United States > New Mexico > Los Alamos County > Los Alamos (0.25)

Genre:

Research Report (0.82)
Instructional Material > Course Syllabus & Notes (0.34)

Industry:

Energy (1.00)
Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Online Control of Unknown Time-Varying Dynamical Systems

Neural Information Processing Systems, Aug-15-2025, 15:32:02 GMT

In fact, our algorithm enjoys sublinear adaptive regret bounds, which is a strictly stronger metric than standard regret and is more appropriate for time-varying systems.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Online Adaptive Policy Selection in Time-Varying Systems: No-Regret via Contractive Perturbations

Neural Information Processing Systems, Jan-19-2025, 18:13:46 GMT

We study online adaptive policy selection in systems with time-varying costs and dynamics. We develop the Gradient-based Adaptive Policy Selection (GAPS) algorithm together with a general analytical framework for online policy selection via online optimization. Under our proposed notion of contractive policy classes, we show that GAPS approximates the behavior of an ideal online gradient descent algorithm on the policy parameters while requiring less information and computation. When convexity holds, our algorithm is the first to achieve optimal policy regret. When convexity does not hold, we provide the first local regret bound for online policy selection.

adaptive policy selection, contractive perturbation, online adaptive policy selection, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.88)

Online Control of Unknown Time-Varying Dynamical Systems

Neural Information Processing Systems, Oct-11-2024, 15:22:13 GMT

We study online control of time-varying linear systems with unknown dynamics in the nonstochastic control model. At a high level, we demonstrate that this setting is qualitatively harder than that of either unknown time-invariant or known time-varying dynamics, and complement our negative results with algorithmic upper bounds in regimes where sublinear regret is possible. More specifically, we study regret bounds with respect to common classes of policies: Disturbance Action (SLS), Disturbance Response (Youla), and linear feedback policies. While these three classes are essentially equivalent for LTI systems, we demonstrate that these equivalences break down for time-varying systems. We prove a lower bound that no algorithm can obtain sublinear regret with respect to the first two classes unless a certain measure of system variability also scales sublinearly in the horizon.

feedback policy, sublinear regret, unknown time-varying dynamical system, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.89)

Reinforcement Learning in Time-Varying Systems: an Empirical Study

Hamadanian, Pouya, Schwarzkopf, Malte, Sen, Siddhartha, Alizadeh, Mohammad

arXiv.org Artificial Intelligence, Jan-14-2022

Recent research has turned to Reinforcement Learning (RL) to solve challenging decision problems, as an alternative to hand-tuned heuristics. RL can learn good policies without the need for modeling the environment's dynamics. Despite this promise, RL remains an impractical solution for many real-world systems problems. A particularly challenging case occurs when the environment changes over time, i.e., it exhibits non-stationarity. In this work, we characterize the challenges introduced by non-stationarity and develop a framework for addressing them to train RL agents in live systems. Such agents must explore and learn new environments, without hurting the system's performance, and remember them over time. To this end, our framework (1) identifies different environments encountered by the live system, (2) explores and trains a separate expert policy for each environment, and (3) employs safeguards to protect the system's performance. We apply our framework to two systems problems: straggler mitigation and adaptive video streaming, and evaluate it against a variety of alternative approaches using real-world and synthetic data. We show that each component of our framework is necessary to cope with non-stationarity.

agent, reinforcement learning, workload, (12 more...)

arXiv.org Artificial Intelligence

2201.0556

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.28)
North America > United States > New York > New York County > New York City (0.14)
Asia > Middle East > Jordan (0.04)
(5 more...)

Genre: Research Report > New Finding (0.67)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

Adaptive Machine Learning for Time-Varying Systems: Low Dimensional Latent Space Tuning

Scheinker, Alexander

arXiv.org Machine Learning, Jul-13-2021

Machine learning (ML) tools such as encoder-decoder convolutional neural networks (CNN) can represent incredibly complex nonlinear functions which map between combinations of images and scalars. For example, CNNs can be used to map combinations of accelerator parameters and images which are 2D projections of the 6D phase space distributions of charged particle beams as they are transported between various particle accelerator locations. Despite their strengths, applying ML to time-varying systems, or systems with shifting distributions, is an open problem, especially for large systems for which collecting new data for re-training is impractical or interrupts operations. Particle accelerators are one example of large time-varying systems for which collecting detailed training data requires lengthy dedicated beam measurements which may no longer be available during regular operations. We present a recently developed method of adaptive ML for time-varying systems. Our approach is to map very high (N>100k) dimensional inputs (a combination of scalar parameters and images) into the low dimensional (N~2) latent space at the output of the encoder section of an encoder-decoder CNN. We then actively tune the low dimensional latent space-based representation of complex system dynamics by the addition of an adaptively tuned feedback vector directly before the decoder section builds back up to our image-based high-dimensional phase space density representations. This method allows us to learn correlations within and to quickly tune the characteristics of incredibly high parameter systems and to track their evolution in real time based on feedback without massive new data sets for re-training.

latent space, phase space, projection, (14 more...)

arXiv.org Machine Learning

2107.06207

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Europe > Portugal (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
