AITopics | robust exploration

Robust exploration in linear quadratic reinforcement learning

Neural Information Processing Systems | Dec-24-2025, 23:40:50 GMT

Learning to make decisions in an uncertain and dynamic environment is a task of fundamental importance in a number of domains. This paper concerns the problem of learning control policies for an unknown linear dynamical system so as to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task 'robustly', i.e., the worst-case cost, accounting for system uncertainty given the observed data, is minimized. The method balances exploitation and exploration, exciting the system so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism are used to demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.
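To make the problem setting concrete, here is a minimal sketch of the simpler certainty-equivalent baseline, not the robust convex-optimization method the abstract describes: excite an unknown linear system with random inputs, fit (A, B) by least squares, and compute an LQR gain for the fitted model. The double-integrator system, the noise levels, and all function names are illustrative assumptions that do not come from the paper.

```python
import numpy as np

def simulate(A, B, T, sigma_w, sigma_u, rng):
    """Roll out x_{t+1} = A x_t + B u_t + w_t under random excitation u_t."""
    n, m = B.shape
    X = np.zeros((T + 1, n))
    U = sigma_u * rng.standard_normal((T, m))
    for t in range(T):
        w = sigma_w * rng.standard_normal(n)
        X[t + 1] = A @ X[t] + B @ U[t] + w
    return X, U

def least_squares_fit(X, U):
    """Estimate (A, B) by regressing x_{t+1} on the stacked regressor (x_t, u_t)."""
    n = X.shape[1]
    Z = np.hstack([X[:-1], U])                         # shape (T, n + m)
    Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)  # Theta = [A B]^T
    return Theta[:n].T, Theta[n:].T

def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion; the policy is u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Hypothetical example system (a discretized double integrator).
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])

X, U = simulate(A_true, B_true, T=500, sigma_w=0.01, sigma_u=1.0, rng=rng)
A_hat, B_hat = least_squares_fit(X, U)
K = lqr_gain(A_hat, B_hat, Q=np.eye(2), R=np.eye(1))

# Spectral radius of the true closed loop under the gain designed on the estimate.
rho = max(abs(np.linalg.eigvals(A_true - B_true @ K)))
```

This baseline treats the point estimate as if it were exact. The approach summarized above instead minimizes the worst-case cost over the model uncertainty that remains after observing the data, and chooses the excitation to shrink the uncertainty that matters most for that cost.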

linear quadratic reinforcement, name change, robust exploration, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.59)

Robust exploration in linear quadratic reinforcement learning

Neural Information Processing Systems | May-27-2025, 07:43:52 GMT

Learning to make decisions in an uncertain and dynamic environment is a task of fundamental importance in a number of domains. This paper concerns the problem of learning control policies for an unknown linear dynamical system so as to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task 'robustly', i.e., the worst-case cost, accounting for system uncertainty given the observed data, is minimized. The method balances exploitation and exploration, exciting the system so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism are used to demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.

artificial intelligence, machine learning, reinforcement learning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Reviews: Robust exploration in linear quadratic reinforcement learning

Neural Information Processing Systems | Jan-21-2025, 08:11:37 GMT

The paper is very well written and organized, and its contributions are quite original: it proposes a novel coarse-ID method for robust model-based reinforcement learning in which exploration and exploitation are optimized jointly (which was not the case in previous similar works). The proposed method for solving the robust reinforcement learning problem is all the more original in that it relies on semidefinite programming rather than stochastic dynamic programming. Concerning clarity, the only element that is not clear to me relates to equation (1) on page 2: does the system model account for uncertainty in the measurements of the state x? For example, the supplemental material says that the velocity of the servo-motor in your second experiment is estimated using a high-pass filter, and is hence not perfectly known. If this uncertainty is modeled, is it included in the process noise w, or how do you deal with it?
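For context, the distinction the reviewer raises can be written out as follows. Equation (1) is not reproduced in this listing, so this is only a sketch assuming the usual linear state-space form. Only x and w are named in the review; A, B, u, the measured output y, and the measurement noise v are notational assumptions.

```latex
\begin{align}
x_{t+1} &= A x_t + B u_t + w_t, \\
y_t     &= x_t + v_t .
\end{align}
```

If the state is measured exactly, then v_t = 0 and only the process noise w_t perturbs the data. The reviewer is asking whether an imperfectly estimated state, such as a filtered velocity, corresponds to a nonzero v_t and, if so, whether it is absorbed into w_t.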

complexity analysis, linear quadratic reinforcement, robust exploration, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.83)

Reviews: Robust exploration in linear quadratic reinforcement learning

Neural Information Processing Systems | Jan-21-2025, 08:11:26 GMT

The paper presents a new technique for robust optimization and balanced exploration in LQR problems. The technique is quite innovative since it leverages semidefinite programming instead of dynamic programming. This is an important algorithmic contribution with solid theory. For the empirical evaluation, the authors are expected to include the new experiments and running times mentioned in the rebuttal. Overall, this is very nice work.

linear quadratic reinforcement, robust exploration

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Robust exploration in linear quadratic reinforcement learning

Umenberger, Jack, Ferizbegovic, Mina, Schön, Thomas B., Hjalmarsson, Håkan

Neural Information Processing Systems | Mar-19-2020, 03:02:33 GMT

Learning to make decisions in an uncertain and dynamic environment is a task of fundamental importance in a number of domains. This paper concerns the problem of learning control policies for an unknown linear dynamical system so as to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task 'robustly', i.e., the worst-case cost, accounting for system uncertainty given the observed data, is minimized. The method balances exploitation and exploration, exciting the system so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism are used to demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.

linear quadratic reinforcement, robust exploration, worst-case cost

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Robust Exploration with Tight Bayesian Plausibility Sets

Russel, Reazul H., Gu, Tianyi, Petrik, Marek

arXiv.org Artificial Intelligence | Apr-17-2019

Optimism about poorly understood states and actions is the main driving force of exploration for many provably efficient reinforcement learning algorithms. We propose optimism in the face of sensible value functions (OFVF), a novel data-driven Bayesian algorithm for constructing plausibility sets for MDPs that explores robustly, minimizing the worst-case exploration cost. The method computes policies with tighter optimistic estimates for exploration by introducing two new ideas. First, it is based on Bayesian posterior distributions rather than distribution-free bounds. Second, OFVF does not construct plausibility sets as simple confidence intervals. Using confidence intervals as plausibility sets is a sufficient but not a necessary condition. OFVF uses the structure of the value function to optimize the location and shape of the plausibility set to guarantee upper bounds directly, without necessarily requiring the set to be a confidence interval. OFVF proceeds in an episodic manner, where the duration of the episode is fixed and known. Our algorithm is inherently Bayesian and can leverage prior information. Our theoretical analysis shows the robustness of OFVF, and the empirical results demonstrate its practical promise.
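As a rough illustration of the baseline that the abstract contrasts OFVF with, the sketch below builds a plausibility set as an L1 credible region around the posterior mean of one state-action's transition distribution, then computes the optimistic value over that set. It does not implement OFVF, which shapes the set using the value function. The Dirichlet prior, the counts, the value vector, and the function names are all illustrative assumptions.

```python
import numpy as np

def l1_credible_radius(samples, center, delta):
    """Smallest L1 radius around `center` covering a (1 - delta) share of samples."""
    dists = np.abs(samples - center).sum(axis=1)
    return np.quantile(dists, 1.0 - delta)

def optimistic_value(center, radius, v):
    """Maximize p @ v over distributions p with ||p - center||_1 <= radius.

    Greedy solution: move up to radius / 2 of probability mass onto the
    highest-value next state, taking it from the lowest-value states first.
    """
    p = center.copy()
    best = int(np.argmax(v))
    remaining = min(radius / 2.0, 1.0 - p[best])
    p[best] += remaining
    for i in np.argsort(v):          # lowest-value states first
        if remaining <= 0:
            break
        if i == best:
            continue
        take = min(p[i], remaining)
        p[i] -= take
        remaining -= take
    return p @ v, p

rng = np.random.default_rng(0)

# Hypothetical transition counts for one state-action pair, uniform Dirichlet prior.
counts = np.array([8.0, 3.0, 1.0])
alpha = counts + 1.0
samples = rng.dirichlet(alpha, size=5000)   # posterior samples of p(. | s, a)
center = alpha / alpha.sum()                # posterior mean

radius = l1_credible_radius(samples, center, delta=0.05)
v = np.array([0.0, 1.0, 5.0])               # assumed next-state values
v_opt, p_opt = optimistic_value(center, radius, v)
```

A set built this way ignores the value vector when choosing its size and location, which is the looseness the abstract says OFVF avoids by optimizing the set's shape to bound the value directly.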

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1904.08528

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.34)

Technology: