AITopics | worst-case cost

Robust exploration in linear quadratic reinforcement learning

Neural Information Processing Systems | Dec-24-2025, 23:40:50 GMT

Learning to make decisions in an uncertain and dynamic environment is a task of fundamental importance in a number of domains. This paper concerns the problem of learning control policies for an unknown linear dynamical system so as to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task 'robustly', i.e., the worst-case cost, accounting for system uncertainty given the observed data, is minimized. The method balances exploitation and exploration, exciting the system so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism are used to demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.
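To make the phrase "worst-case cost" concrete, the following is a minimal sketch and not the paper's convex-optimization method. It assumes the model uncertainty is represented by a small finite set of candidate (A, B) pairs, evaluates the infinite-horizon quadratic cost of one fixed state-feedback gain K on each candidate, and reports the largest value. The function names `lq_cost` and `worst_case_cost`, the unit process-noise covariance, and the scalar example values are all assumptions made for illustration.

```python
import numpy as np

def lq_cost(A, B, K, Q, R, iters=2000):
    """Average cost of u = -K x on x+ = A x + B u + w, with w ~ N(0, I) (assumed)."""
    Acl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return float("inf")  # closed loop is not stable, so the cost is unbounded
    S = Q + K.T @ R @ K
    P = np.zeros_like(Q)
    # Fixed-point iteration for the Lyapunov equation P = S + Acl' P Acl.
    for _ in range(iters):
        P = S + Acl.T @ P @ Acl
    return float(np.trace(P))

def worst_case_cost(models, K, Q, R):
    """Largest cost of the gain K over a finite set of candidate models (assumed)."""
    return max(lq_cost(A, B, K, Q, R) for A, B in models)

# Hypothetical scalar example: three candidate values of the unknown pole.
Q = np.eye(1)
R = np.eye(1)
models = [(np.array([[a]]), np.array([[1.0]])) for a in (0.8, 1.0, 1.2)]
K = np.array([[0.7]])
print(worst_case_cost(models, K, Q, R))
```

A robust design in this simplified setting would pick the gain K that minimizes `worst_case_cost`. Exploration then amounts to collecting data that shrinks the candidate set where it most affects that maximum.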

linear quadratic reinforcement, name change, robust exploration, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.59)

Foremost, we would like to thank the reviewers and (S)ACs for giving up their time to conduct and organize the

Neural Information Processing Systems | Oct-2-2025, 00:12:42 GMT

Results are presented in Fig a. Full details will be provided ... Please allow us to first justify the use of the HIL experiment. All of the following points will be clarified in the revised manuscript (V2). ... 'gridding' continuous state/action spaces in order to apply DP-based methods, citing relevant literature. This is an interesting question. This is why the cost of greedy and RRL differ at the first epoch.

artificial intelligence, conduct and organize, experiment, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.30)

Robust exploration in linear quadratic reinforcement learning

Jack Umenberger, Mina Ferizbegovic, Thomas B. Schön, Håkan Hjalmarsson

Neural Information Processing Systems | Sep-26-2025, 07:32:29 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

Europe > Sweden (0.14)
North America > Canada (0.14)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Robust exploration in linear quadratic reinforcement learning

Jack Umenberger, Mina Ferizbegovic, Thomas B. Schön, Håkan Hjalmarsson

Neural Information Processing Systems | Aug-17-2025, 14:27:51 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Robust exploration in linear quadratic reinforcement learning

Neural Information Processing Systems | May-27-2025, 07:43:52 GMT

Learning to make decisions in an uncertain and dynamic environment is a task of fundamental importance in a number of domains. This paper concerns the problem of learning control policies for an unknown linear dynamical system so as to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task 'robustly', i.e., the worst-case cost, accounting for system uncertainty given the observed data, is minimized. The method balances exploitation and exploration, exciting the system so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism are used to demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.

artificial intelligence, machine learning, reinforcement learning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Robust exploration in linear quadratic reinforcement learning

Neural Information Processing Systems | Oct-9-2024, 11:07:01 GMT

Learning to make decisions in an uncertain and dynamic environment is a task of fundamental importance in a number of domains. This paper concerns the problem of learning control policies for an unknown linear dynamical system so as to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task 'robustly', i.e., the worst-case cost, accounting for system uncertainty given the observed data, is minimized. The method balances exploitation and exploration, exciting the system so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism are used to demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.

linear quadratic reinforcement, robust exploration, worst-case cost

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Worst-Case Adaptive Submodular Cover

Yuan, Jing, Tang, Shaojie

arXiv.org Artificial Intelligence | Feb-10-2023

In this paper, we study the adaptive submodular cover problem under the worst-case setting. This problem generalizes many previously studied problems, including pool-based active learning and stochastic submodular set cover. The input of our problem is a set of items (e.g., medical tests), and each item has a random state (e.g., the outcome of a medical test) whose realization is initially unknown. One must select an item at a fixed cost in order to observe its realization. There is a utility function that maps a subset of items and their states to a non-negative real number. We aim to sequentially select a group of items to achieve a ``target value'' while minimizing the maximum cost across realizations (a.k.a. worst-case cost). To facilitate our study, we assume that the utility function is \emph{worst-case submodular}, a property that is commonly found in many machine learning applications. With this assumption, we develop a tight $(\log (Q/\eta)+1)$-approximation policy, where $Q$ is the ``target value'' and $\eta$ is the smallest difference between $Q$ and any achievable utility value $\hat{Q}

artificial intelligence, machine learning, realization, (14 more...)

arXiv.org Artificial Intelligence

2210.13694

Country: North America > United States > Texas (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine > Diagnostic Medicine (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Robust exploration in linear quadratic reinforcement learning

Umenberger, Jack, Ferizbegovic, Mina, Schön, Thomas B., Hjalmarsson, Håkan

Neural Information Processing Systems | Mar-19-2020, 03:02:33 GMT

Learning to make decisions in an uncertain and dynamic environment is a task of fundamental importance in a number of domains. This paper concerns the problem of learning control policies for an unknown linear dynamical system so as to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task 'robustly', i.e., the worst-case cost, accounting for system uncertainty given the observed data, is minimized. The method balances exploitation and exploration, exciting the system so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism are used to demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.

linear quadratic reinforcement, robust exploration, worst-case cost

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Robust exploration in linear quadratic reinforcement learning

Umenberger, Jack, Ferizbegovic, Mina, Schön, Thomas B., Hjalmarsson, Håkan

arXiv.org Machine Learning | Jun-4-2019

This paper concerns the problem of learning control policies for an unknown linear dynamical system to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task robustly, i.e., we minimize the worst-case cost, accounting for system uncertainty given the observed data. The method balances exploitation and exploration, exciting the system so as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.

artificial intelligence, exploration, reinforcement learning, (19 more...)

arXiv.org Machine Learning

1906.01584

Country: Europe > Sweden (0.14)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Efficient Online Learning for Optimizing Value of Information: Theory and Application to Interactive Troubleshooting

Chen, Yuxin, Renders, Jean-Michel, Chehreghani, Morteza Haghir, Krause, Andreas

arXiv.org Artificial Intelligence | Jul-17-2017

We consider the optimal value of information (VoI) problem, where the goal is to sequentially select a set of tests with minimal cost, so that one can efficiently make the best decision based on the observed outcomes. Existing algorithms are either heuristics with no guarantees, or scale poorly (with run time exponential in the number of available tests). Moreover, these methods assume a known distribution over the test outcomes, which is often not the case in practice. We propose an efficient sampling-based online learning framework to address the above issues. First, assuming the distribution over hypotheses is known, we propose a dynamic hypothesis enumeration strategy, which allows efficient information gathering with strong theoretical guarantees. We show that with a sufficient number of samples, one can identify a near-optimal decision with high probability. Second, when the parameters of the hypothesis distribution are unknown, we propose an algorithm that learns the parameters progressively via posterior sampling in an online fashion. We further establish a rigorous bound on the expected regret. We demonstrate the effectiveness of our approach on a real-world interactive troubleshooting application and show that one can efficiently make high-quality decisions at low cost.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Artificial Intelligence

1703.05452

Country: Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Education > Educational Setting > Online (0.61)

Technology: