Structured Reinforcement Learning
Exploration in Structured Reinforcement Learning
Ok, Jungseul, Proutiere, Alexandre, Tranos, Damianos
We address reinforcement learning problems with finite state and action spaces where the underlying MDP has some known structure that can potentially be exploited to minimize the exploration rates of suboptimal (state, action) pairs. For any arbitrary structure, we derive problem-specific regret lower bounds satisfied by any learning algorithm. These lower bounds are made explicit for unstructured MDPs and for those whose transition probabilities and average reward functions are Lipschitz continuous w.r.t. the state and action. For Lipschitz MDPs, the bounds are shown not to scale with the sizes S and A of the state and action spaces: they are smaller than c log T, where T is the time horizon and the constant c depends only on the Lipschitz structure, the span of the bias function, and the minimal action sub-optimality gap. This contrasts with unstructured MDPs, where the regret lower bound typically scales as SA log T. We devise DEL (Directed Exploration Learning), an algorithm that matches our regret lower bounds. We further simplify DEL for Lipschitz MDPs and show that the simplified version still exploits the structure efficiently.
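To make the shape of these bounds concrete, here is a schematic of the Graves-Lai-style optimization problem that problem-specific lower bounds of this kind reduce to; the notation (exploration rates eta, gaps delta*, confusing set Lambda_Phi) is illustrative and not quoted from the paper.

```latex
% Schematic problem-specific lower bound (illustrative notation).
% For a uniformly good algorithm on model \phi within structure \Phi:
\[
  \liminf_{T \to \infty} \frac{R(T)}{\log T} \;\ge\; K_\Phi(\phi),
\]
% where K_\Phi(\phi) is the value of an optimization over exploration
% rates \eta(s,a) of suboptimal pairs, weighted by their gaps \delta^*:
\[
  K_\Phi(\phi) = \inf_{\eta \ge 0} \sum_{(s,a)} \eta(s,a)\,\delta^*(s,a)
  \quad \text{s.t.} \quad
  \sum_{(s,a)} \eta(s,a)\,\mathrm{KL}\big(\phi(s,a),\psi(s,a)\big) \ge 1
  \quad \forall\, \psi \in \Lambda_\Phi(\phi),
\]
% with \Lambda_\Phi(\phi) the "confusing" models in \Phi that cannot be
% distinguished from \phi at optimal pairs yet have a different optimal
% policy. Structure enters only through \Lambda_\Phi: a smaller confusing
% set (e.g. under Lipschitz constraints) loosens the program, which is
% how K_\Phi can stop scaling with S and A.
```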
Structured Reinforcement Learning for Combinatorial Decision-Making
Hoppe, Heiko, Baty, Léo, Bouvier, Louis, Parmentier, Axel, Schiffer, Maximilian
Reinforcement learning (RL) is increasingly applied to real-world problems involving complex and structured decisions, such as routing, scheduling, and assortment planning. These settings challenge standard RL algorithms, which struggle to scale, generalize, and exploit structure in the presence of combinatorial action spaces. We propose Structured Reinforcement Learning (SRL), a novel actor-critic framework that embeds combinatorial optimization layers into the actor neural network. We enable end-to-end learning of the actor via Fenchel-Young losses and provide a geometric interpretation of SRL as a primal-dual algorithm in the dual of the moment polytope. Across six environments with exogenous and endogenous uncertainty, SRL matches or surpasses the performance of unstructured RL and imitation learning on static tasks and improves over these baselines by up to 92% on dynamic problems, with improved stability and convergence speed.
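For a sense of how a combinatorial layer can be trained end-to-end, below is a minimal sketch of a perturbation-based Fenchel-Young loss gradient in the spirit the abstract describes; the function names, the Gaussian perturbation scheme, and the toy top-k oracle are illustrative assumptions, not the SRL implementation.

```python
import numpy as np

def fenchel_young_grad(theta, y_target, solver, eps=1.0, n_samples=32, rng=None):
    """Monte-Carlo gradient of a perturbation-based Fenchel-Young loss.

    `solver(theta)` is any combinatorial oracle returning the vertex that
    maximizes <theta, y> over the feasible polytope (shortest path,
    matching, assortment, ...). All names here are illustrative sketches,
    not the SRL API.
    """
    rng = rng or np.random.default_rng(0)
    # Estimate E_Z[ y*(theta + eps * Z) ] by sampling Gaussian perturbations;
    # this expectation lies in the interior of the moment polytope.
    y_hat = np.mean(
        [solver(theta + eps * rng.standard_normal(theta.shape))
         for _ in range(n_samples)],
        axis=0,
    )
    # The Fenchel-Young loss gradient w.r.t. theta is the gap between the
    # perturbed expectation and the target structure.
    return y_hat - y_target

# Toy usage with a top-2 selection oracle over 5 items.
top2 = lambda th: (th >= np.sort(th)[-2]).astype(float)
g = fenchel_young_grad(np.zeros(5), top2(np.array([5., 4., 3., 2., 1.])), top2)
```

Descending along this gradient moves the score vector so that the perturbed expectation drifts toward the target vertex, which is one way to read the paper's primal-dual view in the dual of the moment polytope.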
Reviews: Exploration in Structured Reinforcement Learning
It provides problem-specific (asymptotic) lower and upper bounds on the regret, the latter for an algorithm presented in the paper that builds on Burnetas and Katehakis (1997) and a recent bandit paper by Combes et al. (NIPS 2017). The setting assumes that an "MDP structure" \Phi (i.e., a set of possible MDP models) is given. The regret bounds (after T steps) are shown to be of the form K_\Phi log T, where the parameter K_\Phi is the solution to a particular optimization problem. It is shown that if \Phi is the set of all MDPs ("the unstructured case"), then K_\Phi is bounded by HSA/\delta, where H is the bias span and \delta the minimal action sub-optimality gap. The second particular class considered is the Lipschitz structure, which covers embeddings of finite MDPs in Euclidean space such that transition probabilities and rewards are Lipschitz. In this case, the regret bounds are shown to no longer depend on the sizes of the state and action spaces.
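To see where the HSA/\delta scaling can come from, a hedged back-of-the-envelope: in the unstructured case the confusing-model constraints decouple across (state, action) pairs, since each suboptimal pair can be perturbed on its own.

```latex
% Decoupling sketch for the unstructured case (illustrative; constants
% and regularity conditions suppressed). With \Phi containing all MDPs,
% confusing models can perturb one (s,a) pair at a time, so the program
% decouples into one constraint \eta(s,a)\,\mathrm{KL}_{\min}(s,a) \ge 1
% per suboptimal pair, giving
\[
  K_\Phi \;\le\; \sum_{(s,a)\,:\,\delta^*(s,a)>0}
    \frac{\delta^*(s,a)}{\mathrm{KL}_{\min}(s,a)}.
\]
% Bounding each of the at most S A summands by H/\delta recovers the
% H S A / \delta scaling quoted in the review.
```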
Structured Reinforcement Learning for Media Streaming at the Wireless Edge
Bura, Archana, Bobbili, Sarat Chandra, Rameshkumar, Shreyas, Rengarajan, Desik, Kalathil, Dileep, Shakkottai, Srinivas
Media streaming is the dominant application over wireless edge (access) networks. The increasing softwarization of such networks has led to efforts at intelligent control, wherein application-specific actions may be dynamically taken to enhance the user experience. The goal of this work is to develop and demonstrate learning-based policies for optimal decision making to determine which clients to dynamically prioritize in a video streaming setting. We formulate the policy design question as a constrained Markov decision problem (CMDP) and observe that, by using a Lagrangian relaxation, we can decompose it into single-client problems. Further, the optimal policy takes a threshold form in the video buffer length, which enables us to design an efficient constrained reinforcement learning (CRL) algorithm to learn it. Specifically, we show that a natural policy gradient (NPG) based algorithm, derived using the structure of our problem, converges to the globally optimal policy. We then develop a simulation environment for training and a real-world intelligent controller attached to a WiFi access point for evaluation. We empirically show that the structured learning approach enables fast learning. Furthermore, such a structured policy can be easily deployed due to its low computational complexity, with policy execution taking only about 15 µs. Using YouTube streaming experiments in a resource-constrained scenario, we demonstrate that the CRL approach can increase quality of experience (QoE) by over 30%.
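For intuition on the structure being exploited, here is a minimal sketch of the threshold-in-buffer-length policy shape the abstract describes. The names and the tie-breaking rule are illustrative assumptions; the thresholds themselves would be learned (e.g. by the paper's NPG-based CRL procedure), which this sketch does not implement.

```python
import numpy as np

def threshold_priority_policy(buffers, thresholds):
    """Pick which client to prioritize under a buffer-threshold policy.

    The abstract's optimal single-client policy is a threshold in the
    video buffer length; after the Lagrangian decomposition, a natural
    aggregate rule is to prioritize a client whose buffer sits below its
    threshold. Names and tie-breaking here are illustrative assumptions.
    """
    below = buffers < thresholds          # clients at risk of rebuffering
    if not below.any():
        return None                       # nobody needs priority right now
    # Assumed tie-break: serve the client with the emptiest buffer first.
    return int(np.argmin(np.where(below, buffers, np.inf)))

# Example: client 0 (4 s of buffer, threshold 5 s) gets priority.
assert threshold_priority_policy(np.array([4.0, 9.0]), np.array([5.0, 5.0])) == 0
```

The low execution cost reported in the abstract is plausible for a rule of this shape: deciding an action is a handful of comparisons per client rather than a neural-network forward pass.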