AITopics | scale markov decision process

scale markov decision process

Large Scale Markov Decision Processes with Changing Rewards

Neural Information Processing Systems, Dec-25-2025, 10:52:32 GMT

We consider Markov Decision Processes (MDPs) where the rewards are unknown and may change in an adversarial manner. We provide an algorithm that achieves a regret bound of $O( \sqrt{\tau (\ln|S|+\ln|A|)T}\ln(T))$, where $S$ is the state space, $A$ is the action space, $\tau$ is the mixing time of the MDP, and $T$ is the number of periods. The algorithm's computational complexity is polynomial in $|S|$ and $|A|$. We then consider a setting often encountered in practice, where the state space of the MDP is too large to allow for exact solutions. By approximating the state-action occupancy measures with a linear architecture of dimension $d\ll|S|$, we propose a modified algorithm with a computational complexity polynomial in $d$ and independent of $|S|$. We also prove a regret bound for this modified algorithm, which, to the best of our knowledge, is the first $\tilde{O}(\sqrt{T})$ regret bound in the large-scale MDP setting with adversarially changing rewards.
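To get a feel for the first bound, the sketch below evaluates its leading expression $\sqrt{\tau (\ln|S|+\ln|A|)T}\ln(T)$ with the constant hidden in $O(\cdot)$ set to 1. That constant, the function name `regret_bound_order`, and the sample values of $\tau$, $|S|$ and $|A|$ are all assumptions for illustration only. The point is qualitative: the per-period regret (bound divided by $T$) shrinks roughly like $\ln(T)/\sqrt{T}$.

```python
import math

def regret_bound_order(tau: float, n_states: int, n_actions: int, T: int) -> float:
    """Leading term sqrt(tau * (ln|S| + ln|A|) * T) * ln(T), with the O(.) constant set to 1 (assumed)."""
    return math.sqrt(tau * (math.log(n_states) + math.log(n_actions)) * T) * math.log(T)

# Hypothetical values: mixing time 5, |S| = 1000 states, |A| = 10 actions.
for T in (10**3, 10**5, 10**7):
    bound = regret_bound_order(tau=5.0, n_states=1000, n_actions=10, T=T)
    # Per-period regret (bound / T) shrinks roughly like ln(T) / sqrt(T).
    print(f"T={T:>8}: bound ~ {bound:12.1f}, per period ~ {bound / T:.4f}")
```

Because $|S|$ enters only through $\ln|S|$, squaring the number of states merely doubles that term; the harder obstacle for large state spaces is the computational cost, which is polynomial in $|S|$ and is what the modified algorithm addresses.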

markov decision process, name change, scale markov decision process

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Response to the paper "Large Scale Markov Decision Processes with Changing Rewards"

Neural Information Processing Systems, Oct-2-2025, 19:42:39 GMT

We will include a more detailed discussion to motivate our methodology.

artificial intelligence, machine learning, scale markov decision process

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)

Reviews: Large Scale Markov Decision Processes with Changing Rewards

Neural Information Processing Systems, Jan-24-2025, 00:53:47 GMT

I still feel that the work would be greatly improved by adding numerical experiments. In particular, the authors refer to a specific setting called 'online MDP', where the dynamics, that is, the transition probabilities, are known while the reward is not. Regret minimization then refers to minimizing the 'regret' given that rewards could be chosen or observed in an adversarial manner. The authors start with a (rather technical) introduction, review related work, and explain the main ideas based on concise preliminaries. Afterwards, they provide an extension to large state spaces that uses approximate occupancy measures and thereby avoids concrete state mappings.
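The notion of regret described in the review can be made concrete. The sketch below assumes regret is measured with long-run (stationary) state-action occupancy measures $q$, so that a policy's expected reward in period $t$ is the inner product $\langle q, r_t\rangle$, and it ignores the transient gap between the actual and the stationary state distribution (presumably where the mixing time $\tau$ enters the paper's bound). The comparator set is a hypothetical finite list rather than all stationary policies, and all names are illustrative.

```python
from typing import Dict, Sequence, Tuple

StateAction = Tuple[str, str]
Occupancy = Dict[StateAction, float]  # q(s, a): long-run fraction of periods spent in (s, a)
Reward = Dict[StateAction, float]     # r_t(s, a): adversarial reward for period t

def expected_reward(q: Occupancy, r: Reward) -> float:
    """Inner product <q, r>: long-run expected reward of the policy behind q under reward r."""
    return sum(p * r.get(sa, 0.0) for sa, p in q.items())

def stationary_regret(learner: Sequence[Occupancy],
                      comparators: Sequence[Occupancy],
                      rewards: Sequence[Reward]) -> float:
    """Total reward of the best fixed comparator minus the learner's total reward."""
    learner_total = sum(expected_reward(q_t, r_t) for q_t, r_t in zip(learner, rewards))
    best_fixed = max(sum(expected_reward(q, r_t) for r_t in rewards) for q in comparators)
    return best_fixed - learner_total

# Hypothetical one-state example with actions "a" and "b".
always_a = {("s", "a"): 1.0}
always_b = {("s", "b"): 1.0}
rewards = [{("s", "a"): 1.0}, {("s", "b"): 1.0}, {("s", "b"): 1.0}]
print(stationary_regret([always_a] * 3, [always_a, always_b], rewards))  # 1.0
```

A learner that commits to action "a" collects 1 while always playing "b" would have collected 2, so its regret is 1. Since the transitions are known, the learner's only difficulty is that $r_t$ is not known in advance.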

international conference, scale markov decision process, transition probability

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.79)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Reviews: Large Scale Markov Decision Processes with Changing Rewards

Neural Information Processing Systems, Jan-24-2025, 00:53:36 GMT

The paper contributes new algorithmic ideas and theoretical results for regret minimization in Markov Decision Processes with known transition kernels but arbitrary cost functions. The reviewers broadly agree that the paper's theoretical and algorithmic techniques, namely the use of the Follow-the-Regularized-Leader (FTRL) online learning idea and the extension to large MDPs via linear function approximation, are novel, and thus that the paper deserves to be published; however, the known-MDP, unknown-cost setting may be somewhat narrow in its practical applicability.
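For readers unfamiliar with FTRL, here is a minimal generic sketch, not the paper's algorithm. Each period it plays the distribution that maximizes cumulative past reward minus a regularizer. It assumes a probability simplex over $K$ options and a negative-entropy regularizer, for which the FTRL step has the closed form $p_i \propto \exp(\eta R_i)$ (exponential weights). In the degenerate case of an MDP with a single state, occupancy measures are just distributions over actions, so this is the simplest instance of running FTRL over occupancy measures. The function names and the step size `eta` are illustrative assumptions.

```python
import math
from typing import List, Sequence

def ftrl_entropic(cumulative_rewards: Sequence[float], eta: float) -> List[float]:
    """argmax_p eta * <p, R> - sum_i p_i ln p_i over the simplex, i.e. softmax(eta * R)."""
    scaled = [eta * R for R in cumulative_rewards]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    z = sum(weights)
    return [w / z for w in weights]

def run_ftrl(reward_sequence: Sequence[Sequence[float]], eta: float) -> List[List[float]]:
    """Play p_t using only rewards from periods before t, then observe the full vector r_t."""
    k = len(reward_sequence[0])
    cumulative = [0.0] * k
    plays = []
    for r_t in reward_sequence:
        plays.append(ftrl_entropic(cumulative, eta))
        cumulative = [c + r for c, r in zip(cumulative, r_t)]
    return plays

plays = run_ftrl([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], eta=1.0)
for p in plays:
    print([round(x, 3) for x in p])
```

A larger `eta` follows the current leader more aggressively. For rewards in $[0, 1]$, a step size on the order of $\sqrt{\ln K / T}$ is the standard choice that gives $O(\sqrt{T \ln K})$ regret in this simplex version.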

markov decision process, scale markov decision process

Neural Information Processing Systems

Technology:

Information Technology > Decision Support Systems (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.72)

Large Scale Markov Decision Processes with Changing Rewards

Neural Information Processing Systems, Oct-10-2024, 02:51:38 GMT

We consider Markov Decision Processes (MDPs) where the rewards are unknown and may change in an adversarial manner. We provide an algorithm that achieves a regret bound of $O( \sqrt{\tau (\ln|S|+\ln|A|)T}\ln(T))$, where $S$ is the state space, $A$ is the action space, $\tau$ is the mixing time of the MDP, and $T$ is the number of periods. The algorithm's computational complexity is polynomial in $|S|$ and $|A|$. We then consider a setting often encountered in practice, where the state space of the MDP is too large to allow for exact solutions. By approximating the state-action occupancy measures with a linear architecture of dimension $d\ll|S|$, we propose a modified algorithm with a computational complexity polynomial in $d$ and independent of $|S|$.
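Both algorithms work with state-action occupancy measures. For a fixed stationary policy $\pi$ in an MDP with known transitions $P$, the occupancy measure is the long-run fraction of periods spent in each pair $(s, a)$, i.e. $q(s, a) = \mu(s)\,\pi(a \mid s)$, where $\mu$ is the stationary distribution of the state chain the policy induces. The sketch below computes it by power iteration for a tiny hypothetical MDP. It assumes the induced chain is ergodic and aperiodic, and the function name and list-of-lists layout are illustrative, not taken from the paper.

```python
from typing import List

def occupancy_measure(P: List[List[List[float]]], pi: List[List[float]],
                      iters: int = 10_000, tol: float = 1e-12) -> List[List[float]]:
    """Stationary state-action occupancy q(s, a) = mu(s) * pi(a | s).

    P[s][a][s2] is the known transition probability P(s2 | s, a);
    pi[s][a] is the probability that the stationary policy picks a in s.
    Assumes the chain induced by pi is ergodic and aperiodic, so power iteration converges.
    """
    n = len(P)
    # State-to-state kernel under the policy: P_pi(s2 | s) = sum_a pi(a | s) P(s2 | s, a).
    P_pi = [[sum(pi[s][a] * P[s][a][s2] for a in range(len(pi[s]))) for s2 in range(n)]
            for s in range(n)]
    mu = [1.0 / n] * n  # start from the uniform state distribution
    for _ in range(iters):
        new_mu = [sum(mu[s] * P_pi[s][s2] for s in range(n)) for s2 in range(n)]
        gap = max(abs(x - y) for x, y in zip(new_mu, mu))
        mu = new_mu
        if gap < tol:
            break
    return [[mu[s] * pi[s][a] for a in range(len(pi[s]))] for s in range(n)]

# Hypothetical 2-state MDP: action 0 stays put, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
pi = [[0.5, 0.5], [0.75, 0.25]]
print(occupancy_measure(P, pi))  # approximately [[1/6, 1/6], [1/2, 1/6]]
```

Building $P_\pi$ alone touches $|S|^2|A|$ entries, which illustrates why exact computation over occupancy measures becomes impractical when $|S|$ is very large.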

algorithm, markov decision process, scale markov decision process

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.65)

Large Scale Markov Decision Processes with Changing Rewards

Adrian Rivera Cardoso, He Wang, Huan Xu

Neural Information Processing Systems, Mar-18-2020, 21:17:29 GMT

We consider Markov Decision Processes (MDPs) where the rewards are unknown and may change in an adversarial manner. We provide an algorithm that achieves a regret bound of $O( \sqrt{\tau (\ln|S|+\ln|A|)T}\ln(T))$, where $S$ is the state space, $A$ is the action space, $\tau$ is the mixing time of the MDP, and $T$ is the number of periods. The algorithm's computational complexity is polynomial in $|S|$ and $|A|$. We then consider a setting often encountered in practice, where the state space of the MDP is too large to allow for exact solutions. By approximating the state-action occupancy measures with a linear architecture of dimension $d\ll|S|$, we propose a modified algorithm with a computational complexity polynomial in $d$ and independent of $|S|$. We also prove a regret bound for this modified algorithm, which, to the best of our knowledge, is the first $\tilde{O}(\sqrt{T})$ regret bound in the large-scale MDP setting with adversarially changing rewards.
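The linear architecture replaces the $|S||A|$ entries of an occupancy measure with $q_\theta = \Phi\theta$ for a feature matrix $\Phi$ with $d$ columns, so only $d$ parameters are chosen. A vector of this form need not be a valid occupancy measure. The sketch below, which assumes a hypothetical feature matrix and a flattened row index $s\,|A| + a$, measures how far $q_\theta$ is from the constraints: nonnegativity, total mass 1, and stationarity $\sum_a q(s', a) = \sum_{s,a} q(s, a) P(s' \mid s, a)$. The abstract does not say how the modified algorithm handles such violations, and evaluating $\Phi\theta$ in full still touches every state-action pair, so this sketch does not reproduce the algorithm's $|S|$-independent complexity.

```python
from typing import List

def approx_occupancy(Phi: List[List[float]], theta: List[float]) -> List[float]:
    """q_theta = Phi @ theta: one entry per (state, action) pair, only d = len(theta) parameters."""
    return [sum(f * t for f, t in zip(row, theta)) for row in Phi]

def constraint_violation(q: List[float], P: List[List[float]],
                         n_states: int, n_actions: int) -> float:
    """Total violation of the occupancy-measure constraints for a flattened q.

    Row i = s * n_actions + a of q and P refers to the pair (s, a);
    P[i][s2] is the known transition probability P(s2 | s, a).
    """
    negativity = sum(max(0.0, -x) for x in q)
    normalization = abs(sum(q) - 1.0)
    stationarity = 0.0
    for s2 in range(n_states):
        inflow = sum(q[i] * P[i][s2] for i in range(len(q)))
        mass_at_s2 = sum(q[s2 * n_actions + a] for a in range(n_actions))
        stationarity += abs(mass_at_s2 - inflow)
    return negativity + normalization + stationarity

# Hypothetical 2-state, 2-action MDP: action 0 stays put, action 1 switches state.
P = [[1.0, 0.0], [0.0, 1.0],
     [0.0, 1.0], [1.0, 0.0]]
Phi = [[1 / 6], [1 / 6], [1 / 2], [1 / 6]]  # hypothetical single feature, d = 1
for theta in ([1.0], [0.9]):
    q = approx_occupancy(Phi, theta)
    print(theta, round(constraint_violation(q, P, n_states=2, n_actions=2), 6))
```

Here the single feature happens to be the exact occupancy measure of one stationary policy, so $\theta = 1$ gives zero violation, while $\theta = 0.9$ still satisfies stationarity (the constraint is homogeneous) but misses the total-mass constraint by 0.1.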

algorithm, markov decision process, scale markov decision process

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.65)
