AITopics | Markov Models

ProtoX: Explaining a Reinforcement Learning Agent via Prototyping

Neural Information Processing Systems | Aug-17-2025, 17:05:01 GMT

We conduct various experiments to test ProtoX.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

North America > United States > Iowa > Johnson County > Iowa City (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

A Classification

Neural Information Processing Systems | Aug-17-2025, 16:05:02 GMT

The RL image classification environment consists of a dataset of labelled images. For the variant labelled "Adaptive", we train a classifier [...] In this section, we will derive the optimal memoryless policy. [...] M: it receives the highest expected test-time return amongst all possible policies. This proposition follows directly from the definition of the epistemic POMDP. In both MDPs, the reward for the "stay" action is always zero.

artificial intelligence, epistemic pomdp, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.38)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.34)

a9b3d7f65eebb083e5c7f8cf10e52528-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems | Aug-17-2025, 13:04:02 GMT

machine learning, reinforcement, reinforcement learning, (16 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)
(4 more...)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.94)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

fdc42b6b0ee16a2f866281508ef56730-Supplemental.pdf

Neural Information Processing Systems | Aug-17-2025, 10:41:48 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

Neural Information Processing Systems | Aug-17-2025, 10:41:36 GMT

We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels, where the goal is to optimize the total reward under the risk measure of exponential utility. We propose two provably efficient model-free algorithms, Risk-Sensitive Value Iteration (RSVI) and Risk-Sensitive Q-learning (RSQ). These algorithms implement a form of risk-sensitive optimism in the face of uncertainty, which adapts to both risk-seeking and risk-averse modes of exploration.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

cd755a6c6b699f3262bcc2aa46ab507e-Supplemental.pdf

Neural Information Processing Systems | Aug-17-2025, 10:40:41 GMT

artificial intelligence, machine learning, probability, (19 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Data Science (0.67)

cd755a6c6b699f3262bcc2aa46ab507e-Paper.pdf

Neural Information Processing Systems | Aug-17-2025, 10:40:37 GMT

algorithm, artificial intelligence, machine learning, (16 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Dr Jekyll and Mr Hyde: The Strange Case of Off-Policy Policy Updates

Romain Laroche (Microsoft Research Montréal, Canada); Rémi Tachet des Combes (Microsoft Research Montréal, Canada)

Neural Information Processing Systems | Aug-17-2025, 10:19:26 GMT

The policy gradient theorem states that the policy should only be updated in states that are visited by the current policy, which leads to insufficient planning in the off-policy states, and thus to convergence to suboptimal policies. We tackle this planning issue by extending the policy gradient theory to policy updates with respect to any state density. Under these generalized policy updates, we show convergence to optimality under a necessary and sufficient condition on the updates' state densities, and thereby solve the aforementioned planning issue. We also prove asymptotic convergence rates that significantly improve those in the policy gradient literature. To implement the principles prescribed by our theory, we propose an agent, Dr Jekyll & Mr Hyde (J&H), with a double personality: Dr Jekyll purely exploits while Mr Hyde purely explores. J&H's independent policies make it possible to record two separate replay buffers, one on-policy (Dr Jekyll's) and one off-policy (Mr Hyde's), and therefore to update J&H's models with a mixture of on-policy and off-policy updates. More than an algorithm, J&H defines principles for actor-critic algorithms to satisfy the requirements we identify in our analysis. We extensively test on finite MDPs, where J&H demonstrates a superior ability to recover from converging to a suboptimal policy without impairing its speed of convergence. We also implement a deep version of the algorithm and test it on a simple problem, where it shows promising results.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Country: North America > Canada > Quebec > Montreal (0.76)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas > Upstream (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
(2 more...)

a57483b394a3654f4317051e4ce3b2b8-Paper-Conference.pdf

Neural Information Processing Systems | Aug-17-2025, 09:34:37 GMT

We study what dataset assumption permits solving offline two-player zero-sum Markov games. In stark contrast to the offline single-agent Markov decision process, we show that the single strategy concentration assumption is insufficient for learning the Nash equilibrium (NE) strategy in offline two-player zero-sum Markov games. On the other hand, we propose a new assumption named unilateral concentration and design a pessimism-type algorithm that is provably efficient under this assumption. In addition, we show that the unilateral concentration assumption is necessary for learning an NE strategy. Furthermore, our algorithm can achieve minimax sample complexity without any modification for two widely studied settings: dataset with uniform concentration assumption and turn-based Markov games. Our work serves as an important initial step towards understanding offline multi-agent reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country: Europe > United Kingdom > England > Greater London > London (0.14)

Genre: Research Report (0.94)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Fast Approximate Dynamic Programming for Infinite-Horizon Markov Decision Processes

Neural Information Processing Systems | Aug-17-2025, 07:36:34 GMT

In this study, we consider the infinite-horizon, discounted cost, optimal control of stochastic nonlinear systems with separable cost and constraints in the state and input variables.

artificial intelligence, machine learning, reinforcement learning, (21 more...)

Country: