AITopics | deterministic mdp

Collaborating Authors

deterministic mdp

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

b8be628bf719550b560de8bec9456e0b-Paper-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 18:24:07 GMT

machine learning, natural language, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)
(2 more...)

Add feedback

In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an $\epsilon$-optimal policy with probability $1-\delta$. While minimax optimal algorithms exist for this problem, its instance-dependent complexity remains elusive in episodic Markov decision processes (MDPs). In this paper, we propose the first nearly matching (up to a horizon squared factor and logarithmic terms) upper and lower bounds on the sample complexity of PAC RL in deterministic episodic MDPs with finite state and action spaces. In particular, our bounds feature a new notion of sub-optimality gap for state-action pairs that we call the deterministic return gap. While our instance-dependent lower bound is written as a linear program, our algorithms are very simple and do not require solving such an optimization problem during learning. Their design and analyses employ novel ideas, including graph-theoretical concepts (minimum flows) and a new maximum-coverage exploration strategy.

deterministic mdp, instance-optimal pac reinforcement learning, name change, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)

Add feedback

Why and How Auxiliary Tasks Improve JEPA Representations

Yu, Jiacan, Chen, Siyi, Liu, Mingrui, Horiuchi, Nono, Braverman, Vladimir, Xu, Zicheng, Haramati, Dan, Balestriero, Randall

arXiv.org Artificial IntelligenceOct-21-2025

Joint-Embedding Predictive Architecture (JEPA) is increasingly used for visual representation learning and as a component in model-based RL, but its behavior remains poorly understood. We provide a theoretical characterization of a simple, practical JEPA variant that has an auxiliary regression head trained jointly with latent dynamics. We prove a No Unhealthy Representation Collapse theorem: in deterministic MDPs, if training drives both the latent-transition consistency loss and the auxiliary regression loss to zero, then any pair of non-equivalent observations, i.e., those that do not have the same transition dynamics or auxiliary value, must map to distinct latent representations. Thus, the auxiliary task anchors which distinctions the representation must preserve. Controlled ablations in a counting environment corroborate the theory and show that training the JEPA model jointly with the auxiliary head generates a richer representation than training them separately. Our work indicates a path to improve JEPA encoders: training them with an auxiliary function that, together with the transition dynamics, encodes the right equivalence relations.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2509.12249

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

Add feedback

b8be628bf719550b560de8bec9456e0b-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 05:47:47 GMT

machine learning, natural language, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)
(2 more...)

Add feedback

Efficient Computation of Blackwell Optimal Policies using Rational Functions

Mukherjee, Dibyangshu, Kalyanakrishnan, Shivaram

arXiv.org Artificial IntelligenceAug-26-2025

Markov Decision Problems (MDPs) provide a founda-tional framework for modelling sequential decision-making across diverse domains, guided by optimality criteria such as discounted and average rewards. However, these criteria have inherent limitations: discounted optimality may overly prioritise short-term rewards, while average optimality relies on strong structural assumptions. Blackwell optimality addresses these challenges, offering a robust and comprehensive criterion that ensures optimality under both discounted and average reward frameworks. Despite its theoretical appeal, existing algorithms for computing Blackwell Optimal (BO) policies are computationally expensive or hard to implement. In this paper we describe procedures for computing BO policies using an ordering of rational functions in the vicinity of 1 . We adapt state-of-the-art algorithms for deterministic and general MDPs, replacing numerical evaluations with symbolic operations on rational functions to derive bounds independent of bit complexity. For deterministic MDPs, we give the first strongly polynomial-time algorithms for computing BO policies, and for general MDPs we obtain the first subexponential-time algorithm. We further generalise several policy iteration algorithms, extending the best known upper bounds from the discounted to the Blackwell criterion.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2508.18252

Genre: Research Report (0.40)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Mathematics of Computing (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

ab6a2c6ee757afe43882121281f6065c-Supplemental-Conference.pdf

Neural Information Processing SystemsAug-17-2025, 14:10:19 GMT

artificial intelligence, objective, trajectory, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Add feedback

f5f3b8d720f34ebebceb7765e447268b-AuthorFeedback.pdf

Neural Information Processing SystemsAug-17-2025, 07:45:42 GMT

We thank all reviewers for detailed and valuable comments, and will revise the paper accordingly as described below. We thank all reviewers for pointing those out, and will do corrections in the revision. We agree with the reviewer and will change the wording in the revision. HIRO paper, goal-conditioned HRL often yields better performance than HRL with Options. E.g. all graph-based works cited in the review obtain the subgoal sequence by solving a shortest-path In the revision, we will add these discussions to the related work section.

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.49)

Add feedback

9 Appendix For all the following derivations, we use D

Neural Information Processing SystemsAug-15-2025, 03:29:48 GMT

Based on Lemma2, we can derive the upper-bound of our original objective: Theorem 1 (Surrogate Objective as the Divergence Upper-bound) . We provide proof with a counter-example. Based on Assumption 1, we have the following: Corollary 1. A similar strategy is adopted by [3]. Sec 3.2 to learn a value function v (s,s

ds null dad, mdp, non-injective mdp, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.47)

Add feedback