AITopics | dynamic programming

Real Time Dynamic Programming (RTDP) is an online algorithm based on Dynamic Programming (DP) that acts by 1-step greedy planning. Unlike DP, RTDP does not require access to the entire state space, i.e., it explicitly handles exploration. This makes RTDP particularly appealing when the state space is large and it is not possible to update all states simultaneously. In this work, we devise a multi-step greedy RTDP algorithm, which we call $h$-RTDP, that replaces the 1-step greedy policy with an $h$-step lookahead policy. We analyze $h$-RTDP in its exact form and establish that increasing the lookahead horizon, $h$, improves sample complexity at the cost of additional computation. This is the first work that proves improved sample complexity as a result of increasing the lookahead horizon in online planning. We then analyze the performance of $h$-RTDP in three approximate settings: approximate model, approximate value updates, and approximate state representation. For these cases, we prove that the asymptotic performance of $h$-RTDP remains the same as that of a corresponding approximate DP algorithm, which is the best one can hope for without further assumptions on the approximation errors.

artificial intelligence, name change, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Learning Chordal Markov Networks via Branch and Bound

Kari Rantanen, Antti Hyttinen, Matti Järvisalo

Neural Information Processing Systems, Nov-21-2025, 13:26:18 GMT

This problem, chordal Markov network structure learning (CMSL), is notoriously challenging computationally; e.g., finding a maximum likelihood chordal Markov network

artificial intelligence, bayesian inference, machine learning, (17 more...)

Country:

Europe > Finland > Uusimaa > Helsinki (0.05)
North America > United States > California > Los Angeles County > Long Beach (0.04)

5fd0245f6c9ddbdf3eff0f505975b6a7-Paper.pdf

Neural Information Processing Systems, Nov-20-2025, 16:48:17 GMT

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Foundations of Multivariate Distributional Reinforcement Learning

Neural Information Processing Systems, Nov-20-2025, 02:42:29 GMT

In general, research in distributional reinforcement learning has focused on the classical setting of a scalar reward function.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Quantifying Skill and Chance: A Unified Framework for the Geometry of Games

Silver, David H.

arXiv.org Artificial Intelligence, Nov-18-2025

We introduce a quantitative framework for separating skill and chance in games by modeling them as complementary sources of control over stochastic decision trees. We define the Skill-Luck Index S(G) in [-1, 1] by decomposing game outcomes into skill leverage K and luck leverage L. Applying this to 30 games reveals a continuum from pure chance (coin toss, S = -1) through mixed domains such as backgammon (S = 0, Sigma = 1.20) to pure skill (chess, S = +1, Sigma = 0). Poker exhibits moderate skill dominance (S = 0.33) with K = 0.40 +/- 0.03 and Sigma = 0.80. We further introduce volatility Sigma to quantify outcome uncertainty over successive turns. The framework extends to general stochastic decision systems, enabling principled comparisons of player influence, game balance, and predictive stability, with applications to game design, AI evaluation, and risk assessment.

artificial intelligence, chance node, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.11611

Country:

North America > United States (0.14)
Europe > Netherlands > Limburg > Maastricht (0.04)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Games > Computer Games (0.66)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

CoS: Towards Optimal Event Scheduling via Chain-of-Scheduling

Zhao, Yiming, Tang, Jiwei, Di, Shimin, Zheng, Libin, Yu, Jianxing, Yin, Jian

arXiv.org Artificial Intelligence, Nov-18-2025

Recommending event schedules is a key issue in Event-based Social Networks (EBSNs) for maintaining user activity. An effective recommendation must maximize the user's preference subject to both time and geographical constraints. Existing methods face an inherent trade-off among efficiency, effectiveness, and generalization, due to the NP-hard nature of the problem. This paper proposes the Chain-of-Scheduling (CoS) framework, which activates the event scheduling capability of Large Language Models (LLMs) through a guided, efficient scheduling process. CoS enhances LLMs by decomposing the scheduling task into three atomic stages: exploration, verification, and integration. We then enable the LLMs to generate CoS autonomously via Knowledge Distillation (KD). Experimental results show that CoS achieves near-theoretical optimal effectiveness with high efficiency on three real-world datasets in an interpretable manner. Moreover, it demonstrates strong zero-shot learning ability on out-of-domain data.

large language model, machine learning, utility score, (20 more...)

arXiv.org Artificial Intelligence

2511.12913

Country: