
Sokoban



A Potential Negative Societal Impacts

Neural Information Processing Systems

We have not trained our models with sensitive or private data. The result holds for any L(n) other than the constant one, as long as g(n) and l(n) are positively correlated. The results for the baselines AdaSubS, kSubS, BC, CQL, DT, and HIPS with learned models were copied from [18]. The total number of GPU hours used for this work was approximately 7,500, with 6 CPU workers (AMD Trento) per GPU. In the latter case, completeness cannot be guaranteed.





Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction

Shu, Bao, Cai, Yan, Sun, Jianjian, Han, Chunrui, Yu, En, Zhao, Liang, Hu, Jingcheng, Zhang, Yinmin, Lv, Haoran, Peng, Yuang, Ge, Zheng, Zhang, Xiangyu, Jiang, Daxin, Yue, Xiangyu

arXiv.org Artificial Intelligence

Developing robust world model reasoning is crucial for large language model (LLM) agents to plan and interact in complex environments. While multi-turn interaction offers a superior understanding of environmental dynamics via authentic feedback, current approaches often impose a rigid reasoning process, which constrains the model's active learning and ultimately hinders efficient world model reasoning. To address these issues, we explore world-model internalization through efficient interaction and active reasoning (WMAct), which liberates the model from structured reasoning--allowing the model to shape its thinking directly through its doing--and achieves effective and efficient world model reasoning with two key mechanisms: (1) a reward rescaling mechanism that adjusts the outcome reward based on action efficacy, incentivizing redundancy reduction and purposeful interaction; and (2) an interaction frequency annealing strategy that progressively reduces the maximum allowed interaction turns, compelling the model to condense its learning and internalize environmental dynamics rather than over-rely on environmental cues. Our experiments on Sokoban, Maze, and Taxi show that WMAct yields effective world model reasoning capable of resolving tasks in a single turn that previously required multiple interactions, and fosters strong transferability to complex environments, improving performance on a suite of reasoning benchmarks.
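The two mechanisms lend themselves to a short sketch. The following Python fragment is illustrative only, not the WMAct authors' code: the efficacy definition (the fraction of actions that actually change the environment state), the linear annealing schedule, and all function names are assumptions made for the example.

    # Illustrative sketch of WMAct's two mechanisms (assumed details,
    # not the authors' implementation).

    def rescale_reward(outcome_reward: float, state_changed: list[bool]) -> float:
        """Scale the outcome reward by action efficacy: the fraction of
        actions in the trajectory that actually changed the environment
        state, so redundant (no-op) interaction earns proportionally less."""
        if not state_changed:
            return 0.0
        efficacy = sum(state_changed) / len(state_changed)
        return outcome_reward * efficacy

    def max_turns_at(step: int, total_steps: int,
                     start_turns: int = 8, end_turns: int = 1) -> int:
        """Interaction frequency annealing: linearly shrink the maximum
        number of allowed environment turns over training, pushing the
        model to internalize dynamics rather than keep probing the env."""
        frac = min(step / max(total_steps, 1), 1.0)
        return max(end_turns, round(start_turns - frac * (start_turns - end_turns)))

Under a schedule like this, a model that starts with an 8-turn budget is eventually forced to act in a single turn, matching the single-turn task resolution the abstract reports.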



Internalizing World Models via Self-Play Finetuning for Agentic RL

Chen, Shiqi, Zhu, Tongyao, Wang, Zian, Zhang, Jinghan, Wang, Kangrui, Gao, Siyang, Xiao, Teng, Teh, Yee Whye, He, Junxian, Li, Manling

arXiv.org Artificial Intelligence

Large Language Models (LLMs) as agents often struggle in out-of-distribution (OOD) scenarios. Real-world environments are complex and dynamic, governed by task-specific rules and stochasticity, which makes it difficult for LLMs to ground their internal knowledge in those dynamics. Under such OOD conditions, vanilla RL training often fails to scale; we observe Pass@k--the probability that at least one of k sampled trajectories succeeds--drops markedly across training steps, indicating brittle exploration and limited generalization. Inspired by model-based reinforcement learning, we hypothesize that equipping LLM agents with an internal world model can better align reasoning with environmental dynamics and improve decision-making. We show how to encode this world model by decomposing it into two components: state representation and transition modeling. Building on this, we introduce SPA, a simple reinforcement learning framework that cold-starts the policy via a Self-Play supervised finetuning (SFT) stage to learn the world model by interacting with the environment, then uses that model to simulate future states prior to policy optimization. This simple initialization outperforms the online world-modeling baseline and greatly boosts RL-based agent training performance. Experiments across diverse environments such as Sokoban, FrozenLake, and Sudoku show that our approach significantly improves performance: for example, SPA boosts the Sokoban success rate from 25.6% to 59.8% and raises the FrozenLake score from 22.1% to 70.9% for the Qwen2.5-1.5B-Instruct model.
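The Pass@k quantity tracked above matches the standard unbiased estimator of Chen et al. (2021); whether SPA computes it exactly this way is an assumption based on the abstract's informal definition. A minimal sketch:

    # Unbiased Pass@k estimator (Chen et al., 2021); its use for the
    # SPA paper's metric is an assumption.
    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Probability that at least one of k trajectories drawn without
        replacement from n samples, c of which succeed, is a success."""
        if n - c < k:
            return 1.0  # every size-k draw must contain at least one success
        return 1.0 - comb(n - c, k) / comb(n, k)

    # Example: 16 rollouts per task, 4 successful -> Pass@8 ~= 0.962
    print(pass_at_k(n=16, c=4, k=8))

A falling Pass@k under a fixed sampling budget is exactly the brittle-exploration signal the abstract describes: the policy concentrates probability mass on a shrinking set of trajectories, so even the best of k samples stops improving.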


