Investigating Scale Independent UCT Exploration Factor Strategies

Schmöcker, Robin, Schnell, Christoph, Dockhorn, Alexander

Oct-27-2025, arXiv.org Artificial Intelligence

The Upper Confidence Bounds for Trees (UCT) algorithm is not agnostic to the reward scale of the game it is applied to. For zero-sum games with sparse terminal rewards in $\{-1,0,1\}$, this is not a problem, but many games feature dense rewards with hand-picked reward scales, so a node's Q-value can differ in magnitude from one game to the next. In this paper, we evaluate strategies for adaptively choosing the UCT exploration constant $λ$, called $λ$-strategies, that are agnostic to the game's reward scale. They include strategies proposed in the literature as well as five new ones. Based on our experimental results, we recommend one of the newly proposed $λ$-strategies: set $λ = 2 \cdot σ$, where $σ$ is the empirical standard deviation of the Q-values of all state-action pairs in the search tree. This method outperforms existing $λ$-strategies across a wide range of tasks, both with a single parameter value and in the peak performance obtained by optimizing all available parameters.
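
The recommended strategy can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard UCT selection rule $Q(s,a) + λ \sqrt{\ln N(s) / N(s,a)}$, uses the population standard deviation for $σ$ (the abstract does not say which estimator is meant), and falls back to a fixed constant when the tree holds too few Q-values to estimate a spread. The names `adaptive_lambda`, `uct_score`, `select_action` and the `fallback` parameter are hypothetical.

```python
import math
from statistics import pstdev


def adaptive_lambda(q_values, scale=2.0, fallback=1.0):
    """Return scale * sigma, where sigma is the standard deviation of q_values.

    q_values: Q-values of all state-action pairs currently in the search tree.
    fallback: assumed constant used when sigma cannot be estimated or is zero.
    """
    if len(q_values) < 2:
        return fallback
    sigma = pstdev(q_values)  # assumption: population standard deviation
    return scale * sigma if sigma > 0 else fallback


def uct_score(q, parent_visits, child_visits, lam):
    """Standard UCT score; unvisited children are always tried first."""
    if child_visits == 0:
        return math.inf
    return q + lam * math.sqrt(math.log(parent_visits) / child_visits)


def select_action(children, parent_visits, lam):
    """Return the index of the child (q, visits) with the highest UCT score."""
    scores = [uct_score(q, parent_visits, n, lam) for q, n in children]
    return max(range(len(children)), key=scores.__getitem__)


# Toy tree statistics: (Q-value, visit count) for each child of the root.
children = [(1.0, 10), (1.2, 5), (0.5, 2)]
parent_visits = sum(n for _, n in children)

lam = adaptive_lambda([q for q, _ in children])
choice = select_action(children, parent_visits, lam)

# Rescaling every reward by a constant rescales sigma by the same constant,
# so the selected action does not change.
scaled = [(50.0 * q, n) for q, n in children]
scaled_lam = adaptive_lambda([q for q, _ in scaled])
scaled_choice = select_action(scaled, parent_visits, scaled_lam)
```

Because $σ$ scales linearly with the rewards, both terms of the UCT score are multiplied by the same factor when the reward scale changes, which is what makes the choice of action independent of that scale. The fixed `fallback` value does not share this property, so in this sketch it only matters in the first few iterations.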
