AITopics | ucbexplore

We thank the reviewers for their comments and insightful reviews. [...] is only logarithmic, as the main dependency is w.r.t. [...] The VI algorithm for SSP was proved in [37] to converge in time quadratic w.r.t. the size of the considered state space. [...] This allows tuning the parameter online according to the desired behavior. A sketch of the proof of Thm. 1 is currently available in App. B. In case of acceptance we will use [...] We will include additional experiments for varying L in the final version.

artificial intelligence, complexity, disco, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.39)

81e793dc8317a3dbc3534ed3f242c418-Supplemental.pdf

Neural Information Processing Systems, Oct-3-2025, 09:56:32 GMT

machine learning, reinforcement learning, ucbexplore, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

81e793dc8317a3dbc3534ed3f242c418-Paper.pdf

Neural Information Processing Systems, Oct-3-2025, 09:56:25 GMT

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric

arXiv.org Machine Learning, Dec-29-2020

We investigate the exploration of an unknown environment when no reward function is provided. Building on the incremental exploration setting introduced by Lim and Auer [1], we define the objective of learning the set of $\epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps (in expectation) from a reference state $s_0$. In this paper, we introduce a novel model-based approach that interleaves discovering new states from $s_0$ and improving the accuracy of a model estimate that is used to compute goal-conditioned policies to reach newly discovered states. The resulting algorithm, DisCo, achieves a sample complexity scaling as $\tilde{O}(L^5 S_{L+\epsilon} \Gamma_{L+\epsilon} A \epsilon^{-2})$, where $A$ is the number of actions, $S_{L+\epsilon}$ is the number of states that are incrementally reachable from $s_0$ in $L+\epsilon$ steps, and $\Gamma_{L+\epsilon}$ is the branching factor of the dynamics over such states. This improves over the algorithm proposed in [1] in both $\epsilon$ and $L$ at the cost of an extra $\Gamma_{L+\epsilon}$ factor, which is small in most environments of interest. Furthermore, DisCo is the first algorithm that can return an $\epsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem defined on the $L$-reachable states with minimum cost $c_{\min}$. Finally, we report preliminary empirical results confirming our theoretical findings.

machine learning, reinforcement learning, ucbexplore, (19 more...)

arXiv.org Machine Learning

2012.14755

Country:

North America > United States (0.28)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Autonomous exploration for navigating in non-stationary CMPs

Pratik Gajane, Ronald Ortner, Peter Auer, Csaba Szepesvari

arXiv.org Machine Learning, Oct-18-2019

We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change. For this setting, we propose a performance measure called exploration steps which counts the time steps at which the learner lacks sufficient knowledge to navigate its environment efficiently. We devise a learning meta-algorithm, MNM, and prove an upper bound on the exploration steps in terms of the number of changes.

artificial intelligence, exploration step, machine learning, (18 more...)

arXiv.org Machine Learning

1910.08446

Country: