model-free method

Reviews: Planning with Goal-Conditioned Policies

Neural Information Processing Systems

Post rebuttal: My suggestions/comments were not addressed in the rebuttal, so I keep my score as is. Others have proposed this type of two-step optimization, where one first learns a compact representation with a VAE on randomly collected samples and then uses various RL or planning methods on that representation. However, this does not work well in high-dimensional spaces, where random data collection for learning the representation space does not yield enough samples -- especially samples from the optimal policy. This work does not address that issue: it evaluates only on environments with very small state spaces, where random sampling to train the VAE is feasible. Originality: The idea of planning using TDMs over a latent representation is novel and a promising direction for goal-directed planning in high-dimensional observation spaces.
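The two-step recipe the review describes (first learn a compact representation from randomly collected samples, then plan in that space) can be sketched as follows. This is an illustrative assumption, not the paper's method: PCA (a linear autoencoder) stands in for the VAE, and a greedy latent-distance search stands in for TDM-based planning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: learn a compact representation from randomly collected observations.
# PCA stands in here for the VAE; `obs` is a stand-in for random-policy rollouts.
obs = rng.normal(size=(500, 32))
mean = obs.mean(axis=0)
_, _, Vt = np.linalg.svd(obs - mean, full_matrices=False)
W = Vt[:4]                                  # project to a 4-dim latent space

def encode(x):
    return (x - mean) @ W.T

# Step 2: plan in the latent space. As a stand-in for TDM planning, greedily
# pick at each step the candidate action whose imagined next latent state is
# closest to the goal's latent code.
def greedy_latent_plan(z_start, z_goal, candidate_deltas, horizon=10):
    z, plan = z_start.copy(), []
    for _ in range(horizon):
        nxt = z + candidate_deltas          # imagined next latent states
        best = int(np.argmin(np.linalg.norm(nxt - z_goal, axis=1)))
        plan.append(best)
        z = nxt[best]
    return plan, z

z0 = encode(rng.normal(size=32))
zg = encode(rng.normal(size=32))
deltas = rng.normal(scale=0.5, size=(8, 4))  # 8 hypothetical action effects in latent space
plan, z_final = greedy_latent_plan(z0, zg, deltas)
```

The reviewer's concern maps directly onto Step 1: if the random rollouts in `obs` never visit the states an optimal policy would reach, the learned projection `W` cannot represent them, and no amount of planning in Step 2 recovers that coverage.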


Review for NeurIPS paper: Model-based Adversarial Meta-Reinforcement Learning

Neural Information Processing Systems

Additional Feedback: After reading the other reviews and the authors' rebuttal, I have increased my score to 7. The additional experiments are greatly appreciated, but more details should be provided for them: e.g., I feel that if the policy has all the necessary information and is trained with a model-free approach, it should be able to obtain comparable or better results than a model-based approach (with much worse sample complexity, of course). That said, the comparison between model-based and model-free methods is not the focus of the work, and the experiments with model-based baselines do show good results. I think the paper presents an interesting idea for improving the robustness of model-based RL methods to different reward functions. I have a few questions regarding the details of the algorithm, listed below.


On Model-Free Re-ranking for Visual Place Recognition with Deep Learned Local Features

Pivoňka, Tomáš, Přeučil, Libor

arXiv.org Artificial Intelligence

Re-ranking is the second stage of a visual place recognition task, in which the system chooses the best-matching images from a pre-selected subset of candidates. Model-free approaches compute the image-pair similarity from a spatial comparison of corresponding local visual features, eliminating the need for computationally expensive estimation of a model describing the transformation between images. The article focuses on model-free re-ranking based on standard local visual features and its applicability in long-term autonomy systems. It introduces three new model-free re-ranking methods designed primarily for deep-learned local visual features. These features exhibit high robustness to various appearance changes, a crucial property for long-term autonomy systems. All the introduced methods were employed in a new visual place recognition system together with the D2-net feature detector (Dusmanu, 2019) and experimentally tested on diverse, challenging public datasets. The obtained results are on par with current state-of-the-art methods, affirming that model-free approaches are a viable and worthwhile path for long-term visual place recognition.
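The core idea of model-free re-ranking (score an image pair from matched local features without estimating a transformation model) can be sketched as below. This is a generic illustration, not one of the paper's three methods: mutual nearest-neighbor descriptor matching, plus a crude median-displacement consistency check with an assumed pixel tolerance.

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    # pairwise distances between local descriptors of two images
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    a2b = d.argmin(axis=1)              # best match in B for each feature in A
    b2a = d.argmin(axis=0)              # best match in A for each feature in B
    return [(i, j) for i, j in enumerate(a2b) if b2a[j] == i]

def rerank_score(kpts_a, kpts_b, desc_a, desc_b):
    """Model-free similarity: count mutual matches whose keypoint displacement
    agrees with the dominant (median) displacement. No transformation between
    the images is estimated, only this cheap spatial-consistency check."""
    matches = mutual_nn_matches(desc_a, desc_b)
    if not matches:
        return 0
    disp = np.array([kpts_b[j] - kpts_a[i] for i, j in matches])
    med = np.median(disp, axis=0)
    consistent = np.linalg.norm(disp - med, axis=1) < 20.0   # tolerance is assumed
    return int(consistent.sum())

# toy usage: the same four features shifted by 5 px should all match consistently
kpts_a = np.array([[0., 0.], [10., 0.], [0., 10.], [10., 10.]])
kpts_b = kpts_a + np.array([5., 0.])
desc = np.eye(4)                        # four distinct descriptors
score = rerank_score(kpts_a, kpts_b, desc, desc)   # -> 4
```

A candidate list from the first retrieval stage would then be re-ordered by this score; the appeal, as the abstract notes, is that the per-pair cost is dominated by matching rather than by robust model fitting.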


Reviews: Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

Neural Information Processing Systems

This paper describes a model-based reinforcement learning approach applied to four of the continuous-control MuJoCo tasks. The approach incorporates uncertainty in the forward dynamics model in two ways: by predicting a Gaussian distribution over future states rather than a single point, and by training an ensemble of models on different subsets of the agent's experience. As a controller, the authors use the cross-entropy method (CEM) to generate action sequences, which are then used to generate state trajectories with the stochastic forward dynamics model. Reward sums are computed for each action-conditional trajectory, and the first action of the sequence with the highest predicted reward is executed; this is thus a form of model-predictive control. In their experiments, the authors show that their method matches the performance of SOTA model-free approaches with many fewer environment interactions, i.e. with improved sample complexity, on 3 out of 4 tasks.
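The control loop the review summarizes (CEM over action sequences, rolled out through a stochastic ensemble, best first action executed) can be sketched on a toy problem. Everything here is an assumption for illustration: a 1-D point mass with hand-made "ensemble members" stands in for the learned probabilistic networks, and the reward simply penalizes distance from the origin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "ensemble" of probabilistic linear dynamics models. Members differ
# slightly, mimicking bootstrapped training on different experience subsets;
# each predicts a Gaussian over the next state (mean A*s + B*a, std sigma).
ensemble = [dict(A=1.0 + 0.01 * rng.normal(),
                 B=0.1 + 0.005 * rng.normal(),
                 sigma=0.02)
            for _ in range(5)]

def rollout_return(state, actions):
    # propagate one trajectory through a randomly chosen ensemble member,
    # sampling each next state from the predicted Gaussian
    m = ensemble[rng.integers(len(ensemble))]
    s, ret = state, 0.0
    for a in actions:
        s = m["A"] * s + m["B"] * a + m["sigma"] * rng.normal()
        ret += -s**2                    # reward: keep the state near zero
    return ret

def cem_plan(state, horizon=10, pop=64, elites=8, iters=4):
    # CEM: sample action sequences, keep the elites, refit the sampling
    # distribution, repeat; MPC executes mu[0] and then replans.
    mu, std = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        acts = rng.normal(mu, std, size=(pop, horizon)).clip(-1, 1)
        rets = np.array([rollout_return(state, a) for a in acts])
        elite = acts[np.argsort(rets)[-elites:]]
        mu, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mu

plan = cem_plan(state=1.0)
```

Sampling a fresh ensemble member per imagined trajectory is one simple way to mix the two uncertainty sources the review describes; the paper's trajectory-sampling schemes are more refined than this sketch.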


ProSpec RL: Plan Ahead, then Execute

Liu, Liangliang, Guan, Yi, Wang, BoRan, Shen, Rujia, Lin, Yi, Kong, Chaoran, Yan, Lian, Jiang, Jingchi

arXiv.org Artificial Intelligence

Imagining the potential outcomes of actions before execution helps agents make more informed decisions, a prospective-thinking ability fundamental to human cognition. However, mainstream model-free Reinforcement Learning (RL) methods lack the ability to proactively envision future scenarios, plan, and guide their strategies. They typically rely on trial and error to adjust policy functions, aiming to maximize cumulative rewards or long-term value, even when such high-reward decisions place the environment in extremely dangerous states. To address this, we propose the Prospective (ProSpec) RL method, which makes higher-value, lower-risk decisions by imagining n future trajectory streams. Specifically, ProSpec employs a dynamic model to predict future states (termed "imagined states") from the current state and a series of sampled actions. Furthermore, we integrate the concept of Model Predictive Control and introduce a cycle-consistency constraint that allows the agent to evaluate and select the optimal actions from these trajectories. Moreover, ProSpec uses cycle consistency to mitigate two fundamental issues in RL: it encourages state reversibility to avoid irreversible events (low risk), and it augments actions to generate numerous virtual trajectories, improving data efficiency. We validated the effectiveness of our method on the DMControl benchmarks, where our approach achieved significant performance improvements. Code will be open-sourced upon acceptance.
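The abstract's imagine-score-select loop with a cycle-consistency penalty can be sketched as follows. This is a schematic reading of the abstract, not the authors' implementation: the forward/backward dynamics models are hypothetical fixed linear maps (in ProSpec they would be learned networks), and the task reward is an assumed placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_model(s, a, W_f):
    # predicts the "imagined" next state from the current state and an action
    return np.tanh(np.concatenate([s, a]) @ W_f)

def backward_model(s_next, a, W_b):
    # tries to recover the previous state from the imagined next state
    return np.tanh(np.concatenate([s_next, a]) @ W_b)

def cycle_consistency_loss(s, a, W_f, W_b):
    """Penalize transitions that cannot be mapped back after an imagined step,
    a proxy for avoiding irreversible (high-risk) events."""
    s_hat = backward_model(forward_model(s, a, W_f), a, W_b)
    return float(((s - s_hat) ** 2).mean())

def imagine_and_select(s, candidate_actions, W_f, W_b, lam=1.0):
    """Score each sampled action by imagined reward minus a cycle-consistency
    penalty, then return the best one (MPC-style: execute it and replan)."""
    scores = []
    for a in candidate_actions:
        s_next = forward_model(s, a, W_f)
        reward = -float((s_next ** 2).sum())        # assumed placeholder reward
        scores.append(reward - lam * cycle_consistency_loss(s, a, W_f, W_b))
    return candidate_actions[int(np.argmax(scores))]

s_dim, a_dim = 4, 2
W_f = rng.normal(scale=0.3, size=(s_dim + a_dim, s_dim))
W_b = rng.normal(scale=0.3, size=(s_dim + a_dim, s_dim))
state = rng.normal(size=s_dim)
actions = rng.normal(size=(16, a_dim))              # sampled candidate action streams
best = imagine_and_select(state, actions, W_f, W_b)
```

The same sampled actions that feed this scoring loop double as the "virtual trajectories" the abstract mentions: each imagined rollout is extra training signal that never cost a real environment step.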