
Markov Models


Review for NeurIPS paper: Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information

Neural Information Processing Systems

The authors did a good job of addressing reviewer concerns in the response. There were some lingering concerns about whether the authors had picked the best comparison methods for their experiments. Additional experiments and/or a more careful justification of the choices made would help. I would recommend that the authors take the reviewers' comments into account in preparing the final version of the paper.


Reviews: Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives

Neural Information Processing Systems

Two out of three reviewers appreciated the contributions of this paper, with one expert reviewer praising almost every aspect of the paper. On the negative side, one reviewer took issue with the proposed setting, highlighting that the utility of the proposed objective function is somewhat dubious in the general context of multi-objective decision making. I agree with this reviewer in that having "multi-objective" in the title of the paper may set the wrong expectations for some readers, and I suggest that the authors consider changing the title of the paper for its final version to avoid such misunderstandings. Furthermore, the final version should discuss the relationship between this paper and the very recent work of Rosenberg and Mansour (2019) that studies essentially the same problem in episodic MDPs. Other than these concerns, the paper is worthy of being published without major changes.


Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) enables an intelligent agent to optimise its performance in a task by repeatedly taking actions based on the observed state and receiving feedback from the environment in the form of rewards. RL typically uses tables or linear approximators to represent which state-action pairs maximise the reward. Combining RL with deep neural networks (DRL) significantly increases its scalability and enables it to address more complex problems than before. However, DRL also inherits downsides from both RL and deep learning. Although DRL improves generalisation across similar state-action pairs compared with simpler RL policy representations such as tabular methods, it still requires the agent to explore the state-action space adequately. Additionally, deep methods require more training data, and the volume of data needed grows with the complexity and size of the neural network. As a result, deep RL takes a long time to collect enough agent-environment samples and to learn the underlying policy successfully. Furthermore, even a slight alteration to the task often invalidates previously acquired knowledge. To address these shortcomings, Transfer Learning (TL) has been introduced, which enables the use of external knowledge from other tasks or agents to enhance a learning process. The goal of TL is to reduce the learning complexity for an agent facing an unfamiliar task by simplifying the exploration process. This is achieved by lowering the amount of new information its learning model requires, resulting in a reduced overall convergence time...


Episodic Novelty Through Temporal Distance

arXiv.org Artificial Intelligence

Exploration in sparse reward environments remains a significant challenge in reinforcement learning, particularly in Contextual Markov Decision Processes (CMDPs), where environments differ across episodes. Existing episodic intrinsic motivation methods for CMDPs primarily rely on count-based approaches, which are ineffective in large state spaces, or on similarity-based methods that lack appropriate metrics for state comparison. To address these shortcomings, we propose Episodic Novelty Through Temporal Distance (ETD), a novel approach that introduces temporal distance as a robust metric for state similarity and intrinsic reward computation. By employing contrastive learning, ETD accurately estimates temporal distances and derives intrinsic rewards based on the novelty of states within the current episode. Extensive experiments on various benchmark tasks demonstrate that ETD significantly outperforms state-of-the-art methods, highlighting its effectiveness in enhancing exploration in sparse reward CMDPs.
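To make the episodic-novelty idea concrete, here is a minimal Python sketch, assuming states are numeric vectors and that the intrinsic reward is the distance from the current state to its nearest neighbour among the states already visited in the same episode. ETD's contrastively learned temporal distance is replaced by a hypothetical `placeholder_distance` (plain Euclidean distance), and both the nearest-neighbour aggregation and the zero reward at the first step are assumptions, not necessarily what the paper does.

```python
import math
from typing import Callable, List, Sequence

State = Sequence[float]


def placeholder_distance(a: State, b: State) -> float:
    # Hypothetical stand-in for ETD's learned temporal distance.
    return math.dist(a, b)


def episodic_novelty_reward(
    state: State,
    memory: List[State],
    distance: Callable[[State, State], float] = placeholder_distance,
) -> float:
    # Novelty = distance to the closest state seen so far in this episode.
    # Assumption: the first step of an episode (empty memory) gets 0.0.
    if not memory:
        return 0.0
    return min(distance(state, m) for m in memory)


def intrinsic_rewards(episode: List[State]) -> List[float]:
    # Episodic memory starts empty for every episode, so novelty is judged
    # only against the current episode (environments differ across episodes).
    memory: List[State] = []
    rewards = []
    for s in episode:
        rewards.append(episodic_novelty_reward(s, memory))
        memory.append(s)
    return rewards


print(intrinsic_rewards([(0.0, 0.0), (0.0, 1.0), (0.0, 1.0), (3.0, 5.0)]))
# [0.0, 1.0, 0.0, 5.0]
```

Revisiting a state within the episode earns nothing, while a far-away state earns a large bonus; swapping in a learned distance only changes `distance`.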


Formal Verification of Markov Processes with Learned Parameters

arXiv.org Artificial Intelligence

We introduce the problem of formally verifying properties of Markov processes where the parameters are the output of machine learning models. Our formulation is general and solves a wide range of problems, including verifying properties of probabilistic programs that use machine learning, and subgroup analysis in healthcare modeling. We show that for a broad class of machine learning models, including linear models, tree-based models, and neural networks, verifying properties of Markov chains like reachability, hitting time, and total reward can be formulated as a bilinear program. We develop a decomposition and bound propagation scheme for solving the bilinear program and show through computational experiments that our method solves the problem to global optimality up to 100x faster than state-of-the-art solvers. We also release $\texttt{markovml}$, an open-source tool for building Markov processes, integrating pretrained machine learning models, and verifying their properties, available at https://github.com/mmaaz-git/markovml.
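For a chain whose parameters are fixed numbers (no learned models), reachability and hitting time reduce to linear systems. The Python sketch below computes both for a small discrete-time Markov chain using exact `Fraction` arithmetic and plain Gauss-Jordan elimination. It is a textbook illustration, not `markovml`'s API, and it does not cover the bilinear program that arises when transition probabilities are outputs of machine learning models; all function names are hypothetical.

```python
import math
from fractions import Fraction
from typing import List, Set

Matrix = List[List[Fraction]]


def solve(a: Matrix, b: List[Fraction]) -> List[Fraction]:
    # Gauss-Jordan elimination; assumes `a` is nonsingular.
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def can_reach(P: Matrix, target: Set[int]) -> Set[int]:
    # States with a positive-probability path into `target` (backward search).
    reach = set(target)
    changed = True
    while changed:
        changed = False
        for s in range(len(P)):
            if s not in reach and any(P[s][t] > 0 for t in reach):
                reach.add(s)
                changed = True
    return reach


def reachability(P: Matrix, target: Set[int]) -> List[Fraction]:
    # x_s = 1 on target, 0 where target is unreachable, otherwise
    # x_s = sum_t P[s][t] * x_t (a nonsingular system on the remaining states).
    maybe = [s for s in sorted(can_reach(P, target)) if s not in target]
    idx = {s: i for i, s in enumerate(maybe)}
    a = [[(1 if i == j else 0) - P[s][t] for j, t in enumerate(maybe)]
         for i, s in enumerate(maybe)]
    b = [sum((P[s][t] for t in target), Fraction(0)) for s in maybe]
    x = solve(a, b) if maybe else []
    return [Fraction(1) if s in target else x[idx[s]] if s in idx else Fraction(0)
            for s in range(len(P))]


def expected_hitting_time(P: Matrix, target: Set[int]) -> list:
    # Finite only where the target is reached with probability 1:
    # h_s = 0 on target, h_s = 1 + sum_t P[s][t] * h_t elsewhere; inf otherwise.
    probs = reachability(P, target)
    sure = [s for s in range(len(P)) if probs[s] == 1 and s not in target]
    idx = {s: i for i, s in enumerate(sure)}
    a = [[(1 if i == j else 0) - P[s][t] for j, t in enumerate(sure)]
         for i, s in enumerate(sure)]
    h = solve(a, [Fraction(1)] * len(sure)) if sure else []
    return [Fraction(0) if s in target else h[idx[s]] if s in idx else math.inf
            for s in range(len(P))]


half, zero, one = Fraction(1, 2), Fraction(0), Fraction(1)
# Gambler's ruin on {0, 1, 2, 3}: 0 and 3 absorbing, fair coin in between.
ruin = [[one, zero, zero, zero], [half, zero, half, zero],
        [zero, half, zero, half], [zero, zero, zero, one]]
print([str(p) for p in reachability(ruin, {3})])  # ['0', '1/3', '2/3', '1']

# State 0 stays put or advances with probability 1/2; state 1 always moves to 2.
lazy = [[half, half, zero], [zero, zero, one], [zero, zero, one]]
print([str(h) for h in expected_hitting_time(lazy, {2})])  # ['3', '1', '0']
```

Exact fractions avoid rounding noise on small examples; for large chains a sparse floating-point solver would be the practical choice.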


Reviews: Markov Random Fields for Collaborative Filtering

Neural Information Processing Systems

The paper presents a novel method for recommendation with collaborative filtering based on Markov Random Fields (MRF). Starting from a general approach that regresses over the full graph of items, the paper shows that a valid approximation can be obtained by working with subgraphs that represent the Markov blankets of an initial set of items. This approach yields significant computational gains while also achieving better recommendation performance than the state of the art, represented here by variational autoencoders. As a general comment, I wonder whether accounting for popularity bias would make sense in this approach, and whether the authors have considered it. The claims are well supported by theoretical analysis.
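In an undirected MRF, the Markov blanket of a node is exactly its set of neighbours: given them, the node is conditionally independent of every other node. The Python sketch below extracts one local subgraph (seed item plus its blanket) per seed item from a hypothetical item-item edge list. The review does not say how seeds are chosen or how each subgraph's regression is fitted, so this only illustrates the subgraph step; item names and edges are made up.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    # Undirected item-item graph: an edge means the two items interact in the MRF.
    adj: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def markov_blanket(adj: Dict[Hashable, Set[Hashable]], node: Hashable) -> Set[Hashable]:
    # In an undirected MRF the Markov blanket of a node is its neighbour set.
    return set(adj.get(node, ()))


def blanket_subgraphs(adj: Dict[Hashable, Set[Hashable]],
                      seeds: Iterable[Hashable]) -> Dict[Hashable, Set[Hashable]]:
    # One local subgraph per seed item: the seed plus its Markov blanket.
    return {s: {s} | markov_blanket(adj, s) for s in seeds}


adj = build_adjacency([("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")])
print(sorted(blanket_subgraphs(adj, ["a", "d"])["a"]))  # ['a', 'b', 'c']
```

Each subgraph is much smaller than the full item graph, which is where the computational savings the review mentions would come from.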


Reviews: Markov Random Fields for Collaborative Filtering

Neural Information Processing Systems

Reviewers were initially quite favorable with respect to this paper and your response lifted some remaining doubts (especially from Reviewer #1). I am happy to recommend acceptance, congratulations! I would recommend that you take the reviewer comments into account to prepare a camera-ready version. In particular, it seems to be important to incorporate some of the discussion in bullets 1 and 2 in your response (regarding Mult-VAE and the high-level summary or pseudocode).


Reviews: Sampling Networks and Aggregate Simulation for Online POMDP Planning

Neural Information Processing Systems

Author feedback: I thank the authors for the feedback. The feedback was of high quality and satisfied my concerns. I suggest adding a version, perhaps compressed, of "Explaining limitations of our work" from the author feedback, which I enjoyed reading, to the final version of the paper. The paper "Sampling Networks and Aggregate Simulation for Online POMDP Planning" proposes a new solution for computing policies for large POMDP problems, based on factorizing the belief distribution with a mean field approximation during planning and execution, and on extending aggregate simulation to POMDPs. In short, the proposed POMDP planner projects factorized beliefs forward in time, building a computational graph in the process, and then computes gradients backward in time over that graph to improve the policy.
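The forward projection of a factorized belief can be illustrated with a small Python sketch, assuming binary state variables, a belief stored as independent marginals P(x = 1), and a hypothetical factored transition model in which each next-state variable depends on a few parent variables. Each next-step marginal is the expected transition probability under the product-form (independence) assumption that a mean field approximation makes. The sketch ignores actions, observations, and the gradient step the review describes, and it is not the paper's implementation.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical factored model: next-state variable -> (parent names, P(var' = 1 | parent values)).
Model = Dict[str, Tuple[List[str], Callable[..., float]]]


def project(belief: Dict[str, float], model: Model) -> Dict[str, float]:
    # One step of forward projection, keeping the belief fully factorized.
    nxt = {}
    for var, (parents, cpd) in model.items():
        total = 0.0
        for values in product((0, 1), repeat=len(parents)):
            weight = 1.0  # probability of this parent assignment under independence
            for name, v in zip(parents, values):
                p = belief[name]
                weight *= p if v else 1.0 - p
            total += weight * cpd(*values)
        nxt[var] = total
    return nxt


def rollout(belief: Dict[str, float], model: Model, horizon: int) -> List[Dict[str, float]]:
    # Chain projections over the planning horizon.
    trajectory = [belief]
    for _ in range(horizon):
        belief = project(belief, model)
        trajectory.append(belief)
    return trajectory


model: Model = {
    "a": (["a", "b"], lambda a, b: 1.0 if (a or b) else 0.0),  # a' = a OR b
    "b": (["b"], lambda b: 0.9 if b else 0.1),                  # b persists w.p. 0.9
}
nxt = project({"a": 0.5, "b": 0.2}, model)
print({k: round(v, 3) for k, v in nxt.items()})  # {'a': 0.6, 'b': 0.26}
```

Enumerating parent assignments is exponential in the number of parents, so this only makes sense for small local dependencies; the independence assumption is exactly the limitation the reviewers discuss.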


Reviews: Sampling Networks and Aggregate Simulation for Online POMDP Planning

Neural Information Processing Systems

All reviewers appreciate this practical approach to tackling POMDPs with large state and observation spaces using factorized beliefs and aggregate simulation. Reviewers had some concerns about the limitations imposed by the factorization assumption, but these concerns are addressed in the author feedback. Reviewers are particularly happy with the quality of the rebuttal and encourage the authors to incorporate the discussion of the algorithm's limitations into the final draft.


Reviews: Non-Stationary Markov Decision Processes, a Worst-Case Approach using Model-Based Reinforcement Learning

Neural Information Processing Systems

UPDATE: I have read the authors' response and increased my score. Specifically, the authors corrected my understanding of Property 1 and properly framed the relaxation of the problem in Section 5. Please include similar clarifications in the final work. There was also much discussion among the reviewers about how the paper relates to the Robust MDP literature, which needs to be covered better in the current work. Papers such as "Reinforcement Learning in Robust Markov Decision Processes" and "Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions" were raised by other reviewers, seem applicable to the current setting, and could serve as empirical competitors to RATS. I very much like the constraints used to study planning in non-stationary environments in this paper, and the min-max inspired RATS algorithm seems like an appropriate game-theoretic approach.
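The min-max idea can be illustrated with a generic depth-limited max-min lookahead in Python: the agent maximizes over actions while an adversarial nature picks, for each action, the least favourable model from a finite set of candidate transition models. This is a hedged sketch of worst-case planning in general, not RATS itself; the review gives no details of RATS, and the model format and example states are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical model format: (state, action) -> list of (probability, next_state, reward).
Model = Dict[Tuple[str, str], List[Tuple[float, str, float]]]


def max_min_value(state: str, depth: int, actions: Sequence[str], models: Sequence[Model]) -> float:
    if depth == 0:
        return 0.0
    best = None
    for a in actions:
        # Expected reward plus continuation value under each candidate model.
        values = [
            sum(p * (r + max_min_value(s2, depth - 1, actions, models)) for p, s2, r in m[(state, a)])
            for m in models
            if (state, a) in m
        ]
        if values:
            worst = min(values)  # nature picks the least favourable model
            best = worst if best is None else max(best, worst)
    return 0.0 if best is None else best  # no action defined: treat as terminal


optimistic: Model = {("s0", "safe"): [(1.0, "goal", 1.0)], ("s0", "risky"): [(1.0, "goal", 10.0)]}
pessimistic: Model = {("s0", "safe"): [(1.0, "goal", 1.0)], ("s0", "risky"): [(1.0, "trap", -5.0)]}
print(max_min_value("s0", 2, ["safe", "risky"], [optimistic, pessimistic]))  # 1.0
```

Averaging the two models instead would favour `risky` (expected reward 2.5 versus 1.0); the max-min value commits to `safe` because it guarantees 1.0 whichever model is true.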