Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies

Huang, Haojie, Schmeckpeper, Karl, Wang, Dian, Biza, Ondrej, Qian, Yaoyao, Liu, Haotian, Jia, Mingxi, Platt, Robert, Walters, Robin

arXiv.org Artificial Intelligence

Humans can imagine goal states during planning and perform actions to match those goals. In this work, we propose Imagination Policy, a novel multi-task key-frame policy network for solving high-precision pick-and-place tasks. Instead of learning actions directly, Imagination Policy generates point clouds to imagine desired states, which are then translated to actions using rigid action estimation. This transforms action inference into a local generative task. We leverage pick-and-place symmetries underlying the tasks in the generation process and achieve extremely high sample efficiency and generalizability to unseen configurations. Finally, we demonstrate state-of-the-art performance across various tasks on the RLBench benchmark compared with several strong baselines.
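
The abstract's central move is to reduce action inference to rigid registration: given an observed object point cloud and a generated ("imagined") goal cloud, the action is the rigid transform aligning one to the other. Below is a minimal numpy sketch of that step using the standard Kabsch/Procrustes solution; the function name and the assumption that the two clouds are in point-to-point correspondence are ours for illustration, not the paper's code.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) with dst ~ src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding points, e.g. the observed
    cloud and the imagined goal cloud. Kabsch/Procrustes via SVD.
    """
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Usage: recover a known rotation + translation from synthetic clouds.
rng = np.random.default_rng(0)
current = rng.normal(size=(100, 3))
a = np.pi / 6
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
goal = current @ R_true.T + np.array([0.1, -0.2, 0.3])
R, t = estimate_rigid_transform(current, goal)
assert np.allclose(R, R_true, atol=1e-6)
```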


Efficient Subgraph GNNs by Learning Effective Selection Policies

Bevilacqua, Beatrice, Eliasof, Moshe, Meirom, Eli, Ribeiro, Bruno, Maron, Haggai

arXiv.org Artificial Intelligence

Subgraph GNNs are provably expressive neural architectures that learn graph representations from sets of subgraphs. Unfortunately, their applicability is hampered by the computational complexity associated with performing message passing on many subgraphs. In this paper, we consider the problem of learning to select a small subset of the large set of possible subgraphs in a data-driven fashion. We first motivate the problem by proving that there are families of WL-indistinguishable graphs for which there exist efficient subgraph selection policies: small subsets of subgraphs that can already identify all the graphs within the family. We prove that, unlike popular random policies and prior work addressing the same problem, our architecture is able to learn the efficient policies mentioned above.

In essence, a Subgraph GNN first transforms an input graph into a bag of subgraphs, obtained according to a predefined generation policy. For instance, each subgraph might be generated by deleting exactly one node in the original graph, or, more generally, by marking exactly one node in the original graph while leaving the connectivity unaltered (Papp & Wattenhofer, 2022). Then, it applies an equivariant architecture to process the bag of subgraphs and aggregates the representations to obtain graph- or node-level predictions. The popularity of Subgraph GNNs can be attributed not only to their increased expressive power compared to MPNNs but also to their remarkable empirical performance, exemplified by their success on the ZINC molecular dataset (Frasca et al., 2022; Zhang et al., 2023). Unfortunately, Subgraph GNNs are hampered by their computational cost, since they perform message-passing operations on all subgraphs within the bag.
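
To make the generate-then-aggregate pipeline concrete, here is an illustrative numpy sketch of a Subgraph GNN forward pass under the node-deletion generation policy described above. A toy message-passing network stands in for the equivariant architecture, and all names are ours; the loop over all n subgraphs is exactly the cost that a learned selection policy would prune.

```python
import numpy as np

def node_deletion_bag(A):
    """One subgraph per node: deleting node i zeroes its row and column."""
    bag = []
    for i in range(A.shape[0]):
        sub = A.copy()
        sub[i, :] = 0.0
        sub[:, i] = 0.0
        bag.append(sub)
    return bag

def toy_mpnn(A, X, steps=2):
    """Stand-in for the shared equivariant network: average-neighbor
    message passing with a residual connection and ReLU."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    H = X
    for _ in range(steps):
        H = np.maximum(A @ H / deg + H, 0.0)
    return H

def subgraph_gnn_forward(A, X):
    """Process every subgraph in the bag, mean-pool node features per
    subgraph, then average across the bag for a graph-level output."""
    reps = [toy_mpnn(sub, X).mean(axis=0) for sub in node_deletion_bag(A)]
    return np.mean(reps, axis=0)

# Usage on a 4-node path graph with random node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(1).normal(size=(4, 8))
print(subgraph_gnn_forward(A, X).shape)  # (8,)
```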


Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems

Qu, Guannan, Wierman, Adam, Li, Na

arXiv.org Artificial Intelligence

We study reinforcement learning (RL) in a setting with a network of agents whose states and actions interact in a local manner where the objective is to find localized policies such that the (discounted) global reward is maximized. A fundamental challenge in this setting is that the state-action space size scales exponentially in the number of agents, rendering the problem intractable for large networks. In this paper, we propose a Scalable Actor-Critic (SAC) framework that exploits the network structure and finds a localized policy that is a $O(\rho^\kappa)$-approximation of a stationary point of the objective for some $\rho\in(0,1)$, with complexity that scales with the local state-action space size of the largest $\kappa$-hop neighborhood of the network.
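
As a rough illustration of the locality the complexity claim exploits (our own construction, not the paper's algorithm), the sketch below computes $\kappa$-hop neighborhoods on the agent network and defines tabular softmax policies that condition only on states inside each agent's neighborhood, so the policy representation scales with the local state space rather than the exponentially large global one.

```python
import numpy as np

def k_hop_neighborhood(adj, i, kappa):
    """Agents within kappa hops of agent i; adj maps agent -> neighbors."""
    frontier, seen = {i}, {i}
    for _ in range(kappa):
        frontier = {j for u in frontier for j in adj[u]} - seen
        seen |= frontier
    return seen

class LocalizedPolicy:
    """A tabular softmax policy for agent i that reads only the states
    of agents in its kappa-hop neighborhood."""
    def __init__(self, i, adj, kappa, n_actions, rng):
        self.scope = sorted(k_hop_neighborhood(adj, i, kappa))
        self.n_actions = n_actions
        self.table = {}  # local state tuple -> action logits
        self.rng = rng

    def act(self, global_state):
        local = tuple(global_state[j] for j in self.scope)
        logits = self.table.setdefault(local, np.zeros(self.n_actions))
        p = np.exp(logits - logits.max())
        return self.rng.choice(self.n_actions, p=p / p.sum())

# Usage on a 5-agent line graph with kappa = 1: agent 2's policy only
# ever sees the states of agents 1, 2, and 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
rng = np.random.default_rng(2)
policies = [LocalizedPolicy(i, adj, kappa=1, n_actions=2, rng=rng) for i in adj]
state = [0, 1, 0, 1, 1]
print([pi.act(state) for pi in policies])
```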


Dynamic Non-Bayesian Decision Making

Monderer, D., Tennenholtz, M.

Journal of Artificial Intelligence Research

The model of a non-Bayesian agent who faces a repeated game with incomplete information against Nature is an appropriate tool for modeling general agent-environment interactions. In such a model the environment state (controlled by Nature) may change arbitrarily, and the feedback/reward function is initially unknown. The agent is not Bayesian; that is, he forms a prior probability neither on the state selection strategy of Nature nor on his reward function. A policy for the agent is a function which assigns an action to every history of observations and actions. Two basic feedback structures are considered. In one of them -- the perfect monitoring case -- the agent is able to observe the previous environment state as part of his feedback, while in the other -- the imperfect monitoring case -- all that is available to the agent is the reward obtained. Both of these settings refer to partially observable processes, where the current environment state is unknown. Our main result refers to the competitive ratio criterion in the perfect monitoring case. We prove the existence of an efficient stochastic policy that ensures that the competitive ratio is obtained at almost all stages with arbitrarily high probability, where efficiency is measured in terms of rate of convergence. It is further shown that such an optimal policy does not exist in the imperfect monitoring case. Moreover, it is proved that in the perfect monitoring case there does not exist a deterministic policy that satisfies our long-run optimality criterion. In addition, we discuss the maxmin criterion and prove that a deterministic efficient optimal strategy does exist in the imperfect monitoring case under this criterion. Finally, we show that our approach to long-run optimality can be viewed as qualitative, which distinguishes it from previous work in this area.
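
The interaction protocol itself is easy to state in code. Below is a minimal, illustrative loop (our construction, not the paper's) contrasting the two feedback structures: under perfect monitoring the history records the previous environment state alongside the action and reward, while under imperfect monitoring it records only the realized reward. The stochastic policy shown is a placeholder; the paper's result is that some efficient stochastic policy attains the competitive ratio under perfect monitoring, while no deterministic one does.

```python
import random

def play(policy, env_states, reward_fn, monitoring="perfect", horizon=20):
    """Repeated game against Nature. env_states is Nature's arbitrary
    state sequence; reward_fn(state, action) is unknown to the agent,
    who learns about it only through feedback."""
    history, rewards = [], []
    for t in range(horizon):
        action = policy(history)          # a policy maps history -> action
        state = env_states[t]
        r = reward_fn(state, action)
        rewards.append(r)
        if monitoring == "perfect":
            history.append((state, action, r))  # previous state is observed
        else:
            history.append((None, action, r))   # only the reward is observed
    return rewards

# A (placeholder) stochastic policy over 3 actions.
def uniform_random_policy(history, n_actions=3):
    return random.randrange(n_actions)

states = [random.randrange(2) for _ in range(20)]
reward = lambda s, a: 1.0 if a == s else 0.0
print(sum(play(uniform_random_policy, states, reward, "perfect")))
```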