Probabilistic Attention for Interactive Segmentation
We provide a probabilistic interpretation of attention and show that the standard dot-product attention in transformers is a special case of Maximum A Posteriori (MAP) inference. The proposed approach suggests the use of Expectation Maximization algorithms for online adaptation of key and value model parameters. This approach is useful for cases in which external agents, e.g., annotators, provide inference-time information about the correct values of some tokens, e.g., the semantic category of some pixels, and we need this new information to propagate to other tokens in a principled manner. We illustrate the approach on an interactive semantic segmentation task in which annotators and models collaborate online to improve annotation efficiency. Using standard benchmarks, we observe that key adaptation boosts model performance (approximately 10% mIoU) in the low feedback regime and value propagation improves model responsiveness in the high feedback regime.
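To make the probabilistic reading of attention concrete, here is a minimal NumPy sketch under our own assumptions: each key k_j is the mean of an isotropic Gaussian component with shared variance σ² and a uniform prior, so the posterior responsibility of key j given a query q is a softmax over -||q - k_j||² / (2σ²). When all keys have the same norm and σ² = √d, these responsibilities coincide exactly with the weights of scaled dot-product attention. The `em_key_update` function is a generic Gaussian-mixture EM step for the means, offered only to illustrate what "adapting keys with EM" can look like; it is not the paper's update rule and does not use annotator feedback. All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(q, K, V):
    # Standard scaled dot-product attention for one query q of shape (d,).
    d = q.shape[-1]
    weights = softmax(K @ q / np.sqrt(d))
    return weights @ V, weights

def gaussian_posterior_attention(q, K, V, sigma2):
    # Keys are means of isotropic Gaussians with shared variance sigma2 and a
    # uniform prior; the weights are the posterior p(j | q) over components.
    sq_dist = ((K - q) ** 2).sum(axis=1)
    weights = softmax(-sq_dist / (2.0 * sigma2))
    return weights @ V, weights

def em_key_update(Q, K, sigma2):
    # One generic EM iteration for the component means (keys), given a batch
    # of queries Q with shape (n, d) and keys K with shape (m, d).
    # E-step: responsibilities R[n, j] = p(j | q_n).
    sq_dist = ((Q[:, None, :] - K[None, :, :]) ** 2).sum(axis=2)
    R = softmax(-sq_dist / (2.0 * sigma2), axis=1)
    # M-step: each key becomes the responsibility-weighted mean of the queries.
    return (R.T @ Q) / R.sum(axis=0)[:, None]

rng = np.random.default_rng(0)
d, m = 8, 5
K = rng.normal(size=(m, d))
K /= np.linalg.norm(K, axis=1, keepdims=True)  # equal-norm keys
V = rng.normal(size=(m, 3))
q = rng.normal(size=d)

out_dot, w_dot = dot_product_attention(q, K, V)
out_gmm, w_gmm = gaussian_posterior_attention(q, K, V, sigma2=np.sqrt(d))
print(np.allclose(out_dot, out_gmm))  # True

Q = rng.normal(size=(20, d))
K_new = em_key_update(Q, K, sigma2=1.0)
```

The equal-norm assumption matters: expanding the squared distance gives q·k_j/σ² plus terms that are constant across keys only if every ||k_j||² is the same; otherwise those terms act as per-key biases and the two weightings differ.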
Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning
We consider the networked multi-agent reinforcement learning (MARL) problem in a fully decentralized setting, where agents learn to coordinate to achieve joint success. This problem is widely encountered in many areas including traffic control, distributed control, and smart grids. We assume each agent is located at a node of a communication network and can exchange information only with its neighbors. Using softmax temporal consistency, we derive a primal-dual decentralized optimization method and obtain a principled and data-efficient iterative algorithm named value propagation. We prove a non-asymptotic convergence rate of $\mathcal{O}(1/T)$ with nonlinear function approximation. To the best of our knowledge, it is the first MARL algorithm with a convergence guarantee in the control, off-policy, nonlinear function approximation, fully decentralized setting.
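Two ingredients named in this abstract can be illustrated in isolation. The following NumPy sketch is our own construction, not the paper's primal-dual value propagation algorithm: it checks the one-step softmax temporal consistency condition for an entropy-regularized value with temperature τ on a deterministic transition, and it shows how agents that communicate only with neighbors can still agree on a network-wide average by repeatedly mixing their local estimates with a doubly stochastic matrix. The Metropolis weighting, the ring topology, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def softmax_value(q_values, tau):
    # Entropy-regularized ("soft") state value: tau * log sum_a exp(Q(s, a) / tau),
    # computed with the log-sum-exp trick for numerical stability.
    z = q_values / tau
    m = z.max()
    return tau * (m + np.log(np.exp(z - m).sum()))

def softmax_policy(q_values, tau):
    # Soft-greedy policy pi(a | s) = exp((Q(s, a) - V(s)) / tau).
    return np.exp((q_values - softmax_value(q_values, tau)) / tau)

def consistency_residual(v_s, v_next, reward, log_pi, gamma, tau):
    # One-step softmax temporal consistency for a deterministic transition:
    # V(s) - gamma * V(s') - r(s, a) + tau * log pi(a | s) is zero at the optimum.
    return v_s - gamma * v_next - reward + tau * log_pi

def metropolis_weights(adjacency):
    # Symmetric, doubly stochastic mixing matrix that uses only local degrees.
    # Assumes a 0/1 adjacency matrix with no self-loops.
    n = adjacency.shape[0]
    deg = adjacency.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adjacency[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

def gossip(local_values, W, steps):
    # Each step, every agent replaces its value with a weighted average of its
    # own value and its neighbors' values.
    x = local_values.copy()
    for _ in range(steps):
        x = W @ x
    return x

# Single state, two actions, both leading to a terminal state (V(s') = 0).
tau, gamma = 0.5, 0.9
rewards = np.array([1.0, 0.0])
q = rewards + gamma * 0.0
v = softmax_value(q, tau)
pi = softmax_policy(q, tau)
res = consistency_residual(v, 0.0, rewards, np.log(pi), gamma, tau)
print(np.allclose(res, 0.0))  # True

# Ring of 6 agents, each exchanging values only with its two neighbors.
n = 6
A = np.zeros((n, n), dtype=int)
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
W = metropolis_weights(A)
local_r = np.arange(n, dtype=float)
x = gossip(local_r, W, steps=200)
print(np.allclose(x, local_r.mean()))  # True
```

Metropolis weights are a common choice here because each agent needs only its own degree and its neighbors' degrees, so the averaging step stays fully local while still converging to the exact average on a connected graph.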
Reviews: Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning
This paper tackles the problem of decentralized learning in multi-agent environments. While many recent approaches use a combination of centralized learning and decentralized execution, the decentralized learning paradigm is motivated by scenarios where a centralized component (e.g., a value function) may be too expensive to use, or may have undesirable privacy implications. However, previous decentralized learning approaches haven't been very effective for multi-agent problems. The paper proposes a new algorithm, value propagation, and proves that it converges in the non-linear function approximation case. To my knowledge, the value propagation algorithm is novel and interesting.