Markov Models
28f699175783a2c828ae74d53dd3da20-Paper-Conference.pdf
Recent years have seen embodied visual navigation advance in two distinct directions: (i) in equipping the AI agent to follow natural language instructions, and (ii) in making the navigable world multimodal, e.g., audio-visual navigation. However, the real world is not only multimodal, but also often complex, and thus in spite of these advances, agents still need to understand the uncertainty in their actions and seek instructions to navigate.
Surprise Minimizing Multi-Agent Learning with Energy-based Models
Multi-Agent Reinforcement Learning (MARL) has demonstrated significant success by virtue of collaboration across agents. Recent work, on the other hand, introduces surprise which quantifies the degree of change in an agent's environment. Surprise-based learning has received significant attention in the case of single-agent entropic settings but remains an open problem for fast-paced dynamics in multi-agent scenarios. A potential alternative to address surprise may be realized through the lens of free-energy minimization. We explore surprise minimization in multi-agent learning by utilizing the free energy across all agents in a multi-agent system. A temporal Energy-Based Model (EBM) represents an estimate of surprise which is minimized over the joint agent distribution. Our formulation of the EBM is theoretically akin to the minimum conjugate entropy objective and highlights suitable convergence towards minimum surprising states.
Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management Anonymous Author(s) Affiliation Address email
Reinforcement learning (RL) has shown great promise for developing dialogue1 management (DM) agents that are non-myopic, conduct rich conversations, and2 maximize overall user satisfaction. Despite recent developments in RL and lan-3 guage models (LMs), using RL to power conversational chatbots remains challeng-4 ing, in part because RL requires online exploration to learn effectively, whereas5 collecting novel human-bot interactions can be expensive and unsafe. This issue is6 exacerbated by the combinatorial action spaces facing these algorithms, as most7 LM agents generate responses at the word level. We develop a variety of RL algo-8 rithms, specialized to dialogue planning, that leverage recent Mixture-of-Expert9 Language Models (MoE-LMs)--models that capture diverse semantics, generate10 utterances reflecting different intents, and are amenable for multi-turn DM. By11 exploiting MoE-LM structure, our methods significantly reduce the size of the12 action space and improve the efficacy of RL-based DM.
Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
We study the offline reinforcement learning (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) using the data coming from a policy µ. In particular, we consider the sample complexity problems of offline RL for finite-horizon MDPs. Prior works study this problem based on different data-coverage assumptions, and their learning guarantees are expressed by the covering coefficients which lack the explicit characterization of system quantities.
Markov locality and relating it to p locality
To gain intuition for how p-locality functions, we will introduce another notion of locality, called Markov locality, which will use the language of Markov blankets. We will prove that under relatively relaxed conditions p-locality and Markov locality are equivalent. This will allow us to relate the notion of locality to various graph structures commonly used to represent probability distributions, and will be a key step in proving Properties 2.1 and 2.2. We start by defining the Markov boundary, M(X,S), of a random variable X contained in a set of random variables S, as a minimal set such that p(X|S) = p(X|M(X,S)). The Markov boundary defines a minimal set of variables such that, conditioned on these variables, conditioning on no additional random variables in S changes the probability of X [39]. Similarly, we define the Markov blanket, M(X,S) for X in S as any set of variables such that conditioning on M(X,S), makes X conditionally independent from all other variables [39]. In this way, the Markov boundary is a Markov blanket but not all blankets are boundaries. Markov locality: Given probability distribution p(Z) and function f: RNX+NΘ RNΘ, the update function f(Z) is Markov-local with respect to the distribution p over Z if and only if k: Z Ωs.t. AMarkov boundary can be thought of as the set of variables that'locally' communicate with the parameter Θk, thus providing a natural measure of locality. Importantly, for Markov-locality to be of use, we would like the Markov boundaries of random variables in the model of interest to be unique.