Maximum Entropy
Bid Optimization using Maximum Entropy Reinforcement Learning
Liu, Mengjuan, Liu, Jinyu, Hu, Zhengning, Ge, Yuchen, Nie, Xuyun
Real-time bidding (RTB) has become a critical channel for online advertising. In RTB, an advertiser can bid on ad impressions to display its advertisements, determining each impression's bidding price according to its bidding strategy; a good bidding strategy therefore helps advertisers improve cost efficiency. This paper focuses on optimizing a single advertiser's bidding strategy using reinforcement learning (RL) in RTB. Unfortunately, the highly dynamic nature of the RTB environment makes it challenging to optimize the bidding strategy through RL at the granularity of individual impressions. To avoid optimizing every impression's bidding price directly, we first utilize a widely accepted linear bidding function to compute every impression's base price and refine it with a mutable adjustment factor derived from the RTB auction environment. Specifically, we use the maximum entropy RL algorithm (Soft Actor-Critic) to optimize the adjustment-factor generation policy at the granularity of individual impressions. Finally, an empirical study on a public dataset demonstrates that the proposed bidding strategy has superior performance compared with the baselines.
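As a concrete illustration of the scheme described above, here is a minimal sketch in which the final bid is the linear base price scaled by a multiplicative adjustment factor. The SAC policy is replaced by a random stand-in, and names such as `base_bid` and `avg_ctr` are illustrative assumptions, not the paper's notation.

```python
# Minimal sketch of the bidding scheme: a linear base bid scaled by an
# adjustment factor alpha that, in the paper, comes from a learned SAC
# policy. The policy below is a hypothetical stand-in; `base_bid` and
# `avg_ctr` are assumed names, not taken from the paper.
import numpy as np

def linear_base_price(pctr: float, base_bid: float, avg_ctr: float) -> float:
    """Widely used linear bidding function: scale a base bid by predicted CTR."""
    return base_bid * pctr / avg_ctr

def adjusted_bid(pctr: float, alpha: float, base_bid: float = 80.0,
                 avg_ctr: float = 0.001) -> float:
    """Apply the mutable adjustment factor alpha to the base price,
    rather than regressing the final bid directly."""
    return linear_base_price(pctr, base_bid, avg_ctr) * (1.0 + alpha)

# Stand-in for the SAC policy: in the paper, alpha is sampled from a
# stochastic policy conditioned on the auction state (budget, time, ...).
rng = np.random.default_rng(0)
state_dependent_alpha = float(np.tanh(rng.normal()))  # bounded adjustment
print(adjusted_bid(pctr=0.002, alpha=state_dependent_alpha))
```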
Maximum Entropy Dueling Network Architecture
Nadali, Alireza, Ebadzadeh, Mohammad Mehdi
In recent years, many deep architectures have been proposed for Reinforcement Learning, mainly for value function estimation and representation, and these methods have achieved great success in the Atari 2600 domain. In this paper, we propose an improved architecture based upon Dueling Networks, in which there are two separate estimators: one approximates the state value function and the other the state advantage function. This improvement, based on Maximum Entropy, shows better policy evaluation compared to the original network and other value-based architectures in the Atari domain.
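For readers unfamiliar with the dueling decomposition, the sketch below shows the standard aggregation Q(s,a) = V(s) + A(s,a) - mean_a A(s,a), together with one hypothetical "soft" log-sum-exp aggregator in a maximum-entropy spirit; the paper's exact aggregation rule is not reproduced here.

```python
# Standard dueling aggregation, plus a soft (log-sum-exp) variant one might
# use under a maximum-entropy view. The soft version is an assumption for
# illustration, not the paper's aggregator.
import numpy as np

def dueling_q(value: float, advantages: np.ndarray) -> np.ndarray:
    """Standard dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    return value + advantages - advantages.mean()

def soft_dueling_q(value: float, advantages: np.ndarray,
                   temperature: float = 1.0) -> np.ndarray:
    """Hypothetical max-entropy-style aggregation: replace the mean baseline
    with a temperature-scaled log-sum-exp (soft value) of the advantages."""
    a = advantages / temperature
    logsumexp = np.log(np.exp(a - a.max()).sum()) + a.max()  # stable
    soft_baseline = temperature * (logsumexp - np.log(len(advantages)))
    return value + advantages - soft_baseline

adv = np.array([0.2, -0.1, 0.5])
print(dueling_q(1.0, adv), soft_dueling_q(1.0, adv))
```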
Maximum Entropy Weighted Independent Set Pooling for Graph Neural Networks
Nouranizadeh, Amirhossein, Matinkia, Mohammadjavad, Rahmati, Mohammad, Safabakhsh, Reza
In this paper, we propose a novel pooling layer for graph neural networks based on maximizing the mutual information between the pooled graph and the input graph. Since the maximum mutual information is difficult to compute, we employ the Shannon capacity of a graph as an inductive bias to our pooling method. More precisely, we show that the input graph to the pooling layer can be viewed as a representation of a noisy communication channel. For such a channel, sending the symbols belonging to an independent set of the graph yields a reliable and error-free transmission of information. We show that reaching the maximum mutual information is equivalent to finding a maximum weight independent set of the graph where the weights convey entropy contents. Through this communication theoretic standpoint, we provide a distinct perspective for posing the problem of graph pooling as maximizing the information transmission rate across a noisy communication channel, implemented by a graph neural network. We evaluate our method, referred to as Maximum Entropy Weighted Independent Set Pooling (MEWISPool), on graph classification tasks and the combinatorial optimization problem of the maximum independent set. Empirical results demonstrate that our method achieves state-of-the-art or competitive results on graph classification tasks and the maximum independent set problem across several benchmark datasets.
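The core selection step can be illustrated with a classical greedy stand-in: weight each node by an entropy estimate of its features and keep a heavy independent set. MEWISPool solves this with a graph neural network; the routine below only sketches the underlying combinatorial objective, and the adjacency/feature setup is a toy assumption.

```python
# Sketch of the pooling idea: keep a (maximum-weight) independent set of
# nodes, where each node's weight is an entropy estimate of its features.
# This greedy routine is a classical stand-in, not the paper's learned solver.
import numpy as np

def node_entropy(features: np.ndarray) -> np.ndarray:
    """Shannon entropy of each node's (normalized) feature vector."""
    p = np.abs(features) / (np.abs(features).sum(axis=1, keepdims=True) + 1e-12)
    return -(p * np.log(p + 1e-12)).sum(axis=1)

def greedy_mwis(adj: dict, weights: np.ndarray) -> list:
    """Greedy maximum-weight independent set: repeatedly take the heaviest
    node none of whose neighbors has been selected."""
    selected, blocked = [], set()
    for v in np.argsort(-weights):
        if int(v) not in blocked:
            selected.append(int(v))
            blocked |= {int(v)} | adj[int(v)]
    return selected

adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}        # path graph 0-1-2-3
x = np.random.default_rng(1).random((4, 8))          # toy node features
print(greedy_mwis(adj, node_entropy(x)))             # pooled node indices
```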
Closed-Form Analytical Results for Maximum Entropy Reinforcement Learning
Arriojas, Argenis, Tiomkin, Stas, Kulkarni, Rahul V.
We introduce a mapping between Maximum Entropy Reinforcement Learning (MaxEnt RL) and Markovian processes conditioned on rare events. In the long-time limit, this mapping allows us to derive analytical expressions for the optimal policy, dynamics and initial state distributions for the general case of stochastic dynamics in MaxEnt RL. We find that soft-$\mathcal{Q}$ functions in MaxEnt RL can be obtained from the Perron-Frobenius eigenvalue and the corresponding left eigenvector of a regular, non-negative matrix derived from the underlying Markov Decision Process (MDP). These results lead to novel algorithms for model-based and model-free MaxEnt RL, which we validate by numerical simulations. The mapping established in this work opens further avenues for the application of novel analytical and computational approaches to problems in MaxEnt RL. We make our code available at: https://github.com/argearriojas/maxent-rl-mdp-scripts
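A minimal numerical sketch of the eigenvector route follows, assuming a Todorov-style tilted matrix M = diag(exp(r/β)) P built from the reference dynamics; the paper's exact matrix construction and its use of the left eigenvector may differ in detail.

```python
# Sketch of reading soft values off a dominant (Perron-Frobenius) eigenpair,
# in the spirit of the paper's mapping. The tilted matrix below is an
# illustrative assumption, not the paper's exact construction.
import numpy as np

def soft_values_via_perron(P: np.ndarray, r: np.ndarray, beta: float = 1.0):
    """Form the non-negative tilted matrix M = diag(exp(r/beta)) @ P and
    recover soft values from its dominant eigenpair."""
    M = np.diag(np.exp(r / beta)) @ P
    vals, vecs = np.linalg.eig(M)
    k = np.argmax(vals.real)
    u = np.abs(vecs[:, k].real)          # Perron vector is entrywise positive
    return beta * np.log(u / u.max()), vals[k].real

# Tiny 3-state chain with uniform-ish reference dynamics and a rewarded middle.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
r = np.array([0.0, 1.0, 0.0])
V, lam = soft_values_via_perron(P, r)
print(V, lam)
```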
Maximum entropy RL (provably) solves some robust RL problems
Nearly all real-world applications of reinforcement learning involve some degree of shift between the training environment and the testing environment. However, prior work has observed that even small shifts in the environment cause most RL algorithms to perform markedly worse. As we aim to scale reinforcement learning algorithms and apply them in the real world, it is increasingly important to learn policies that are robust to changes in the environment. Broadly, prior approaches to handling distribution shift in RL aim to maximize performance in either the average case or the worst case. While these methods have been successfully applied to a number of areas (e.g., self-driving cars, robot locomotion and manipulation), their success rests critically on the design of the distribution of environments.
Inductive Mutual Information Estimation: A Convex Maximum-Entropy Copula Approach
We propose a novel estimator of the mutual information between two ordinal vectors $x$ and $y$. Our approach is inductive (as opposed to deductive) in that it depends on the data generating distribution solely through some nonparametric properties revealing associations in the data, and does not require having enough data to fully characterize the true joint distributions $P_{x, y}$. Specifically, our approach consists of (i) noting that $I\left(y; x\right) = I\left(u_y; u_x\right)$ where $u_y$ and $u_x$ are the copula-uniform dual representations of $y$ and $x$ (i.e. their images under the probability integral transform), and (ii) estimating the copula entropies $h\left(u_y\right)$, $h\left(u_x\right)$ and $h\left(u_y, u_x\right)$ by solving a maximum-entropy problem over the space of copula densities under a constraint of the type $\alpha_m = E\left[\phi_m(u_y, u_x)\right]$. We prove that, so long as the constraint is feasible, this problem admits a unique solution, that the solution lies in the exponential family, and that it can be learned by solving a convex optimization problem. The resulting estimator, which we denote MIND, is marginal-invariant, always non-negative, unbounded for any sample size $n$, consistent, has MSE rate $O(1/n)$, and is more data-efficient than competing approaches. Beyond mutual information estimation, we illustrate that our approach may be used to mitigate mode collapse in GANs by maximizing the entropy of the copula of fake samples, a model we refer to as Copula Entropy Regularized GAN (CER-GAN).
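The copula identity at the heart of the approach is easy to demonstrate: since the duals $u_x$, $u_y$ have uniform marginals, $I(x; y) = -h(u_x, u_y)$. The sketch below uses a naive histogram plug-in for the copula entropy in place of MIND's convex maximum-entropy solver, so it illustrates the identity rather than the estimator itself.

```python
# Sketch of the copula viewpoint: map x and y to copula-uniform duals via
# empirical ranks (probability integral transform), then use
# I(x; y) = -h(u_x, u_y). The histogram plug-in below is a naive stand-in
# for MIND's convex maximum-entropy solver.
import numpy as np
from scipy.stats import rankdata

def copula_uniform(v: np.ndarray) -> np.ndarray:
    """Empirical probability integral transform: ranks scaled into (0, 1)."""
    return rankdata(v) / (len(v) + 1)

def mi_copula_histogram(x, y, bins=16):
    """Plug-in MI estimate: negative entropy of the binned copula density."""
    u, w = copula_uniform(x), copula_uniform(y)
    c, _, _ = np.histogram2d(u, w, bins=bins, range=[[0, 1], [0, 1]])
    p = c / c.sum()
    nz = p > 0
    # h(u, w) under the piecewise-constant density; each cell has area bins^-2.
    h = -(p[nz] * np.log(p[nz] * bins * bins)).sum()
    return max(0.0, -h)

rng = np.random.default_rng(2)
x = rng.normal(size=5000)
y = x + 0.5 * rng.normal(size=5000)       # correlated pair
print(mi_copula_histogram(x, y))           # rough MI estimate in nats
```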
Maximum Entropy Reinforcement Learning with Mixture Policies
Baram, Nir, Tennenholtz, Guy, Mannor, Shie
Mixture models are an expressive hypothesis class that can approximate a rich set of policies. However, using mixture policies in the Maximum Entropy (MaxEnt) framework is not straightforward. The entropy of a mixture model is not equal to the sum of its components' entropies, nor does it have a closed-form expression in most cases. Using such policies in MaxEnt algorithms therefore requires constructing a tractable approximation of the mixture entropy. In this paper, we derive a simple, low-variance mixture-entropy estimator. We show that it is closely related to the sum of marginal entropies. Equipped with our entropy estimator, we derive an algorithmic variant of Soft Actor-Critic (SAC) for the mixture policy case and evaluate it on a series of continuous control tasks.
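To see why a tractable estimator is needed, the sketch below Monte Carlo estimates the entropy of a two-component Gaussian mixture and compares it with the classical bounds sum_i w_i H(p_i) <= H(mix) <= sum_i w_i H(p_i) + H(w); the paper's low-variance estimator itself is not reproduced here.

```python
# The mixture entropy has no closed form in general: compare a Monte Carlo
# estimate against the classical bounds built from the marginal entropies.
import numpy as np
from scipy.stats import norm

w = np.array([0.5, 0.5])                     # mixture weights
mus, sigmas = np.array([-2.0, 2.0]), np.array([1.0, 1.0])

def mixture_logpdf(x: np.ndarray) -> np.ndarray:
    return np.log((w * norm.pdf(x[:, None], mus, sigmas)).sum(axis=1))

rng = np.random.default_rng(3)
comps = rng.choice(len(w), size=100_000, p=w)
samples = rng.normal(mus[comps], sigmas[comps])

h_mc = -mixture_logpdf(samples).mean()       # Monte Carlo H(mixture)
h_marginals = (w * (0.5 * np.log(2 * np.pi * np.e * sigmas**2))).sum()
h_weights = -(w * np.log(w)).sum()
print(h_mc, h_marginals, h_marginals + h_weights)   # bounds bracket the MC value
```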
A maximum entropy model of bounded rational decision-making with prior beliefs and market feedback
Evans, Benjamin Patrick, Prokopenko, Mikhail
Bounded rationality is an important consideration stemming from the fact that agents often have limits on their processing abilities, making the assumption of perfect rationality inapplicable to many real tasks. We propose an information-theoretic approach to the inference of agent decisions under Smithian competition. The model explicitly captures the boundedness of agents (limited in their information-processing capacity) as the cost of information acquisition for expanding their prior beliefs. The expansion is measured as the Kullback-Leibler divergence between posterior decisions and prior beliefs. When information acquisition is free, the \textit{homo economicus} agent is recovered, while when information acquisition becomes costly, agents instead revert to their prior beliefs. The maximum entropy principle is used to infer least-biased decisions, based upon the notion of Smithian competition formalised within the Quantal Response Statistical Equilibrium framework. The incorporation of prior beliefs into such a framework allowed us to systematically explore the effects of prior beliefs on decision-making, in the presence of market feedback. We verified the proposed model using Australian housing market data, showing how the incorporation of prior knowledge alters the resulting agent decisions. Specifically, it allowed for the separation (and analysis) of past beliefs and utility maximisation behaviour of the agent.
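The decision rule the abstract describes, a KL-to-prior-regularized utility maximization, has a well-known closed form: a tilted prior p(a) proportional to prior(a) exp(beta u(a)). The toy sketch below shows both limits: large beta (cheap information) approaches the homo economicus argmax, while beta -> 0 (costly information) reverts to the prior. Payoffs and priors are illustrative values, not model estimates from the paper.

```python
# Tilted-prior (logit / quantal response) decisions: the maximum-entropy
# posterior over actions given a prior and a payoff constraint.
import numpy as np

def maxent_decision(utilities: np.ndarray, prior: np.ndarray,
                    beta: float) -> np.ndarray:
    """p(a) proportional to prior(a) * exp(beta * u(a))."""
    logits = np.log(prior) + beta * utilities
    logits -= logits.max()                   # numerical stability
    p = np.exp(logits)
    return p / p.sum()

u = np.array([1.0, 1.2, 0.8])                # toy action payoffs
prior = np.array([0.6, 0.2, 0.2])            # sticky prior beliefs
for beta in (0.0, 1.0, 50.0):                # costly -> free information
    print(beta, maxent_decision(u, prior, beta).round(3))
```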
Bayesian Inference: The Maximum Entropy Principle
In this article, I will explain what the maximum entropy principle is, how to apply it, and why it is useful in the context of Bayesian inference. The code to reproduce the results and figures can be found in this notebook. The maximum entropy principle is a method for creating probability distributions that are most consistent with a given set of assumptions and nothing more. The rest of the article will explain what this means. First, we need a way to measure the uncertainty in a probability distribution.
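A compact worked example of the principle is Jaynes's loaded-die problem: among all distributions over faces 1 to 6 with a fixed mean (say 4.5), the maximum entropy distribution is exponential in the face value, with the multiplier fixed by the mean constraint. The sketch below solves for that multiplier numerically.

```python
# Maximum entropy die with a mean constraint: p_k proportional to
# exp(lam * k), where lam is chosen so the mean equals the target.
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)

def mean_given_lam(lam: float) -> float:
    p = np.exp(lam * faces)
    p /= p.sum()
    return (p * faces).sum()

lam = brentq(lambda l: mean_given_lam(l) - 4.5, -5, 5)  # solve the constraint
p = np.exp(lam * faces)
p /= p.sum()
print(p.round(4), (p * faces).sum())   # maxent die distribution, mean 4.5
```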