Maximum Entropy
Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning
Shi, Wenjie, Song, Shiji, Wu, Cheng
Maximum entropy deep reinforcement learning (RL) methods have been demonstrated on a range of challenging continuous tasks. However, existing methods either suffer from severe instability when training on large off-policy data or cannot scale to tasks with very high state and action dimensionality, such as 3D humanoid locomotion. Moreover, the optimality of the desired Boltzmann policy set for a non-optimal soft value function is not convincingly established. In this paper, we first derive a soft policy gradient based on an entropy-regularized expected reward objective for RL with continuous actions. We then present an off-policy, actor-critic, model-free maximum entropy deep RL algorithm called deep soft policy gradient (DSPG), which combines the soft policy gradient with the soft Bellman equation. To ensure stable learning while eliminating the need for two separate critics for the soft value functions, we leverage a double sampling approach to make the soft Bellman equation tractable. Experimental results demonstrate that our method outperforms prior off-policy methods.
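To make the entropy-regularized objective behind methods like DSPG concrete, here is a minimal NumPy sketch of a soft Bellman backup and the Boltzmann policy it induces on a toy tabular MDP. The tabular setting, the temperature alpha, and all variable names are illustrative assumptions; this is not the DSPG algorithm itself, which handles continuous actions with function approximation.

```python
import numpy as np
from scipy.special import logsumexp

# Toy MDP: reward table R[s, a] and transition tensor P[s, a, s'] (illustrative).
rng = np.random.default_rng(0)
n_states, n_actions, gamma, alpha = 4, 3, 0.95, 0.2
R = rng.normal(size=(n_states, n_actions))
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

Q = np.zeros((n_states, n_actions))
for _ in range(500):
    # Soft value: V(s) = alpha * log sum_a exp(Q(s, a) / alpha)
    V = alpha * logsumexp(Q / alpha, axis=1)
    # Soft Bellman backup: Q(s, a) = r(s, a) + gamma * E_{s'}[V(s')]
    Q = R + gamma * P @ V

# Boltzmann (maximum entropy) policy induced by the soft Q-function.
pi = np.exp((Q - Q.max(axis=1, keepdims=True)) / alpha)
pi /= pi.sum(axis=1, keepdims=True)
print(pi)
```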
The Learning of Fuzzy Cognitive Maps With Noisy Data: A Rapid and Robust Learning Method With Maximum Entropy
Feng, Guoliang, Lu, Wei, Pedrycz, Witold, Yang, Jianhua, Liu, Xiaodong
Numerous learning methods for fuzzy cognitive maps (FCMs), such as Hebbian-based and population-based learning methods, have been developed for modeling and simulating dynamic systems. However, these methods face several notable limitations. Most of them are extremely time-consuming when learning large-scale FCMs with hundreds of nodes. Furthermore, the FCMs learned by those algorithms lack robustness when the experimental data contain noise. In addition, a reasonable distribution of the weights is rarely considered in these algorithms, which can reduce the performance of the resulting FCM. In this article, a straightforward, rapid, and robust learning method is proposed to learn FCMs from noisy data, and especially to learn large-scale FCMs. The crux of the proposed algorithm is to equivalently transform the FCM learning problem into a classical constrained convex optimization problem in which the least-squares term ensures the robustness of the learned FCM and the maximum entropy term regularizes the distribution of its weights. A series of experiments covering two frequently used activation functions (the sigmoid and hyperbolic tangent functions) are performed on both synthetic datasets with noise and real-world datasets. The experimental results show that the proposed method is rapid and robust against data containing noise and that the learned weights have a better distribution. In addition, the FCMs learned by the proposed method also exhibit superior performance in comparison with existing methods.
Index Terms: Fuzzy cognitive maps (FCMs), maximum entropy, noisy data, rapid and robust learning.
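The following sketch illustrates one plausible reading of the learning problem described above: each row of the FCM weight matrix is recovered by a box-constrained convex program combining a least-squares data-fit term (in the pre-activation space of a sigmoid FCM) with a negative-entropy penalty on the normalized weight magnitudes. The synthetic trajectory, the exact form of the entropy term, and the regularization weight lam are assumptions for illustration, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logit, expit

rng = np.random.default_rng(1)
n_nodes, n_steps, lam = 5, 40, 0.05

# Synthetic FCM trajectory: x_{t+1} = sigmoid(W_true @ x_t) (illustrative).
W_true = rng.uniform(-1, 1, size=(n_nodes, n_nodes))
X = np.empty((n_steps + 1, n_nodes))
X[0] = rng.uniform(0.2, 0.8, size=n_nodes)
for t in range(n_steps):
    X[t + 1] = expit(W_true @ X[t])

def objective(w_row, i):
    # Least-squares term in the pre-activation space: logit(x_{t+1, i}) ~ w_row @ x_t
    residual = logit(X[1:, i]) - X[:-1] @ w_row
    ls = np.sum(residual ** 2)
    # Maximum entropy regularizer on normalized weight magnitudes (one plausible
    # reading of the entropy term; the paper's exact form may differ).
    p = np.abs(w_row) + 1e-12
    p = p / p.sum()
    neg_entropy = np.sum(p * np.log(p))
    return ls + lam * neg_entropy

W_hat = np.vstack([
    minimize(objective, x0=np.zeros(n_nodes), args=(i,),
             bounds=[(-1.0, 1.0)] * n_nodes).x
    for i in range(n_nodes)
])
print(np.round(W_hat - W_true, 2))
```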
Improving the Accuracy of Principal Component Analysis by the Maximum Entropy Method
Wan, Guihong, Maung, Crystal, Schweitzer, Haim
Classical Principal Component Analysis (PCA) approximates data in terms of projections on a small number of orthogonal vectors. There are simple procedures to efficiently compute various functions of the data from the PCA approximation. The most important function is arguably the Euclidean distance between data items, which can be used, for example, to solve the approximate nearest neighbor problem. We use random variables to model the inherent uncertainty in such approximations and apply the Maximum Entropy Method to infer the underlying probability distribution. We propose using the expected values of distances between these random variables as improved estimates of the distance. We show analytically and experimentally that in most cases results obtained by our method are more accurate than those obtained by the classical approach. This improves the accuracy of a classical technique that has been used with little change for over 100 years.
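As an illustration of how modeling the discarded PCA components as random variables can improve distance estimates, the sketch below compares the classical projected distance with an uncertainty-aware estimate that adds the residual energies of the two points (an isotropic, maximum entropy style assumption about the discarded components). The estimator shown is a simple stand-in, not necessarily the one derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 200, 30, 5

X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))   # correlated synthetic data
Xc = X - X.mean(axis=0)

# Top-k principal directions via SVD.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:k]                                 # k x d projection matrix
Z = Xc @ P.T                               # k-dimensional PCA coordinates
res = np.linalg.norm(Xc - Z @ P, axis=1)   # per-point residual norms

i, j = 3, 17
true_dist = np.linalg.norm(Xc[i] - Xc[j])

# Classical PCA estimate: distance between the projections only.
classical = np.linalg.norm(Z[i] - Z[j])

# Uncertainty-aware estimate: treat the discarded components as zero-mean random
# vectors carrying the observed residual energy, so the expected squared distance
# adds the two residual energies.
improved = np.sqrt(classical ** 2 + res[i] ** 2 + res[j] ** 2)

print(true_dist, classical, improved)
```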
Fair Distributions from Biased Samples: A Maximum Entropy Optimization Framework
Celis, L. Elisa, Keswani, Vijay, Yildiz, Ozan, Vishnoi, Nisheeth K.
One reason for the emergence of bias in AI systems is biased data: datasets that may not truly represent the underlying distributions and may over- or under-represent groups with respect to protected attributes such as gender or race. We consider the problem of correcting such biases and learning distributions that are "fair", with respect to measures such as proportional representation and statistical parity, from the given samples. Our approach is based on a novel formulation of the problem of learning a fair distribution as a maximum entropy optimization problem with a given expectation vector and a prior distribution. Technically, our main contributions are: (1) a new second-order method to compute the (dual of the) maximum entropy distribution over an exponentially sized discrete domain, which turns out to be faster than previous methods, and (2) methods to construct prior distributions and expectation vectors that provably guarantee that the learned distributions satisfy a wide class of fairness criteria. Our results also come with quantitative bounds on the total variation distance between the empirical distribution obtained from the samples and the learned fair distribution. Our experimental results include testing our approach on the COMPAS dataset and showing that the learned fair distributions not only improve disparate impact values but, when used to train classifiers, incur only a small loss of accuracy.
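A minimal sketch of the core computational object, a maximum entropy distribution with a prior and an expectation constraint fitted by minimizing its convex dual, is shown below. The tiny domain, the binary feature map phi, the prior q, and the target vector theta are illustrative assumptions; the paper's contribution is a second-order method for the exponentially sized case, whereas this sketch uses a generic quasi-Newton solver.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(3)

# Tiny discrete domain standing in for the (exponentially large) domain in the
# paper; features could encode protected attributes and outcomes.
n_points, n_features = 50, 3
phi = rng.integers(0, 2, size=(n_points, n_features)).astype(float)  # feature map
q = rng.dirichlet(np.ones(n_points))                                 # prior distribution
theta = np.array([0.5, 0.5, 0.5])                                    # target expectations

def dual(lmbda):
    # Dual of the maximum entropy problem: log-partition minus <lambda, theta>.
    return logsumexp(np.log(q) + phi @ lmbda) - lmbda @ theta

lmbda = minimize(dual, x0=np.zeros(n_features)).x

# Learned maximum entropy distribution p(x) proportional to q(x) exp(<lambda, phi(x)>).
logp = np.log(q) + phi @ lmbda
p = np.exp(logp - logsumexp(logp))
print("achieved expectations:", p @ phi, "target:", theta)
```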
MEMe: An Accurate Maximum Entropy Method for Efficient Approximations in Large-Scale Machine Learning
Granziol, Diego, Ru, Binxin, Zohren, Stefan, Dong, Xiaowen, Osborne, Michael, Roberts, Stephen
Making high-quality inferences on large, feature-rich datasets under a constrained computational budget is arguably the primary goal of the learning community. This, however, comes with significant challenges. On the one hand, the exact computation of linear algebraic quantities may be prohibitively expensive, as is the case for the log determinant. On the other hand, an analytic expression for the quantity of interest may not exist at all, as is the case for the entropy of a Gaussian mixture model, and approximate methods are often both inefficient and inaccurate.
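As one concrete instance of the kind of approximation the title refers to, the sketch below estimates a log determinant by (i) estimating spectral moments with Hutchinson-style stochastic trace estimation and (ii) fitting a maximum entropy spectral density to those moments. Assuming this is roughly the pipeline the paper has in mind, the matrix construction, the number of moments and probes, and the grid-based dual solver are all illustrative choices, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(4)
n, n_probes, n_moments = 300, 50, 4

# Random SPD matrix with spectrum inside (0, 1] so powers of A stay bounded.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
eigs = rng.uniform(0.05, 1.0, size=n)
A = (Q * eigs) @ Q.T

# Stochastic (Hutchinson) estimates of the spectral moments m_k = tr(A^k) / n.
Z = rng.choice([-1.0, 1.0], size=(n, n_probes))
moments, Y = [], Z.copy()
for k in range(n_moments):
    Y = A @ Y
    moments.append(np.mean(np.sum(Z * Y, axis=0)) / n)
moments = np.array(moments)

# Fit a maximum entropy spectral density p(x) proportional to exp(sum_k a_k x^k)
# on a grid, constrained to match the estimated moments.
grid = np.linspace(1e-3, 1.0, 2000)
powers = np.vstack([grid ** (k + 1) for k in range(n_moments)])

def dual(a):
    # Convex dual of the moment-constrained maximum entropy problem.
    return logsumexp(a @ powers) - a @ moments

a_opt = minimize(dual, x0=np.zeros(n_moments)).x
logp = a_opt @ powers
p = np.exp(logp - logsumexp(logp))          # normalized weights on the grid

# log det A = sum_i log(eig_i) = n * E_spectral[log x], approximated under p.
print("maxent estimate:", n * np.sum(p * np.log(grid)))
print("exact log det  :", np.linalg.slogdet(A)[1])
```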
Maximum Entropy-Regularized Multi-Goal Reinforcement Learning
Zhao, Rui, Sun, Xudong, Tresp, Volker
In Multi-Goal Reinforcement Learning, an agent learns to achieve multiple goals with a goal-conditioned policy. During learning, the agent first collects trajectories into a replay buffer, and these trajectories are later selected at random for replay. However, the achieved goals in the replay buffer are often biased towards the behavior policies. From a Bayesian perspective, when there is no prior knowledge about the target goal distribution, the agent should learn uniformly from diverse achieved goals. Therefore, we first propose a novel multi-goal RL objective based on weighted entropy. This objective encourages the agent to maximize the expected return as well as to achieve more diverse goals. Second, we develop a maximum entropy-based prioritization framework to optimize the proposed objective. To evaluate this framework, we combine it with Deep Deterministic Policy Gradient, both with and without Hindsight Experience Replay. On a set of multi-goal robotic tasks from OpenAI Gym, we compare our method with other baselines and show promising improvements in both performance and sample efficiency.
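The sketch below illustrates one plausible instantiation of entropy-oriented prioritization: achieved goals are density-estimated, and trajectories whose achieved goals are rare (low density) are replayed more often, pushing the replayed goal distribution toward uniformity. The 2-D synthetic goals, the KDE density model, and the inverse-density priority rule are assumptions for illustration, not the paper's exact prioritization scheme.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)

# Achieved goals in a replay buffer, heavily skewed toward one region
# (illustrative 2-D goals; real tasks would use the environment's goal space).
common = rng.normal(loc=0.0, scale=0.1, size=(900, 2))
rare = rng.normal(loc=1.0, scale=0.3, size=(100, 2))
goals = np.vstack([common, rare])

# Density estimate of the achieved-goal distribution.
density = gaussian_kde(goals.T)(goals.T)

# Entropy-oriented prioritization: replay probability decreases with the density
# of the achieved goal, so rare goals are replayed more often.
priority = 1.0 / (density + 1e-8)
priority /= priority.sum()
replay_idx = rng.choice(len(goals), size=64, p=priority, replace=False)

print("rare-goal fraction, uniform batch    :", np.mean(rng.choice(len(goals), 64) >= 900))
print("rare-goal fraction, prioritized batch:", np.mean(replay_idx >= 900))
```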
A Regularized Opponent Model with Maximum Entropy Objective
Tian, Zheng, Wen, Ying, Gong, Zhichen, Punakkath, Faiz, Zou, Shihao, Wang, Jun
In a single-agent setting, reinforcement learning (RL) tasks can be cast as an inference problem by introducing a binary random variable o, which stands for "optimality". In this paper, we redefine the binary random variable o in the multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound on the likelihood of achieving optimality and name it the Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show, both theoretically and empirically, how it can improve the performance of training agents in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method, ROMMEO-Q, with a proof of convergence. We then extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate these two algorithms on a challenging iterated matrix game and a differential game, respectively, and show that they can outperform strong MARL baselines.
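To give a feel for the kind of objective ROMMEO optimizes, the toy below alternates an entropy-regularized response for the agent's own policy with a KL-regularized update of its opponent model on a 2x2 cooperative matrix game. The payoff matrix, the coefficients alpha and lam, and the update rules are a crude caricature for illustration only; they are not the ROMMEO-Q or ROMMEO-AC updates.

```python
import numpy as np
from scipy.special import softmax

# Cooperative matrix game payoff (both agents receive R[a1, a2]); illustrative only.
R = np.array([[5.0, 0.0],
              [0.0, 1.0]])
alpha, lam = 0.5, 0.5                     # entropy / opponent-model weights
rho_prior = np.array([0.5, 0.5])          # prior over the opponent's action
pi = np.full(2, 0.5)                      # our policy
rho = rho_prior.copy()                    # our model of the opponent

for _ in range(100):
    # Entropy-regularized response of our policy against the current opponent model.
    pi = softmax((R @ rho) / alpha)
    # Opponent model: tilt the prior toward actions that look good under our
    # policy, with a KL penalty keeping it close to the prior.
    rho = rho_prior * np.exp((R.T @ pi) / lam)
    rho /= rho.sum()

def objective(pi, rho):
    ret = pi @ R @ rho                                      # expected return
    ent = -np.sum(pi * np.log(pi + 1e-12))                  # entropy of own policy
    kl = np.sum(rho * np.log((rho + 1e-12) / rho_prior))    # opponent-model regularizer
    return ret + alpha * ent - lam * kl

print("pi:", np.round(pi, 3), "rho:", np.round(rho, 3),
      "objective:", round(objective(pi, rho), 3))
```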
Maximum Entropy Based Significance of Itemsets
We consider the problem of defining the significance of an itemset. We say that the itemset is significant if we are surprised by its frequency when compared to the frequencies of its sub-itemsets. In other words, we estimate the frequency of the itemset from the frequencies of its sub-itemsets and compute the deviation between the real value and the estimate. For the estimation we use Maximum Entropy and for measuring the deviation we use Kullback-Leibler divergence. A major advantage compared to the previous methods is that we are able to use richer models whereas the previous approaches only measure the deviation from the independence model. We show that our measure of significance goes to zero for derivable itemsets and that we can use the rank as a statistical test. Our empirical results demonstrate that for our real datasets the independence assumption is too strong but applying more flexible models leads to good results.
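The sketch below works through the idea on three items: the frequency of the full itemset is estimated from the frequencies of its proper sub-itemsets with a maximum entropy model (fitted via its convex dual), and the deviation from the observed frequency is measured with a binary Kullback-Leibler divergence. The synthetic transactions and the specific dual solver are illustrative assumptions; the significance measure itself follows the construction described above.

```python
import numpy as np
from itertools import combinations, product
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(7)

# Synthetic 0/1 transactions over three items; the itemset of interest is {0, 1, 2}.
data = (rng.random((5000, 3)) < [0.4, 0.5, 0.3]).astype(float)
data[:, 2] = np.where(rng.random(5000) < 0.5, data[:, 0] * data[:, 1], data[:, 2])

items = [0, 1, 2]
subsets = [s for r in (1, 2) for s in combinations(items, r)]       # proper sub-itemsets
targets = np.array([data[:, list(s)].prod(axis=1).mean() for s in subsets])

states = np.array(list(product([0, 1], repeat=3)), dtype=float)     # all 8 outcomes
feats = np.column_stack([states[:, list(s)].prod(axis=1) for s in subsets])

def dual(lmbda):
    # Dual of the maximum entropy problem constrained by sub-itemset frequencies.
    return logsumexp(feats @ lmbda) - lmbda @ targets

lmbda = minimize(dual, x0=np.zeros(len(subsets))).x
logp = feats @ lmbda
p = np.exp(logp - logsumexp(logp))

estimated = p @ states.prod(axis=1)              # max-entropy estimate of fr({0, 1, 2})
observed = data.prod(axis=1).mean()              # actual frequency in the data

# Deviation between observed and estimated frequency as a binary KL divergence.
kl = (observed * np.log(observed / estimated)
      + (1 - observed) * np.log((1 - observed) / (1 - estimated)))
print(observed, estimated, kl)
```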
Mitigating Information Leakage in Image Representations: A Maximum Entropy Approach
Roy, Proteek Chandan, Boddeti, Vishnu Naresh
Image recognition systems have demonstrated tremendous progress over the past few decades thanks, in part, to our ability to learn compact and robust representations of images. As we witness the widespread adoption of these systems, it is imperative to consider the problem of unintended leakage of information from an image representation, which might compromise the privacy of the data owner. This paper investigates the problem of learning an image representation that minimizes such leakage of user information. We formulate the problem as an adversarial non-zero-sum game of finding a good embedding function with two competing goals: to retain as much task-dependent discriminative image information as possible, while simultaneously minimizing the amount of information, as measured by entropy, about other sensitive attributes of the user. We analyze the stability and convergence dynamics of the proposed formulation using tools from non-linear systems theory and compare it to that of the corresponding adversarial zero-sum game formulation that optimizes likelihood as a measure of information content. Numerical experiments on the UCI, Extended Yale B, CIFAR-10, and CIFAR-100 datasets indicate that our proposed approach is able to learn image representations that exhibit high task performance while mitigating leakage of predefined sensitive information.
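A minimal PyTorch sketch of the non-zero-sum adversarial idea is given below: an adversary is trained to predict the sensitive attribute from the embedding, while the encoder is trained to keep task accuracy and to maximize the entropy of the adversary's prediction. The toy data, network sizes, trade-off weight lam, and training schedule are illustrative assumptions, not the architecture or training protocol used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy data: 2-D features, a task label, and a binary sensitive attribute.
x = torch.randn(512, 2)
task_y = (x[:, 0] > 0).long()
sensitive = (x[:, 1] > 0).long()

encoder = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 8))
task_head = nn.Linear(8, 2)
adversary = nn.Linear(8, 2)      # tries to predict the sensitive attribute

opt_enc = torch.optim.Adam(list(encoder.parameters()) + list(task_head.parameters()), lr=1e-2)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-2)
lam = 1.0

for step in range(300):
    # Adversary update: maximize the likelihood of the sensitive attribute.
    z = encoder(x)
    adv_loss = F.cross_entropy(adversary(z.detach()), sensitive)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

    # Encoder update: keep task accuracy while maximizing the *entropy* of the
    # adversary's prediction of the sensitive attribute (the non-zero-sum idea).
    z = encoder(x)
    probs = F.softmax(adversary(z), dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    enc_loss = F.cross_entropy(task_head(z), task_y) - lam * entropy
    opt_enc.zero_grad(); enc_loss.backward(); opt_enc.step()

with torch.no_grad():
    z = encoder(x)
    print("task accuracy     :", (task_head(z).argmax(1) == task_y).float().mean().item())
    print("adversary accuracy:", (adversary(z).argmax(1) == sensitive).float().mean().item())
```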
Modeling and Optimization of Human-machine Interaction Processes via the Maximum Entropy Principle
Zheng, Jiaxiao, de Veciana, Gustavo
We propose a data-driven framework to enable the modeling and optimization of human-machine interaction processes, e.g., systems aimed at assisting humans in decision-making or learning, workload allocation, and interactive advertising. This is a challenging problem for several reasons. First, humans' behavior is hard to model or infer, as it may reflect biases, long-term memory, and sensitivity to sequencing, i.e., transience and exponential complexity in the length of the interaction. Second, due to the interactive nature of such processes, the machine policy used to engage with a human may bias possible data-driven inferences. Finally, in choosing machine policies that optimize interaction rewards, one must, on the one hand, avoid being overly sensitive to error/variability in the estimated human model, and on the other, avoid being overly deterministic/predictable, which may result in poor human 'engagement' in the interaction. To meet these challenges, we propose a robust approach based on the maximum entropy principle, the Alternating Entropy-Reward Ascent (AREA) algorithm, which iteratively estimates human behavior and optimizes the machine policy. We characterize AREA in terms of its space and time complexity and its convergence. We also provide an initial validation based on synthetic data generated by an established noisy nonlinear model for human decision-making.
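The loop below is a heavily simplified caricature of the alternating structure described above, assuming a bandit-like interaction: the machine re-estimates a model of the human's response from the data collected so far, then updates an entropy-regularized (softmax) policy so that it exploits the estimate without becoming fully deterministic. The response model, the temperature tau, and the update schedule are illustrative assumptions; this is not the AREA algorithm itself.

```python
import numpy as np
from scipy.special import softmax

rng = np.random.default_rng(9)

# Machine chooses one of 4 engagement actions; the human responds positively
# with an unknown probability per action (a stand-in for a human model).
true_response = np.array([0.2, 0.7, 0.5, 0.3])
n_actions, tau = 4, 0.1                   # tau: entropy-regularization temperature
policy = np.full(n_actions, 1.0 / n_actions)
counts = np.ones(n_actions)               # smoothed interaction counts
successes = np.ones(n_actions) * 0.5      # smoothed positive responses

for round_ in range(200):
    # Interact under the current (stochastic) machine policy.
    a = rng.choice(n_actions, p=policy)
    r = rng.random() < true_response[a]
    counts[a] += 1
    successes[a] += r

    # (1) Re-estimate the human response model from the data collected so far.
    est_response = successes / counts
    # (2) Entropy-regularized policy improvement: a softmax over estimated
    # rewards keeps the policy stochastic, avoiding an overly deterministic,
    # predictable machine.
    policy = softmax(est_response / tau)

print("estimated response rates:", np.round(est_response, 2))
print("final machine policy    :", np.round(policy, 2))
```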