Maximum Entropy
ELEMENT: Episodic and Lifelong Exploration via Maximum Entropy
Li, Hongming, Yu, Shujian, Liu, Bin, Principe, Jose C.
This paper proposes \emph{Episodic and Lifelong Exploration via Maximum ENTropy} (ELEMENT), a novel, multiscale, intrinsically motivated reinforcement learning (RL) framework that explores environments without any extrinsic reward and effectively transfers the learned skills to downstream tasks. We advance the state of the art in three ways. First, we propose a multiscale entropy optimization to address the fact that previous maximum state entropy objectives, applied to lifelong exploration with millions of state observations, suffer from vanishing rewards and become computationally very expensive across iterations; we therefore add an episodic maximum entropy term over each episode to further speed up the search. Second, we propose a novel intrinsic reward for episodic entropy maximization, named \emph{average episodic state entropy}, which provides the optimal solution for a theoretical upper bound of the episodic state entropy objective. Third, to accelerate lifelong entropy maximization, we propose a $k$ nearest neighbors ($k$NN) graph to organize entropy estimation and updating, which reduces computation substantially. ELEMENT significantly outperforms state-of-the-art intrinsic rewards in both episodic and lifelong setups, and it can further be exploited for task-agnostic pre-training, collecting data for offline reinforcement learning, and similar applications.
- Education (0.48)
- Leisure & Entertainment > Games (0.46)
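To make the $k$NN-based entropy machinery concrete, here is a minimal sketch of a particle-based state-entropy intrinsic reward computed over a single episode, in the spirit of ELEMENT's episodic term; the function name and the Kozachenko-Leonenko-style log-distance form are illustrative, not the paper's exact \emph{average episodic state entropy} reward or its $k$NN-graph bookkeeping.

```python
# Minimal sketch: kNN particle-based state-entropy intrinsic reward for one
# episode. The reward for each state is the log-distance to its k-th nearest
# neighbor among the episode's states (Kozachenko-Leonenko style); ELEMENT's
# actual episodic reward and kNN-graph updates differ in detail.
import numpy as np
from scipy.spatial import cKDTree

def episodic_knn_entropy_reward(states: np.ndarray, k: int = 5) -> np.ndarray:
    """states: (T, d) array of state observations from one episode."""
    tree = cKDTree(states)
    # query k+1 neighbors: the nearest neighbor of each point is itself
    dists, _ = tree.query(states, k=k + 1)
    # log of distance to the k-th (non-self) neighbor, per state
    return np.log(dists[:, -1] + 1e-8)

# usage: an intrinsic reward per transition, added to the extrinsic reward
episode = np.random.randn(200, 4)          # hypothetical 4-d states
r_int = episodic_knn_entropy_reward(episode)
```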
MEP-Net: Generating Solutions to Scientific Problems with Limited Knowledge by Maximum Entropy Principle
Yang, Wuyue, Peng, Liangrong, Li, Guojie, Hong, Liu
The maximum entropy principle (MEP) offers an effective and unbiased approach to inferring unknown probability distributions from incomplete information, while neural networks provide the flexibility to learn complex distributions from data. This paper proposes a novel neural network architecture, MEP-Net, which combines the MEP with neural networks to generate probability distributions from moment constraints. We also provide a comprehensive overview of the fundamentals of the maximum entropy principle, its mathematical formulations, and a rigorous justification of its applicability to non-equilibrium systems based on the large deviations principle. Through numerical experiments, we demonstrate that MEP-Net is particularly useful for modeling the evolution of probability distributions in biochemical reaction networks and for generating complex distributions from data.
- Overview (0.86)
- Research Report (0.82)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.82)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
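As a concrete reference point for the principle MEP-Net builds on, the sketch below fits a maximum-entropy density to prescribed moment constraints by gradient descent on the convex dual; the grid, feature functions, and step size are assumptions for illustration, and MEP-Net itself replaces this fixed setup with a neural network.

```python
# Minimal sketch of the maximum entropy principle: fit an exponential-family
# density p(x) ~ exp(sum_i lam_i f_i(x)) on a grid so that its moments match
# given targets, by descending the convex dual log Z(lam) - lam . mu.
import numpy as np

x = np.linspace(-5, 5, 1001)                       # support grid (assumed)
feats = np.stack([x, x**2])                        # moment functions f_1, f_2
mu_target = np.array([0.5, 1.5])                   # prescribed moments

lam = np.zeros(2)
for _ in range(2000):                              # gradient descent on the dual
    logp = lam @ feats
    logp -= logp.max()                             # numerical stability
    p = np.exp(logp)
    p /= p.sum()
    mu_model = feats @ p                           # current model moments
    lam -= 0.1 * (mu_model - mu_target)            # dual gradient = E[f] - mu

# p is now (a discretization of) the max-entropy density with those moments
```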
Maximum Entropy Hindsight Experience Replay
Crowder, Douglas C., Trappett, Matthew L., McKenzie, Darrien M., Chance, Frances S.
Hindsight experience replay (HER) is well-known to accelerate goal-based reinforcement learning (RL). While HER is generally applied to off-policy RL algorithms, we previously showed that HER can also accelerate on-policy algorithms, such as proximal policy optimization (PPO), for goal-based Predator-Prey environments. Here, we show that we can improve the previous PPO-HER algorithm by selectively applying HER in a principled manner.
- Government > Regional Government > North America Government > United States Government (0.96)
- Energy (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.43)
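For readers unfamiliar with the mechanism, a minimal sketch of hindsight relabeling follows; the `apply_her` gate stands in for the paper's "selective" application, whose actual criterion is not reproduced here, and the reward bookkeeping is simplified.

```python
# Minimal sketch of hindsight relabeling, the mechanism HER adds on top of the
# base RL algorithm (off-policy, or PPO as in the abstract). The apply_her
# flag is a placeholder for a selective-application rule; it is an assumption,
# not the paper's exact criterion.
import random

def relabel_episode(episode, reward_fn, apply_her=True):
    """episode: list of (state, action, achieved_goal, desired_goal) tuples."""
    transitions = []
    for t, (s, a, ag, g) in enumerate(episode):
        transitions.append((s, a, g, reward_fn(ag, g)))
        if apply_her:
            # "future" strategy: pretend a later achieved goal was the target
            _, _, future_ag, _ = random.choice(episode[t:])
            # simplified: reward recomputed against the substituted goal
            transitions.append((s, a, future_ag, reward_fn(ag, future_ag)))
    return transitions
```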
Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness
Adversarial data augmentation has shown promise for training robust deep neural networks against unforeseen data shifts or corruptions. However, it is difficult to define heuristics to generate effective fictitious target distributions containing "hard" adversarial perturbations that are largely different from the source distribution. In this paper, we propose a novel and effective regularization term for adversarial data augmentation. We theoretically derive it from the information bottleneck principle, which results in a maximum-entropy formulation. Intuitively, this regularization term encourages perturbing the underlying source distribution to enlarge the predictive uncertainty of the current model, so that the generated "hard" adversarial perturbations can improve the model's robustness during training. Experimental results on three standard benchmarks demonstrate that our method consistently outperforms the existing state of the art by a statistically significant margin.
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.67)
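A minimal sketch of the core idea, assuming a PyTorch classifier: perturb the input along the gradient of the predictive entropy so the generated copy is maximally uncertain for the current model. The single sign-ascent step and the `eps` value are simplifying assumptions, not the paper's information-bottleneck-derived regularizer.

```python
# Minimal sketch of entropy-maximizing adversarial augmentation: perturb an
# input to increase the model's predictive entropy, then train on the copy.
import torch
import torch.nn.functional as F

def maxent_perturb(model, x, eps=0.03):
    x_adv = x.clone().detach().requires_grad_(True)
    probs = F.softmax(model(x_adv), dim=-1)
    # mean predictive entropy over the batch
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    entropy.backward()
    # ascend the entropy: push inputs toward maximal predictive uncertainty
    return (x_adv + eps * x_adv.grad.sign()).detach()
```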
A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e., where rewards and dynamics are linear in some known features), we provide a finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases. In a more general setting, when the feature dynamics are approximately linear and the rewards are arbitrary, we propose a new approach for estimating stationary distributions with function approximation. We formulate this problem as finding the maximum-entropy distribution subject to matching feature expectations under empirical dynamics. We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning.
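A minimal sketch of the stated formulation, under the assumption that we reweight logged transitions $(s, s')$: maximizing entropy subject to stationary feature expectations, $\mathbb{E}_w[\phi(s') - \phi(s)] = 0$, yields weights exponential in the feature differences, fit here by minimizing the log-sum-exp dual. The paper's full average-reward treatment is more involved.

```python
# Minimal sketch: max-entropy weights over logged transitions subject to
# stationarity in feature space, E_w[phi(s') - phi(s)] = 0. The max-ent
# solution is w_i ~ exp(theta . (phi(s'_i) - phi(s_i))); theta is found by
# minimizing the log-sum-exp dual, whose gradient is E_w[phi(s') - phi(s)].
import numpy as np

def maxent_stationary_weights(phi_s, phi_next, steps=2000, lr=0.5):
    d = phi_next - phi_s                       # (n, k) feature differences
    theta = np.zeros(d.shape[1])
    for _ in range(steps):
        z = d @ theta
        z -= z.max()                           # numerical stability
        w = np.exp(z)
        w /= w.sum()
        theta -= lr * (w @ d)                  # dual gradient = E_w[d]
    return w                                   # weights over transitions
```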
Distributional Policy Evaluation: a Maximum Entropy approach to Representation Learning
The Maximum Entropy (Max-Ent) framework has been effectively employed in a variety of Reinforcement Learning (RL) tasks. In this paper, we first propose a novel Max-Ent framework for policy evaluation in a distributional RL setting, named Distributional Maximum Entropy Policy Evaluation (D-Max-Ent PE). We derive a generalization-error bound that depends on the complexity of the representation employed, showing that this framework can explicitly take into account the features used to represent the state space while evaluating a policy. We then exploit these favorable properties to drive the representation learning of the state space in a Structural Risk Minimization fashion. Using state-aggregation functions as feature functions, we specialize the D-Max-Ent approach into an algorithm named D-Max-Ent Progressive Factorization, which constructs a progressively finer-grained representation of the state space by balancing the trade-off between preserving information (bias) and reducing the effective number of states, i.e., the complexity of the representation space (variance).
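The sketch below illustrates only the representation side: state aggregation as a one-hot feature function, where "progressive factorization" corresponds to refining cells. The equal-width splitting rule is an assumption for illustration, not the paper's algorithm.

```python
# Minimal sketch of state aggregation as a feature function: each state maps
# to a one-hot indicator of its cell; refining the representation means
# splitting cells. The equal-width binning here is illustrative only.
import numpy as np

def aggregate(states, n_cells):
    """1-d states -> one-hot features over equal-width cells."""
    lo, hi = states.min(), states.max()
    idx = np.clip(((states - lo) / (hi - lo + 1e-8) * n_cells).astype(int),
                  0, n_cells - 1)
    return np.eye(n_cells)[idx]               # (n_states, n_cells) features

# finer cells preserve more information (less bias) but increase the
# effective number of states, i.e. the representation's complexity (variance)
coarse = aggregate(np.random.rand(100), 4)
fine = aggregate(np.random.rand(100), 16)
```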
Maximum Entropy Monte-Carlo Planning
We develop a new algorithm for online planning in large-scale sequential decision problems that improves upon the worst-case efficiency of UCT. The idea is to augment Monte-Carlo Tree Search (MCTS) with maximum entropy policy optimization, evaluating each search node by softmax values back-propagated from simulation. To establish the effectiveness of this approach, we first investigate the single-step decision problem, stochastic softmax bandits, and show that softmax values can be estimated at an optimal convergence rate in terms of mean squared error. We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS). We prove that the probability of MENTS failing to identify the best decision at the root decays exponentially, fundamentally improving on the polynomial convergence rate of UCT.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.91)
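A minimal sketch of the softmax (log-sum-exp) value backup and the induced soft-indmax sampling policy at the core of MENTS; the temperature value is arbitrary, and the paper's E2W exploration schedule is omitted.

```python
# Minimal sketch of MENTS-style softmax backups: a node's value is the soft
# maximum (log-sum-exp) of its children's Q-values, and actions are sampled
# from the induced soft-indmax policy. E2W exploration is omitted here.
import numpy as np

def softmax_value(q: np.ndarray, tau: float = 0.1) -> float:
    m = q.max()                                # stabilized log-sum-exp
    return m + tau * np.log(np.sum(np.exp((q - m) / tau)))

def soft_indmax(q: np.ndarray, tau: float = 0.1) -> np.ndarray:
    p = np.exp((q - q.max()) / tau)
    return p / p.sum()                         # sampling policy over actions

q_children = np.array([1.0, 1.2, 0.7])
v_node = softmax_value(q_children)             # value backed up to the parent
pi_node = soft_indmax(q_children)              # distribution used for search
```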
Reviews: Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods
This is a nice paper, if a bit of an odd match for NIPS (there are no numerical experiments, and in spite of the claims of genericity and applicability to general exponential families, I remain unconvinced). The methods are elegant, though I found the presentation a bit lacking. I would have loved a high-level outline of the proof steps and the proof intuition, with pointers to the precise sub-proposition statements and corresponding proofs. As it stands, it is easy to get lost in the details, and what appear to me as the key moments of the proof are skimmed over quickly. For instance, Lemma 3.1 deserved to be expanded upon (even the long version is a bit quick on details here), especially since the GW proof technique is so elegant that it is always nice to include (even if similar to the original proof).
Reviews: Connectionist Temporal Classification with Maximum Entropy Regularization
This work presents a method for end-to-end sequence learning, more specifically in the framework of Connectionist Temporal Classification (CTC). The paper has two main contributions. The first is a regularization of the training of the CTC objective in order to reduce the over-confidence of the model. To achieve this, the authors propose a method based on conditional entropy: the proposed regularization encourages the model to explore paths that are close to the dominant one, under the assumption that consecutive elements of a sequence have equal spacing.
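To illustrate the kind of regularizer under discussion, here is a minimal sketch that adds an entropy bonus to the CTC loss; note the paper regularizes entropy over CTC alignment paths, so the per-frame output entropy used here is a crude proxy, not the method being reviewed.

```python
# Minimal sketch: CTC loss with an entropy bonus against over-confidence.
# Per-frame output entropy is a simplified proxy for the paper's path-level
# conditional entropy term; beta is an assumed hyperparameter.
import torch
import torch.nn.functional as F

def ctc_with_entropy_reg(log_probs, targets, in_lens, tgt_lens, beta=0.1):
    """log_probs: (T, N, C) log-softmax outputs, as expected by ctc_loss."""
    ctc = F.ctc_loss(log_probs, targets, in_lens, tgt_lens)
    probs = log_probs.exp()
    frame_entropy = -(probs * log_probs).sum(dim=-1).mean()
    return ctc - beta * frame_entropy          # subtracting rewards entropy
```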
Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods
The well-known maximum-entropy principle due to Jaynes, which states that, given mean parameters, the maximum entropy distribution matching them lies in an exponential family, has been very popular in machine learning due to its "Occam's razor" interpretation. Unfortunately, calculating the potentials in the maximum-entropy distribution is intractable [BGS14]. We provide computationally efficient versions of this principle when the mean parameters are pairwise moments: we design distributions that approximately match given pairwise moments, while having entropy comparable to that of the maximum entropy distribution matching those moments. We additionally provide surprising applications of the approximate maximum entropy principle to designing provable variational methods for partition function calculations for Ising models without any assumptions on the potentials of the model. More precisely, we show that we can obtain approximation guarantees for the log-partition function comparable to those in the low-temperature limit, which is the setting of optimization of quadratic forms over the hypercube.
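A minimal sketch of the Goemans-Williamson-style construction the paper builds on: sample a Gaussian with the target covariance and take signs. By Grothendieck's identity, the resulting $\pm 1$ distribution has pairwise moments $\tfrac{2}{\pi}\arcsin(\Sigma_{ij})$, so it only approximately matches the targets; the paper's corrected construction and entropy guarantees are sharper than this bare sampler.

```python
# Minimal sketch of GW-style rounding: signs of a correlated Gaussian give a
# high-entropy +/-1 distribution whose pairwise moments are
# (2/pi) * arcsin(Sigma_ij), i.e. an approximate match to the targets.
import numpy as np

def gw_round_samples(sigma: np.ndarray, n_samples: int) -> np.ndarray:
    """sigma: (d, d) PSD matrix of target pairwise moments, unit diagonal."""
    g = np.random.multivariate_normal(np.zeros(len(sigma)), sigma, n_samples)
    return np.sign(g)                          # +/-1 vectors

sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
xs = gw_round_samples(sigma, 100000)
emp = xs.T @ xs / len(xs)                      # ~ (2/pi) * arcsin(sigma)
```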