Maximum Entropy
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > China > Anhui Province > Hefei (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
- Asia > China > Beijing > Beijing (0.05)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.42)
Mind Your Entropy: From Maximum Entropy to Trajectory Entropy-Constrained RL
Zhan, Guojian, Wang, Likun, Wang, Pengcheng, Zhang, Feihong, Duan, Jingliang, Tomizuka, Masayoshi, Li, Shengbo Eben
Maximum entropy has become a mainstream off-policy reinforcement learning (RL) framework for balancing exploitation and exploration. However, two bottlenecks still limit further performance improvement: (1) non-stationary Q-value estimation caused by jointly injecting entropy and updating its weighting parameter, i.e., temperature; and (2) short-sighted local entropy tuning that adjusts temperature only according to the current single-step entropy, without considering the effect of cumulative entropy over time. In this paper, we extends maximum entropy framework by proposing a trajectory entropy-constrained reinforcement learning (TECRL) framework to address these two challenges. Within this framework, we first separately learn two Q-functions, one associated with reward and the other with entropy, ensuring clean and stable value targets unaffected by temperature updates. Then, the dedicated entropy Q-function, explicitly quantifying the expected cumulative entropy, enables us to enforce a trajectory entropy constraint and consequently control the policy long-term stochasticity. Building on this TECRL framework, we develop a practical off-policy algorithm, DSAC-E, by extending the state-of-the-art distributional soft actor-critic with three refinements (DSAC-T). Empirical results on the OpenAI Gym benchmark demonstrate that our DSAC-E can achieve higher returns and better stability.
- Europe > Sweden > Stockholm > Stockholm (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > Puerto Rico > San Juan > San Juan (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.84)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.41)
Distributional Policy Evaluation: a Maximum Entropy approach to Representation Learning
In Distributional Reinforcement Learning (D-RL) [Bellemare et al., 2023], an agent aims to estimate Sutton and Barto, 2018], where the objective is to predict the expected return only. In Section 3, we answer this methodological question, showing that it is possible to reformulate Policy Evaluation in a distributional setting so that its performance index is explicitly intertwined with the representation of the (state or action) spaces.
- Europe > Italy > Lombardy > Milan (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.42)
- North America > Canada > Alberta (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.43)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.42)
Deriving the Scaled-Dot-Function via Maximum Likelihood Estimation and Maximum Entropy Approach
In this paper, we present a maximum likelihood estimation approach to determine the value vector in transformer models. We model the sequence of value vectors, key vectors, and the query vector as a sequence of Gaussian distributions. The variance in each Gaussian distribution depends on the time step, the corresponding key vector, and the query vector. The mean value in each Gaussian distribution depends on the time step, and the corresponding value vector. This analysis may offer a new explanation of the scaled-dot-product function or softmax function used in transformer architectures [1]. Another explanation, inspired by [4], is based on the maximum entropy approach in natural language processing [5]. In this approach, a query vector and key vectors are used to derive the feature functions for the maximum entropy model.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.82)
Hierarchical Maximum Entropy via the Renormalization Group
Hierarchical structures, which include multiple levels, are prevalent in statistical and machine-learning models as well as physical systems. Extending the foundational result that the maximum entropy distribution under mean constraints is given by the exponential Gibbs-Boltzmann form, we introduce the framework of "hierarchical maximum entropy" to address these multilevel models. We demonstrate that Pareto optimal distributions, which maximize entropies across all levels of hierarchical transformations, can be obtained via renormalization-group procedures from theoretical physics. This is achieved by formulating multilevel extensions of the Gibbs variational principle and the Donsker-Varadhan variational representation of entropy. Moreover, we explore settings with hierarchical invariances that significantly simplify the renormalization-group procedures, enhancing computational efficiency: quadratic modular loss functions, logarithmic loss functions, and nearest-neighbor loss functions. This is accomplished through the introduction of the concept of parameter flows, which serves as an analog to renormalization flows in renormalization group theory. This work connects ideas from probability theory, information theory, and statistical mechanics.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Sweden (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.40)