Maximum Entropy
Reviews: Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods
This is a nice paper, though a bit of an odd match for NIPS (there are no numerical experiments, and despite the claims of genericity and applicability to general exponential families, I remain unconvinced). The methods are elegant, but I found the presentation somewhat lacking. I would have loved a high-level outline of the proof steps and the intuition behind them, with pointers to the precise sub-proposition statements and their proofs. As it stands, it is easy to get lost in the details, and what appear to me to be the key moments of the proof are skimmed over quickly. For instance, Lemma 3.1 deserved to be expanded upon (even the long version is a bit quick on details here), especially since the GW proof technique is so elegant that it is always nice to include, even if it is similar to the original proof.
Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods
The well-known maximum-entropy principle due to Jaynes, which states that, given mean parameters, the maximum-entropy distribution matching them lies in an exponential family, has been very popular in machine learning due to its "Occam's razor" interpretation. Unfortunately, calculating the potentials in the maximum-entropy distribution is intractable [BGS14]. We provide computationally efficient versions of this principle when the mean parameters are pairwise moments: we design distributions that approximately match given pairwise moments while having entropy comparable to that of the maximum-entropy distribution matching those moments. We additionally provide surprising applications of the approximate maximum-entropy principle to designing provable variational methods for partition function calculations for Ising models without any assumptions on the potentials of the model. More precisely, we show that we can get approximation guarantees for the log-partition function comparable to those in the low-temperature limit, which is the setting of optimization of quadratic forms over the hypercube.
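As a rough illustration of the Goemans-Williamson ingredient named in the title (not the paper's full construction, which may differ), the sketch below rounds unit vectors to ±1 spins with a single random Gaussian hyperplane, x_i = sign(<v_i, g>). A classical identity behind the GW analysis gives E[x_i x_j] = (2/π) arcsin(<v_i, v_j>), so the rounded distribution reproduces a distorted version of the target pairwise moments. The function names and the two example vectors are hypothetical, and the empirical moment is a Monte Carlo estimate.

```python
import math
import random


def hyperplane_round(vectors, rng):
    """Round unit vectors to +/-1 spins using one random Gaussian hyperplane."""
    dim = len(vectors[0])
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [1 if sum(vi * gi for vi, gi in zip(v, g)) >= 0 else -1 for v in vectors]


def empirical_moment(vectors, i, j, samples=20000, seed=0):
    """Monte Carlo estimate of E[x_i x_j] under hyperplane rounding."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        x = hyperplane_round(vectors, rng)
        total += x[i] * x[j]
    return total / samples


def predicted_moment(u, v):
    """Identity used in the GW analysis: E[sign<u,g> sign<v,g>] = (2/pi) asin(<u,v>)."""
    dot = sum(a * b for a, b in zip(u, v))
    return 2.0 / math.pi * math.asin(dot)


# Two hypothetical unit vectors in R^2 at 60 degrees, so <u, v> = 0.5.
u = [1.0, 0.0]
v = [0.5, math.sqrt(3) / 2]
print(empirical_moment([u, v], 0, 1))  # approximately 0.333
print(predicted_moment(u, v))          # approximately 0.333
```

Note that the rounded moment (about 1/3) differs from the inner product 0.5 it came from; this arcsin distortion is one reason a rounding-based construction can match the target moments only approximately.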
Reviews: Connectionist Temporal Classification with Maximum Entropy Regularization
This work presents a method for end-to-end sequence learning, specifically in the framework of Connectionist Temporal Classification (CTC). The paper has two main contributions. The first is a regularization of CTC training that reduces the over-confidence of the model. To achieve this, the authors propose a method based on conditional entropy; more specifically, the proposed regularization encourages the model to explore paths that are close to the dominant one. To do so, they assume that consecutive elements of a sequence have equal spacing.
Connectionist Temporal Classification with Maximum Entropy Regularization
Hu Liu, Sheng Jin, Changshui Zhang
Connectionist Temporal Classification (CTC) is an objective function for end-to-end sequence learning, which adopts dynamic programming algorithms to directly learn the mapping between sequences. CTC has shown promising results in many sequence learning applications including speech recognition and scene text recognition. However, CTC tends to produce highly peaky and overconfident distributions, which is a symptom of overfitting. To remedy this, we propose a regularization method based on maximum conditional entropy which penalizes peaky distributions and encourages exploration. We also introduce an entropy-based pruning method to dramatically reduce the number of CTC feasible paths by ruling out unreasonable alignments. Experiments on scene text recognition show that our proposed methods consistently improve over the CTC baseline without the need to adjust training settings.
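To make the maximum-conditional-entropy idea concrete, the brute-force sketch below enumerates every frame-level path for a tiny input, keeps the paths that collapse to the target label under the usual CTC rule (merge repeats, then drop blanks), and computes both the CTC loss -log p(l|x) and the entropy H of the posterior over those feasible paths. The combined objective -log p(l|x) - β·H is an assumed form for illustration; β, the toy probabilities, and all function names are hypothetical. Enumeration is exponential in the number of frames, so a practical version would compute these quantities with dynamic programming instead.

```python
import itertools
import math

BLANK = 0  # index of the CTC blank symbol


def collapse(path):
    """CTC collapse: merge consecutive repeats, then remove blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return tuple(out)


def ctc_loss_and_entropy(probs, label):
    """Return (-log p(label|x), entropy of the posterior over feasible paths).

    probs[t][k] is the per-frame softmax output for symbol k at frame t.
    """
    T, K = len(probs), len(probs[0])
    path_probs = []
    for path in itertools.product(range(K), repeat=T):
        if collapse(path) == tuple(label):
            p = 1.0
            for t, s in enumerate(path):
                p *= probs[t][s]
            path_probs.append(p)
    total = sum(path_probs)  # p(label | x): sum over all feasible paths
    entropy = -sum((p / total) * math.log(p / total) for p in path_probs if p > 0)
    return -math.log(total), entropy


def regularized_loss(probs, label, beta=0.1):
    """CTC loss minus beta times the conditional entropy (hypothetical weighting)."""
    loss, entropy = ctc_loss_and_entropy(probs, label)
    return loss - beta * entropy


# Three frames, alphabet {blank=0, a=1, b=2}, target "ab".
probs = [
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.1, 0.8],
]
print(ctc_loss_and_entropy(probs, [1, 2]))
print(regularized_loss(probs, [1, 2]))
```

Minimizing the regularized loss still raises p(l|x), but the entropy term rewards spreading probability mass over several feasible alignments rather than concentrating it on one peaky path.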
Maximum-Entropy Fine Grained Classification
Abhimanyu Dubey, Otkrist Gupta, Ramesh Raskar, Nikhil Naik
Fine-Grained Visual Classification (FGVC) is an important computer vision problem that involves small diversity within the different classes, and often requires expert annotators to collect data. Utilizing this notion of small visual diversity, we revisit Maximum-Entropy learning in the context of fine-grained classification, and provide a training routine that maximizes the entropy of the output probability distribution for training convolutional neural networks on FGVC tasks. We provide a theoretical as well as empirical justification of our approach, and achieve state-of-the-art performance across a variety of classification tasks in FGVC, that can potentially be extended to any fine-tuning task. Our method is robust to different hyperparameter values, amount of training data and amount of training label noise and can hence be a valuable tool in many similar problems.
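A minimal sketch of an entropy-maximizing training objective, assuming it takes the common form of cross-entropy minus a weighted entropy of the predicted distribution, L = CE − γ·H(p). The weight γ, the example logits, and the function names are hypothetical; in practice this loss would be computed on network outputs inside an autodiff framework rather than on plain lists.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def max_entropy_loss(logits, target, gamma=0.1):
    """Cross-entropy minus gamma * entropy of the predicted distribution.

    Minimizing this still fits the label, but a larger gamma rewards
    less peaky (higher-entropy) predictions.
    """
    p = softmax(logits)
    cross_entropy = -math.log(p[target])
    return cross_entropy - gamma * entropy(p)


# The same correct prediction, once very confident and once less peaky.
confident = [8.0, 0.0, 0.0]
softer = [2.0, 0.0, 0.0]
print(max_entropy_loss(confident, 0), max_entropy_loss(softer, 0))
```

With γ = 0 this reduces to ordinary cross-entropy, so γ acts as a single knob trading label fit against output entropy.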
Quantum Maximum Entropy Inference and Hamiltonian Learning
Gao, Minbo, Ji, Zhengfeng, Wei, Fuchao
Maximum entropy inference is a widely used method in machine learning, particularly in the context of graphical models (McCallum et al., 2000; Kindermann & Snell, 1980; Ackley et al., 1985; Bresler, 2015; Hamilton et al., 2017) and natural language processing (Berger et al., 1996). In graphical models, it is known as the backward mapping, the problem of computing the model parameters from the marginal information (Wainwright & Jordan, 2007). The inverse problem of estimating marginal parameters from the model parameters is called the forward mapping. Maximum entropy inference is also a core concept in statistical physics (Jaynes, 1957) known as the Jaynes' principle which links statistical mechanics and information theory. The Hammersley-Clifford theorem establishes that, in the classical case, any positive probability distribution satisfying the local Markov property can be represented as a Gibbs distribution (Lafferty et al., 2001).
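In the classical pairwise case, the forward and backward mappings described above can be illustrated on a toy model. The sketch below assumes a hypothetical 3-spin model with fields x_i and couplings x_i x_j, computes the forward mapping (parameters to means) by exact enumeration, and recovers the parameters from the means by gradient ascent on the maximum-entropy dual, θ ← θ + η(μ − E_θ[φ]). This is only the classical analogue; the quantum Hamiltonian-learning setting of the paper, with non-commuting terms and quantum Gibbs states, is not modeled here.

```python
import itertools
import math


def features(x):
    """Sufficient statistics of a 3-spin model: fields x_i and couplings x_i x_j."""
    n = len(x)
    pairs = [x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return list(x) + pairs


STATES = list(itertools.product([-1, 1], repeat=3))
FEATS = [features(x) for x in STATES]  # precomputed feature vector per state


def forward_mapping(theta):
    """Model parameters -> mean parameters E_theta[phi(x)] by exact enumeration."""
    weights = [math.exp(sum(t * f for t, f in zip(theta, feats))) for feats in FEATS]
    z = sum(weights)  # partition function
    return [sum(w * feats[k] for w, feats in zip(weights, FEATS)) / z
            for k in range(len(theta))]


def backward_mapping(mu, steps=3000, lr=0.2):
    """Mean parameters -> model parameters by moment matching (ascent on the dual)."""
    theta = [0.0] * len(mu)
    for _ in range(steps):
        grad = [m - e for m, e in zip(mu, forward_mapping(theta))]
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical ground-truth parameters, used only to produce realizable moments.
true_theta = [0.2, -0.1, 0.3, 0.5, -0.4, 0.25]
mu = forward_mapping(true_theta)
print(backward_mapping(mu))  # approximately true_theta
```

The step size is kept below 2/6 because each of the six ±1 features has variance at most 1, which bounds the curvature of the concave dual and keeps plain gradient ascent stable.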
Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models
Yoon, Sangwoong, Hwang, Himchan, Kwon, Dohyun, Noh, Yung-Kyun, Park, Frank C.
We present a maximum entropy inverse reinforcement learning (IRL) approach for improving the sample quality of diffusion generative models, especially when the number of generation time steps is small. Similar to how IRL trains a policy based on the reward function learned from expert demonstrations, we train (or fine-tune) a diffusion model using the log probability density estimated from training data. Since we employ an energy-based model (EBM) to represent the log density, our approach boils down to the joint training of a diffusion model and an EBM. Our IRL formulation, named Diffusion by Maximum Entropy IRL (DxMI), is a minimax problem that reaches equilibrium when both models converge to the data distribution. The entropy maximization plays a key role in DxMI, facilitating the exploration of the diffusion model and ensuring the convergence of the EBM. We also propose Diffusion by Dynamic Programming (DxDP), a novel reinforcement learning algorithm for diffusion models, as a subroutine in DxMI. DxDP makes the diffusion model update in DxMI efficient by transforming the original problem into an optimal control formulation where value functions replace back-propagation in time. Our empirical studies show that diffusion models fine-tuned using DxMI can generate high-quality samples in as few as 4 and 10 steps. Additionally, DxMI enables the training of an EBM without MCMC, stabilizing EBM training dynamics and enhancing anomaly detection performance.
Statistics-Informed Parameterized Quantum Circuit via Maximum Entropy Principle for Data Science and Finance
Zhuang, Xi-Ning, Chen, Zhao-Yun, Xue, Cheng, Xu, Xiao-Fan, Wang, Chao, Liu, Huan-Yu, Sun, Tai-Ping, Wang, Yun-Jie, Wu, Yu-Chun, Guo, Guo-Ping
Quantum machine learning has demonstrated significant potential in solving practical problems, particularly in statistics-focused areas such as data science and finance. However, challenges remain in preparing and learning statistical models on a quantum processor due to issues with trainability and interpretability. In this letter, we utilize the maximum entropy principle to design a statistics-informed parameterized quantum circuit (SI-PQC) for efficiently preparing and training quantum computational statistical models, including arbitrary distributions and their weighted mixtures. The SI-PQC features a static structure with trainable parameters, enabling in-depth optimized circuit compilation, exponential reductions in resource and time consumption, and improved trainability and interpretability for learning quantum states and classical model parameters simultaneously. As an efficient subroutine for preparing and learning in various quantum algorithms, the SI-PQC addresses the input bottleneck and facilitates the injection of prior knowledge.
Evaluating MEDIRL: A Replication and Ablation Study of Maximum Entropy Deep Inverse Reinforcement Learning for Human Social Navigation
In this study, we enhance the Maximum Entropy Deep Inverse Reinforcement Learning (MEDIRL) framework, targeting its application in human-robot interaction (HRI) for modeling pedestrian behavior in crowded environments. Our work is grounded in the pioneering research by Fahad, Chen, and Guo, and aims to elevate MEDIRL's efficacy in real-world HRI settings. We replicated the original MEDIRL model and conducted detailed ablation studies, focusing on key model components such as learning rates, state dimensions, and network layers. Our findings reveal the effectiveness of a two-dimensional state representation over a three-dimensional one, significantly improving model accuracy for pedestrian behavior prediction in HRI scenarios. These results not only demonstrate MEDIRL's enhanced performance but also offer valuable insights for future HRI system development, emphasizing the importance of model customization to specific environmental contexts. Our research contributes to advancing the field of socially intelligent navigation systems, promoting more intuitive and safer human-robot interactions.