symmetrization
A Appendix
A.1 Proofs
A.1.1 Proof of Theorem 1 (Section 2.1)
Let ψ: X → Y be an arbitrary G-equivariant function. After handling the translation component of the Euclidean group E(d)/SE(d) as in Eq. (29), the proposed symmetrization distribution p (Propositions 3 and 6-8) recovers frame averaging; therefore, probabilistic symmetrization can become frame averaging.
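The concluding claim can be made concrete. As a minimal sketch in the standard notation of the probabilistic-symmetrization and frame-averaging literature (the names f, p(g|x), and F(x) are assumptions here, not necessarily this appendix's own symbols), both constructions symmetrize a base function f by averaging over group elements:

```latex
% Probabilistic symmetrization: average over a learned,
% input-conditional distribution p(g|x) on the group G.
\phi(x) = \mathbb{E}_{g \sim p(g \mid x)}\!\left[ g \cdot f(g^{-1} \cdot x) \right]

% Frame averaging: average uniformly over a frame F(x) \subseteq G.
\langle f \rangle_F(x) = \frac{1}{|F(x)|} \sum_{g \in F(x)} g \cdot f(g^{-1} \cdot x)
```

When p(g|x) places uniform mass on a finite frame F(x), the expectation collapses to the frame average, which is the sense in which probabilistic symmetrization can become frame averaging.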
Two tales for a geometric Jensen--Shannon divergence
The geometric Jensen--Shannon divergence (G-JSD) gained popularity in machine learning and information sciences thanks to its closed-form expression between Gaussian distributions. In this work, we introduce an alternative definition of the geometric Jensen--Shannon divergence, tailored to positive densities, which does not normalize geometric mixtures. This novel divergence is termed the extended G-JSD, as it applies to the more general case of positive measures. We report explicitly the gap between the extended G-JSD and the G-JSD when considering probability densities, and show how to express the G-JSD and extended G-JSD using the Jeffreys divergence and the Bhattacharyya distance or Bhattacharyya coefficient. The extended G-JSD is proven to be an $f$-divergence, a separable divergence satisfying information monotonicity and invariance in information geometry. We derive the corresponding closed-form formulas for the two types of G-JSD in the case of multivariate Gaussian distributions often met in applications. We consider Monte Carlo stochastic estimations and approximations of the two types of G-JSD using the projective $\gamma$-divergences. Although the square root of the JSD yields a metric distance, we show that this is no longer the case for the two types of G-JSD. Finally, we explain how these two types of geometric JSD can be interpreted as regularizations of the ordinary JSD.
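To make the two constructions concrete, here is a minimal sketch using the standard notation of the G-JSD literature (the exact symbols in the paper may differ). The skewed geometric mixture is normalized by the coefficient $Z_\alpha(p,q) = \int p^{1-\alpha} q^{\alpha} \,\mathrm{d}\mu$, i.e., the skew Bhattacharyya coefficient:

```latex
(pq)^{G}_{\alpha} = \frac{p^{1-\alpha} q^{\alpha}}{Z_\alpha(p,q)}, \qquad
\mathrm{JS}^{G}_{\alpha}(p:q) = (1-\alpha)\,\mathrm{KL}\big(p : (pq)^{G}_{\alpha}\big)
  + \alpha\,\mathrm{KL}\big(q : (pq)^{G}_{\alpha}\big).

% Expanding KL(p : (pq)^G_\alpha) = \alpha\,KL(p:q) + \log Z_\alpha and
% KL(q : (pq)^G_\alpha) = (1-\alpha)\,KL(q:p) + \log Z_\alpha yields
\mathrm{JS}^{G}_{\alpha}(p:q) = \alpha(1-\alpha)\, J(p:q) - B_\alpha(p:q),
```

where $J(p:q) = \mathrm{KL}(p:q) + \mathrm{KL}(q:p)$ is the Jeffreys divergence and $B_\alpha = -\log Z_\alpha$ is the skew Bhattacharyya distance, which is one way the abstract's claimed identities can arise. The extended G-JSD then replaces the normalized mixture with the unnormalized $p^{1-\alpha} q^{\alpha}$ and uses the KL extension to positive measures, $\mathrm{KL}_{+}(\tilde p : \tilde q) = \int \big(\tilde p \log(\tilde p/\tilde q) - \tilde p + \tilde q\big)\,\mathrm{d}\mu$, so the gap between the two divergences is controlled by $Z_\alpha$.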
Effective Field Neural Network
Liu, Xi, Zhao, Yujun, Wan, Chun Yu, Zhang, Yang, Liu, Junwei
In recent years, with the rapid development of machine learning, physicists have been exploring its new applications in solving or alleviating the curse of dimensionality in many-body problems. To accurately reflect the underlying physics of a problem, domain knowledge must be encoded into the machine learning algorithms. In this work, inspired by field theory, we propose a new set of machine learning models called effective field neural networks (EFNNs) that can automatically and efficiently capture important many-body interactions through multiple self-refining processes. Taking the classical 3-spin infinite-range model and the quantum double-exchange model as case studies, we explicitly demonstrate that EFNNs significantly outperform fully-connected deep neural networks (DNNs) and the effective model. Furthermore, with the help of convolution operations, EFNNs learned on a small system can be seamlessly applied to a larger system without additional training, and the relative errors even decrease, which further demonstrates the efficacy of EFNNs in representing core physical behaviors.
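The abstract does not give the architecture, so the following is only a minimal sketch of the idea as described: an effective field that is iteratively refined by a shared convolutional block. The class name `EffectiveFieldNet` and the parameter `n_refine` are hypothetical. Because every layer is convolutional, the same weights apply to lattices of any size, which is the size-transfer property the abstract reports.

```python
import torch
import torch.nn as nn

class EffectiveFieldNet(nn.Module):
    """Hypothetical sketch of an effective-field-style network:
    a local field is repeatedly refined by one shared conv block."""

    def __init__(self, channels: int = 16, n_refine: int = 4):
        super().__init__()
        self.embed = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        # One shared refinement block, applied n_refine times: each pass
        # lets the effective field absorb longer-range, higher-order
        # interactions (the "self-refining process").
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.n_refine = n_refine
        self.readout = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, spins: torch.Tensor) -> torch.Tensor:
        # spins: (batch, 1, L, L) spin configuration on an L x L lattice
        field = self.embed(spins)
        for _ in range(self.n_refine):
            field = field + self.refine(field)  # self-refining residual pass
        # Per-site energy density summed to a size-extensive total energy;
        # convolutions keep the same weights valid for any lattice size L.
        return self.readout(field).sum(dim=(1, 2, 3))

# A model trained on 8x8 configurations evaluates 16x16 ones unchanged:
model = EffectiveFieldNet()
e_small = model(torch.randn(2, 1, 8, 8))
e_large = model(torch.randn(2, 1, 16, 16))
```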
Diagonal Symmetrization of Neural Network Solvers for the Many-Electron Schr\"odinger Equation
Huang, Kevin Han, Zhan, Ni, Ertekin, Elif, Orbanz, Peter, Adams, Ryan P.
Incorporating group symmetries into neural networks has been a cornerstone of success in many AI-for-science applications. Diagonal groups of isometries, which describe the invariance under a simultaneous movement of multiple objects, arise naturally in many-body quantum problems. Despite their importance, diagonal groups have received relatively little attention, as they lack a natural choice of invariant maps except in special cases. We study different ways of incorporating diagonal invariance in neural network ans\"atze trained via variational Monte Carlo methods, and consider specifically data augmentation, group averaging and canonicalization. We show that, contrary to standard ML setups, in-training symmetrization destabilizes training and can lead to worse performance. Our theoretical and numerical results indicate that this unexpected behavior may arise from a unique computational-statistical tradeoff not found in standard ML analyses of symmetrization. Meanwhile, we demonstrate that post hoc averaging is less sensitive to such tradeoffs and emerges as a simple, flexible and effective method for improving neural network solvers.
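One reading of "post hoc averaging" is to symmetrize an already-trained ansatz over a finite diagonal group at evaluation time only. The sketch below assumes that reading; the function name, the log-domain averaging, and the toy Z2 group are all illustrative choices, not the paper's method.

```python
import torch

def posthoc_group_average(log_psi, group_ops, coords):
    """Hypothetical sketch of post hoc group averaging: symmetrize a
    trained wavefunction ansatz over a finite diagonal group G by
    averaging psi(g(x)) over group elements at evaluation time,
    with no in-training symmetrization.

    log_psi:   callable, (batch, n_electrons, 3) -> log|psi|
    group_ops: list of callables, each applying one isometry g
               simultaneously to all electron coordinates
    coords:    (batch, n_electrons, 3) electron positions
    """
    logs = torch.stack([log_psi(g(coords)) for g in group_ops], dim=0)
    # Numerically stable log of (1/|G|) * sum_g exp(log_psi(g x)).
    # Note: this averages |psi| over the orbit, so it assumes psi has
    # a constant sign on each orbit; phases need separate handling.
    return torch.logsumexp(logs, dim=0) - torch.log(
        torch.tensor(float(len(group_ops)))
    )

# Toy example with the diagonal Z2 group {identity, global inversion}:
ops = [lambda x: x, lambda x: -x]
log_psi = lambda x: -x.pow(2).sum(dim=(1, 2))  # toy Gaussian-like ansatz
sym_log_psi = posthoc_group_average(log_psi, ops, torch.randn(4, 3, 3))
```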
OutlierTune: Efficient Channel-Wise Quantization for Large Language Models
Wang, Jinguang, Yin, Yuexi, Sun, Haifeng, Qi, Qi, Wang, Jingyu, Zhuang, Zirui, Yang, Tingting, Liao, Jianxin
Quantizing the activations of large language models (LLMs) has been a significant challenge due to the presence of structured outliers. Most existing methods focus on per-token or per-tensor quantization of activations, making it difficult to achieve both accuracy and hardware efficiency. To address this problem, we propose OutlierTune, an efficient per-channel post-training quantization (PTQ) method for the activations of LLMs. OutlierTune consists of two components: pre-execution of dequantization and symmetrization. The pre-execution of dequantization updates the model weights by the activation scaling factors, avoiding the internal scaling and the costly additional computation brought by per-channel activation quantization. The symmetrization further reduces the quantization differences arising from the weight updates by ensuring balanced numerical ranges across different activation channels. OutlierTune is easy to implement and hardware-efficient, introducing almost no additional computational overhead during inference. Extensive experiments show that the proposed framework outperforms existing methods across multiple different tasks. Demonstrating better generalization, the framework improves the Int6 quantization of instruction-tuned LLMs, such as OPT-IML, to the same level as half precision (FP16). Moreover, the proposed framework is 1.48x faster than the FP16 implementation while reducing memory usage by approximately 2x.
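The two components can be sketched in the usual per-channel PTQ recipe: fold per-channel activation scales into the consuming layer's weights (so no channel-wise rescaling happens at runtime), and shift each channel by the midpoint of its range so the ranges become symmetric, absorbing the shift into the bias. This is a minimal sketch under those assumptions; the function name is hypothetical and this is not the paper's reference implementation.

```python
import numpy as np

def fold_scales_and_symmetrize(x_calib, W, b):
    """Hypothetical sketch of per-channel activation PTQ in the spirit
    of OutlierTune (not the paper's reference implementation).

    x_calib: (n_samples, d_in) calibration activations
    W:       (d_in, d_out) weights of the layer consuming x
    b:       (d_out,) bias of that layer
    """
    # Symmetrization: center each channel's range at 0, balancing
    # numerical ranges across channels; the shift is absorbed into
    # the bias so the layer output is unchanged.
    center = (x_calib.max(0) + x_calib.min(0)) / 2      # (d_in,)
    b = b + center @ W                                   # fold shift into bias
    x_centered = x_calib - center

    # Pre-execution of dequantization: fold per-channel scales into W,
    # so the kernel never rescales activations channel by channel.
    scale = np.abs(x_centered).max(0) / 127.0            # Int8 per-channel scale
    scale = np.maximum(scale, 1e-8)
    W_folded = W * scale[:, None]                        # absorb dequant into weights

    def quantize(x):
        # Runtime path: shift, divide by per-channel scale, round to int8.
        q = np.clip(np.round((x - center) / scale), -127, 127)
        return q.astype(np.int8)

    # The layer output is then approximately quantize(x) @ W_folded + b.
    return quantize, W_folded, b
```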
Baking Symmetry into GFlowNets
Ma, George, Bengio, Emmanuel, Bengio, Yoshua, Zhang, Dinghuai
GFlowNets have exhibited promising performance in generating diverse candidates with high rewards. These networks generate objects incrementally and aim to learn a policy that samples objects with probability proportional to their rewards. However, current training pipelines for GFlowNets do not account for isomorphic actions, i.e., actions that lead to symmetric or isomorphic states. This failure to account for symmetry increases the number of samples required to train GFlowNets and can result in inefficient and potentially incorrect flow functions, which in turn lowers the reward and diversity of the generated objects. In this study, our objective is to integrate symmetries into GFlowNets by identifying equivalent actions during the generation process. Experimental results on synthetic data demonstrate the promising performance of our proposed approaches.
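One natural way to "identify equivalent actions" is to canonicalize each child state and merge actions whose children share a canonical form, pooling their policy probability. The sketch below assumes that approach; the function names and the hashable canonical form are stand-ins (the paper's actual equivalence test, e.g., a graph-isomorphism check, may differ).

```python
from collections import defaultdict

def merge_isomorphic_actions(state, actions, child_of, canonical, policy_probs):
    """Hypothetical sketch: group a GFlowNet state's actions by the
    canonical form of the child state they produce, and sum policy
    probability over each equivalence class, so isomorphic actions
    are treated as a single action during training.

    actions:      list of candidate actions at `state`
    child_of:     callable (state, action) -> child state
    canonical:    callable state -> hashable canonical form
                  (e.g., an isomorphism-invariant graph hash)
    policy_probs: dict action -> probability under the current policy
    """
    classes = defaultdict(list)
    for a in actions:
        classes[canonical(child_of(state, a))].append(a)

    merged = {}
    for key, members in classes.items():
        # All members lead to isomorphic children: pool their probability
        # mass so flow into that child is counted once, not |members| times.
        merged[key] = (members[0], sum(policy_probs[a] for a in members))
    return merged
```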