On the Efficient Implementation of High Accuracy Optimality of Profile Maximum Likelihood
We provide an efficient unified plug-in approach for estimating symmetric properties of distributions given $n$ independent samples. Our estimator is based on profile-maximum-likelihood (PML) and is sample optimal for estimating various symmetric properties when the estimation error $\epsilon \gg n^{-1/3}$. This result improves upon the previous best accuracy threshold of $\epsilon \gg n^{-1/4}$ achievable by polynomial time computable PML-based universal estimators \cite{ACSS20, ACSS20b}. Our estimator reaches a theoretical limit for universal symmetric property estimation as \cite{Han20} shows that a broad class of universal estimators (containing many well known approaches including ours) cannot be sample optimal for every $1$-Lipschitz property when $\epsilon \ll n^{-1/3}$.
[...] clarity recommendations the reviewers suggest. Turning now to the main concerns of each reviewer:
We thank the reviewers for their valuable feedback, which will improve the paper. Regarding the reviewer's comments about applications, we chose to limit the number of applications to three because [...] Cauchy, which has unbounded variance), in contrast to our mechanisms. As requested, we will add a discussion of related work on lower bounds for private mechanisms. For the reviewer's main comment on the contributions of this paper relative to Asi & Duchi 2020, we believe [...] Such general (vector-valued) functions are the main focus of this submission. We thank the reviewer for bringing Reimherr & Awan's K-norm mechanism (2019) to our attention, which certainly [...] We will discuss this work more carefully in the final version.
The paper presents new techniques, and theoretical results relating to those techniques, to optimise the variational Gaussian (VG) lower bound on the log evidence in latent Gaussian models (LGMs). The technique could be applied to a broader class of latent linear models, but their presentation focuses on LGMs only. VG approximate inference is important because it has many nice properties: it is widely applicable, often quite accurate and relatively fast. This paper attempts to make VG methods more scalable without resorting to factorisation assumptions on the approximating Gaussian distribution. Whilst the authors show how the objective function can be 'decoupled', they do not show experimentally that this leads to a clear improvement in speed or scalability over standard techniques.
Efficient Online Large-Margin Classification via Dual Certificates
Ho-Nguyen, Nam, Kılınç-Karzan, Fatma, Nguyen, Ellie, Shen, Lingqing
Online classification is a central problem in optimization, statistical learning and data science. Classical algorithms such as the perceptron offer efficient updates and finite mistake guarantees on linearly separable data, but they do not exploit the underlying geometric structure of the classification problem. We study the offline maximum margin problem through its dual formulation and use the resulting geometric insights to design a principled and efficient algorithm for the online setting. A key feature of our method is its translation invariance, inherited from the offline formulation, which plays a central role in its performance analysis. Our theoretical analysis yields improved mistake and margin bounds that depend only on translation-invariant quantities, offering stronger guarantees than existing algorithms under the same assumptions in favorable settings. In particular, we identify a parameter regime where our algorithm makes at most two mistakes per sequence, whereas the perceptron can be forced to make arbitrarily many mistakes. Our numerical study on real data further demonstrates that our method matches the computational efficiency of existing online algorithms, while significantly outperforming them in accuracy.
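For reference, the classical perceptron baseline that the abstract compares against can be sketched in a few lines of Python/NumPy. This is the standard mistake-driven update with a bias term. It is not the dual-certificate algorithm proposed in the paper, which the abstract does not specify. The function name `online_perceptron` and the toy data are hypothetical.

```python
import numpy as np

def online_perceptron(X, y):
    """Classical online perceptron with a bias term.

    Processes examples (x_t, y_t) in order, with labels y_t in {-1, +1}, and
    updates the hyperplane only when it makes a mistake. Returns the final
    weights, bias, and the number of mistakes made during the single pass.
    """
    X = np.asarray(X, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    mistakes = 0
    for x_t, y_t in zip(X, y):
        # A score of exactly zero is counted as a mistake (no margin).
        if y_t * (w @ x_t + b) <= 0:
            w += y_t * x_t
            b += y_t
            mistakes += 1
    return w, b, mistakes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(500, 2))
    X = X[np.abs(X[:, 0]) > 0.2]             # keep a margin around x_0 = 0
    y = np.where(X[:, 0] > 0, 1, -1)
    # The classical mistake bound (R / gamma)^2 uses the radius R of the data
    # around the origin, which grows when the data are translated.
    for shift in (0.0, 50.0):
        _, _, m = online_perceptron(X + shift, y)
        print(f"shift={shift}: {m} mistakes")
```

The translated run is included only to make the dependence on the data's position observable. The code makes no claim about how the paper's translation-invariant method behaves.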
Efficient Implementation of Gaussian Process Regression Accelerated Saddle Point Searches with Application to Molecular Reactions
Goswami, Rohit, Masterov, Maxim, Kamath, Satish, Peña-Torres, Alejandro, Jónsson, Hannes
The task of locating first-order saddle points on high-dimensional surfaces describing the variation of energy as a function of atomic coordinates is an essential step for identifying the mechanism and estimating the rate of thermally activated events within the harmonic approximation of transition state theory. When the search is combined directly with electronic structure calculations, the number of energy and atomic force evaluations needed for convergence is a primary issue. Here, we describe an efficient implementation of Gaussian process regression (GPR) acceleration of the minimum mode following method, where a dimer is used to estimate the lowest eigenmode of the Hessian. A surrogate energy surface is constructed and updated after each electronic structure calculation. The method is applied to a test set of 500 molecular reactions previously generated by Hermez and coworkers [J. Chem. Theory Comput. 18, 6974 (2022)]. Using GPR acceleration reduces the number of electronic structure calculations needed to reach the saddle point configurations by an order of magnitude compared with the plain dimer method. Despite the wide range in stiffness of the molecular degrees of freedom, the calculations are carried out in Cartesian coordinates. They are found to require a number of electronic structure calculations similar to that of an elaborate internal coordinate method implemented in the Sella software package. The present C++ implementation of the GPR surrogate model is efficient enough to reduce the wall time of the saddle point searches in 3 out of 4 cases, even though the calculations are carried out at the low Hartree-Fock level.
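The core idea of a surrogate energy surface that is refit after each new electronic structure calculation can be illustrated with plain Gaussian process regression. The sketch below is our own Python/NumPy illustration under simplifying assumptions:

- a zero-mean GP with a squared-exponential kernel;
- a fit to energies only;
- a toy one-dimensional double-well function standing in for an electronic structure calculation.

The function names and hyperparameter values are hypothetical. The sketch does not reproduce the paper's C++ implementation or its dimer-based saddle search.

```python
import numpy as np

def se_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between rows of A (m, d) and B (k, d)."""
    diff = A[:, None, :] - B[None, :, :]
    sq_dists = np.sum(diff**2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / length_scale**2)

def gpr_predict(X_train, E_train, X_test, length_scale=1.0, variance=1.0, noise=1e-8):
    """Posterior mean and variance of a zero-mean GP fit to energies E_train."""
    K = se_kernel(X_train, X_train, length_scale, variance)
    K += noise * np.eye(len(X_train))                # small noise term for a stable Cholesky
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, E_train))  # K^{-1} E
    K_s = se_kernel(X_train, X_test, length_scale, variance)   # (m, k)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)
    return mean, var

def toy_energy(X):
    """Hypothetical stand-in for an electronic structure calculation:
    a 1D double well with minima at x = -1 and x = 1 and a barrier at x = 0."""
    x = X[:, 0]
    return (x**2 - 1.0) ** 2

if __name__ == "__main__":
    X_train = np.linspace(-1.5, 1.5, 7)[:, None]
    E_train = toy_energy(X_train)
    X_test = np.array([[0.0], [0.25], [3.0]])
    mean, var = gpr_predict(X_train, E_train, X_test, length_scale=0.5)
    print(mean, var)  # variance is near zero at data points, large far away
```

In a surrogate-accelerated search, each new (configuration, energy) pair would be appended to `X_train` and `E_train` and the model refit. The search then moves on the cheap surrogate until its predicted variance suggests a new true evaluation is needed.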
SageAttention2++: A More Efficient Implementation of SageAttention2
Zhang, Jintao, Xu, Xiaoming, Wei, Jia, Huang, Haofeng, Zhang, Pengle, Xiang, Chendong, Zhu, Jun, Chen, Jianfei
The efficiency of attention is critical because its time complexity grows quadratically with sequence length. SageAttention2 addresses this by using quantization to accelerate the matrix multiplications (Matmul) in attention. To accelerate SageAttention2 further, we propose using the faster FP8 Matmul instruction that accumulates in FP16. This instruction is 2x faster than the FP8 Matmul used in SageAttention2. Our experiments show that SageAttention2++ achieves a 3.9x speedup over FlashAttention while maintaining the same attention accuracy as SageAttention2. As a result, SageAttention2++ effectively accelerates various models, including those for language, image, and video generation, with negligible loss in end-to-end metrics. The code will be available at https://github.com/thu-ml/SageAttention.
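The quadratic cost, and the general idea of running an attention Matmul on quantized inputs, can be illustrated with a small NumPy simulation. This is not SageAttention code. It uses symmetric per-tensor INT8 quantization of Q and K, an assumption chosen for simplicity, since the abstract does not describe SageAttention2's quantization scheme. It also cannot model FP8 arithmetic or FP16 accumulation, because NumPy has no FP8 type. All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Reference softmax attention. The score matrix Q @ K.T has shape (n, n),
    so time and memory grow quadratically with the sequence length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)             # (n, n)
    return softmax(scores) @ V                # (n, d_v)

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: x ~= scale * q, q in [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    if scale == 0.0:                          # all-zero tensor: any scale works
        scale = 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def attention_int8_qk(Q, K, V):
    """Like `attention`, but Q @ K.T is computed from INT8-quantized Q and K
    with exact integer accumulation, then rescaled (a simulation only)."""
    d = Q.shape[-1]
    q_q, s_q = quantize_int8(Q)
    k_q, s_k = quantize_int8(K)
    int_scores = q_q.astype(np.int32) @ k_q.astype(np.int32).T  # exact integer Matmul
    scores = int_scores * (s_q * s_k) / np.sqrt(d)
    return softmax(scores) @ V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 128, 64
    Q, K, V = rng.standard_normal((3, n, d))
    exact = attention(Q, K, V)
    approx = attention_int8_qk(Q, K, V)
    print("max abs error:", np.max(np.abs(exact - approx)))
```

Doubling `n` quadruples the number of entries in the score matrix. This growth is why faster, lower-precision Matmul instructions pay off for long sequences.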