
What Trace Powers Reveal About Log-Determinants: Closed-Form Estimators, Certificates, and Failure Modes

Sao, Piyush

arXiv.org Machine Learning

Computing $\log\det(A)$ for large symmetric positive definite matrices arises in Gaussian process inference and Bayesian model comparison. Standard methods combine matrix-vector products with polynomial approximations. We study a different access model: trace powers $p_k = \operatorname{tr}(A^k)$, natural when matrix powers are available. Classical moment-based approximations Taylor-expand $\log(\lambda)$ around the arithmetic mean $\AM$. This requires $|\lambda - \AM| < \AM$ and diverges when $\kappa > 4$. We work instead with the moment-generating function $M(t) = \E[X^t]$ for normalized eigenvalues $X = \lambda/\AM$. Since $M'(0) = \E[\log X]$, the log-determinant becomes $\log\det(A) = n(\log\AM + M'(0))$: the problem reduces to estimating a derivative at $t = 0$. Trace powers give $M(k)$ at positive integers, but interpolating $M(t)$ directly is ill-conditioned due to exponential growth. The transform $K(t) = \log M(t)$ compresses this range, and normalization by $\AM$ ensures $K(0) = K(1) = 0$. With these anchors fixed, we interpolate $K$ through $m+1$ consecutive integers and differentiate to estimate $K'(0)$. However, this local interpolation cannot capture arbitrary spectral features. We prove a fundamental limit: no continuous estimator using finitely many positive moments can be uniformly accurate over unbounded conditioning. Positive moments downweight the spectral tail, while $K'(0) = \E[\log X]$ is tail-sensitive. This motivates guaranteed bounds. From the same traces we derive upper bounds on $(\det A)^{1/n}$. Given a spectral floor $r \leq \lambda_{\min}$, we obtain moment-constrained lower bounds, yielding a provable interval for $\log\det(A)$. A gap diagnostic indicates when to trust the point estimate and when to report bounds. All estimators and bounds cost $O(m)$, independent of $n$; for $m \in \{4, \ldots, 8\}$, this is effectively constant time.
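As a rough illustration of the pipeline this abstract describes, here is a minimal NumPy sketch (our own reading, not the authors' code; the degree m, the interpolation nodes 0,...,m, and the dense computation of matrix powers are assumptions made for clarity):

    import numpy as np

    def logdet_from_trace_powers(A, m=6):
        # Estimate log det(A) for SPD A from trace powers p_k = tr(A^k).
        # Normalizing by the arithmetic mean AM = tr(A)/n gives K(0) = K(1) = 0;
        # interpolate K(t) = log M(t) through t = 0..m and read off K'(0).
        n = A.shape[0]
        p, B = [], np.eye(n)
        for _ in range(m):                      # p_k = tr(A^k), k = 1..m
            B = B @ A
            p.append(np.trace(B))
        am = p[0] / n                           # arithmetic mean of eigenvalues
        t = np.arange(m + 1)
        K = np.zeros(m + 1)                     # K(0) = 0 by construction
        for k in range(1, m + 1):               # M(k) = p_k / (n * AM^k)
            K[k] = np.log(p[k - 1] / (n * am**k))   # and K(1) = 0 as well
        coeffs = np.polyfit(t, K, deg=m)        # exact degree-m interpolant
        k_prime_0 = coeffs[-2]                  # linear coefficient = K'(0)
        return n * (np.log(am) + k_prime_0)

    # Toy check on a well-conditioned SPD matrix (kappa = 3).
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
    A = Q @ np.diag(rng.uniform(0.5, 1.5, size=50)) @ Q.T
    print(logdet_from_trace_powers(A), np.linalg.slogdet(A)[1])

Given the traces, the interpolation and differentiation steps cost only $O(m)$; the dense matrix powers above are just for the toy check.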


Optimal Sketching for Trace Estimation

Neural Information Processing Systems

Matrix trace estimation is ubiquitous in machine learning applications and has traditionally relied on Hutchinson's method, which requires $O(\log(1/\delta)/\epsilon^2)$ matrix-vector product queries to achieve a $(1 \pm \epsilon)$-multiplicative approximation to $\text{trace}(A)$ with failure probability $\delta$ on positive-semidefinite input matrices $A$. Recently, the Hutch++ algorithm was proposed, which reduces the number of matrix-vector queries from $O(1/\epsilon^2)$ to the optimal $O(1/\epsilon)$, and the algorithm succeeds with constant probability. However, in the high probability setting, the non-adaptive Hutch++ algorithm suffers an extra $O(\sqrt{\log(1/\delta)})$ multiplicative factor in its query complexity. Non-adaptive methods are important, as they correspond to sketching algorithms, which are mergeable, highly parallelizable, and provide low-memory streaming algorithms as well as low-communication distributed protocols. In this work, we close the gap between non-adaptive and adaptive algorithms, showing that even non-adaptive algorithms can achieve $O(\sqrt{\log(1/\delta)}/\epsilon + \log(1/\delta))$ matrix-vector products. In addition, we prove matching lower bounds demonstrating that, up to a $\log \log(1/\delta)$ factor, no further improvement in the dependence on $\delta$ or $\epsilon$ is possible by any non-adaptive algorithm. Finally, our experiments demonstrate the superior performance of our sketch over the adaptive Hutch++ algorithm, which is less parallelizable, as well as over the non-adaptive Hutchinson's method.
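For reference, a minimal sketch of the baseline Hutchinson estimator against which the query counts above are measured (an illustration of the classical method, not of the paper's optimal sketch):

    import numpy as np

    def hutchinson_trace(matvec, n, num_queries, rng):
        # Average of x^T A x over Rademacher probes x: an unbiased trace
        # estimate using only matrix-vector products with A.
        total = 0.0
        for _ in range(num_queries):
            x = rng.choice([-1.0, 1.0], size=n)
            total += x @ matvec(x)
        return total / num_queries

    rng = np.random.default_rng(1)
    G = rng.standard_normal((200, 200))
    A = G @ G.T                                 # PSD test matrix
    print(hutchinson_trace(lambda v: A @ v, 200, 100, rng), np.trace(A))

Each probe costs exactly one matrix-vector product, so num_queries is the query complexity the abstract counts.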




A Algorithms

Neural Information Processing Systems

Below we include detailed pseudocode for algorithms described in the main text, including Algorithm 2 (Parameter-Free DeltaShift), whose input is implicit matrix-vector multiplication access to $A$. The appendix then gives a full proof of Theorem 1.1 with the correct logarithmic dependence on the failure probability; before doing so, it collects several definitions and results required for proving the theorem. As discussed, a tight analysis of Hutchinson's estimator, and also of the DeltaShift algorithm, relies on these tools; while stated for Rademacher random vectors, a similar analysis can be performed for any i.i.d. probe distribution. The main result is then proved by induction on $j = 1, \ldots, m$, with a base case followed by the inductive case.
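Since only excerpts of this appendix survive above, the following is a hedged sketch of the underlying "delta" idea as we read it, not the Parameter-Free DeltaShift algorithm itself (whose variance-control details the excerpt does not recover): track the trace of a slowly changing matrix sequence by estimating tr(A_1) once and then applying Hutchinson only to the increments.

    import numpy as np

    def hutch(matvec, n, q, rng):
        # Plain Hutchinson estimate of tr(M) from q Rademacher probes.
        total = 0.0
        for _ in range(q):
            x = rng.choice([-1.0, 1.0], size=n)
            total += x @ matvec(x)
        return total / q

    def delta_trace_stream(matvecs, n, q, rng):
        # Running trace estimates for A_1, A_2, ...: estimate tr(A_1) once,
        # then only tr(A_i - A_{i-1}); variance tracks the size of the change.
        estimates, prev, t = [], None, 0.0
        for mv in matvecs:
            if prev is None:
                t = hutch(mv, n, q, rng)
            else:
                t += hutch(lambda x, mv=mv, pm=prev: mv(x) - pm(x), n, q, rng)
            estimates.append(t)
            prev = mv
        return estimates

    # Usage: a sequence of matrices drifting by small identity shifts.
    rng = np.random.default_rng(2)
    n = 100
    G = rng.standard_normal((n, n))
    A = G @ G.T
    seq = [A + 0.01 * i * np.eye(n) for i in range(5)]
    mats = [lambda v, M=M: M @ v for M in seq]
    print(delta_trace_stream(mats, n, 50, rng))
    print([float(np.trace(M)) for M in seq])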




Appendix for Riemannian Continuous Normalizing Flows: A Constant curvature manifolds

Neural Information Processing Systems

In the following, we provide a brief overview of Riemannian geometry and constant curvature manifolds, specifically the Poincaré ball and hypersphere models. We first cover key concepts of hyperbolic geometry, then turn to positively curved spaces known as elliptic spaces, in particular the hypersphere model, which is endowed with the pull-back metric of the ambient Euclidean space. Unfortunately, conventional probabilistic models implicitly assume a flat geometry.
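As a concrete anchor, here are the standard metric formulas for the two models named above (textbook forms in our own notation, not quoted from the appendix):

    % Poincare ball of curvature -c: a conformal rescaling of the Euclidean metric
    g^{\mathbb{B}}_x = \left( \frac{2}{1 - c\,\lVert x \rVert^2} \right)^{2} g^{E},
    \qquad x \in \mathbb{B}^d_c = \{\, x \in \mathbb{R}^d : c\,\lVert x \rVert^2 < 1 \,\}.

    % Hypersphere: the pull-back of the ambient Euclidean inner product
    g^{\mathbb{S}}_x(u, v) = \langle u, v \rangle,
    \qquad u, v \in T_x \mathbb{S}^d = \{\, w \in \mathbb{R}^{d+1} : \langle w, x \rangle = 0 \,\}.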


Navy 'wolf pack' drone boats in warship trial success

BBC News

A flotilla of uncrewed 'wolf pack' drone boats has successfully been used to escort warships in a Royal Navy and Army trial. The Navy said it was a milestone demonstration of how it could use such technology in a real-life scenario. During the 72-hour training exercise, five 7.2m autonomous Rattler boats safely escorted two ships playing the role of foreign warships, with camera and sensor data fed back to the experimentation ship Patrick Blackett, it said. The demonstration was the culmination of months of trials by the Navy's Disruptive Capabilities and Technology Office (DCTO) and the Fleet Experimentation Squadron (FXS). Each of the Rattler boats was operated by a two-person team, with one responsible for piloting the drone and the other monitoring and operating onboard systems, as well as helping to manage live data streams.