What Trace Powers Reveal About Log-Determinants: Closed-Form Estimators, Certificates, and Failure Modes
Computing $\log\det(A)$ for large symmetric positive definite matrices arises in Gaussian process inference and Bayesian model comparison. Standard methods combine matrix-vector products with polynomial approximations. We study a different model: access to trace powers $p_k = \tr(A^k)$, natural when matrix powers are available. Classical moment-based approximations Taylor-expand $\log(\lambda)$ around the arithmetic mean. This requires $|\lambda - \AM| < \AM$ and diverges when $\kappa > 4$. We work instead with the moment-generating function $M(t) = \E[X^t]$ for normalized eigenvalues $X = \lambda/\AM$. Since $M'(0) = \E[\log X]$, the log-determinant becomes $\log\det(A) = n(\log \AM + M'(0))$ -- the problem reduces to estimating a derivative at $t = 0$. Trace powers give $M(k)$ at positive integers, but interpolating $M(t)$ directly is ill-conditioned due to exponential growth. The transform $K(t) = \log M(t)$ compresses this range. Normalization by $\AM$ ensures $K(0) = K(1) = 0$. With these anchors fixed, we interpolate $K$ through $m+1$ consecutive integers and differentiate to estimate $K'(0)$. However, this local interpolation cannot capture arbitrary spectral features. We prove a fundamental limit: no continuous estimator using finitely many positive moments can be uniformly accurate over unbounded conditioning. Positive moments downweight the spectral tail; $K'(0) = \E[\log X]$ is tail-sensitive. This motivates guaranteed bounds. From the same traces we derive upper bounds on $(\det A)^{1/n}$. Given a spectral floor $r \leq \lambda_{\min}$, we obtain moment-constrained lower bounds, yielding a provable interval for $\log\det(A)$. A gap diagnostic indicates when to trust the point estimate and when to report bounds. All estimators and bounds cost $O(m)$, independent of $n$. For $m \in \{4, \ldots, 8\}$, this is effectively constant time.
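The estimator described in the abstract reduces to a few lines of linear algebra. A minimal NumPy sketch, assuming the traces are obtained by explicit matrix powers (the access model only requires the traces themselves; the function name and defaults are ours, not the paper's):

```python
import numpy as np

def logdet_trace_powers(A, m=6):
    """Hypothetical sketch: estimate log det(A) from trace powers
    p_k = tr(A^k) via interpolation of K(t) = log M(t)."""
    n = A.shape[0]
    # Trace powers p_k for k = 1..m via repeated multiplication.
    Ak = np.eye(n)
    p = []
    for k in range(1, m + 1):
        Ak = Ak @ A
        p.append(np.trace(Ak))
    am = p[0] / n  # arithmetic mean of the eigenvalues
    # K(k) = log M(k) with M(k) = E[(lambda/AM)^k] = p_k / (n * am**k);
    # K(0) = 0 by M(0) = 1, and normalization makes K(1) = 0 as well.
    ks = np.arange(0, m + 1)
    K = np.array([0.0] + [np.log(p[k - 1] / (n * am**k))
                          for k in range(1, m + 1)])
    # Interpolate K through the integers 0..m, differentiate at t = 0.
    coeffs = np.polyfit(ks, K, deg=m)
    dK0 = np.polyval(np.polyder(coeffs), 0.0)  # estimate of K'(0)
    return n * (np.log(am) + dK0)
```

For well-conditioned spectra the estimate tracks `np.linalg.slogdet` closely; per the abstract's impossibility result, accuracy degrades as the spectral tail stretches.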
DP-LLM: Runtime Model Adaptation with Dynamic Layer-wise Precision Assignment
Kwon, Sangwoo, Seo, Seong Hoon, Lee, Jae W., Park, Yeonhong
How can we effectively handle queries for on-device large language models (LLMs) with varying runtime constraints, such as latency and accuracy? Multi-scale quantization addresses this challenge by enabling memory-efficient runtime model adaptation of LLMs through overlaying multiple model variants quantized to different bitwidths. Meanwhile, an important question remains open: how can models be properly configured to match a target precision or latency? While mixed precision offers a promising solution, we take this further by leveraging the key observation that the sensitivity of each layer changes dynamically across decoding steps. Building on this insight, we introduce DP-LLM, a novel mechanism that dynamically assigns precision to each layer based on input values. Experimental results across multiple models and benchmarks demonstrate that DP-LLM achieves a superior performance-latency trade-off, outperforming prior approaches.
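The abstract does not detail DP-LLM's assignment mechanism, but the general idea — spend a latency budget on the layers that are currently most sensitive — can be illustrated with a toy greedy scheme (entirely our own sketch, not the paper's algorithm; sensitivities would be recomputed from the inputs at each decoding step):

```python
def assign_precisions(sensitivities, latency_per_bitwidth, budget,
                      bitwidths=(2, 3, 4)):
    """Toy greedy bitwidth assignment (hypothetical, not DP-LLM itself):
    start every layer at the lowest bitwidth, then upgrade the most
    sensitive layers first while the latency budget allows it."""
    n = len(sensitivities)
    choice = [min(bitwidths)] * n
    cost = n * latency_per_bitwidth[min(bitwidths)]
    # Visit layers from most to least sensitive.
    for i in sorted(range(n), key=lambda i: -sensitivities[i]):
        for b in sorted(bitwidths, reverse=True):  # try highest first
            extra = latency_per_bitwidth[b] - latency_per_bitwidth[choice[i]]
            if b > choice[i] and cost + extra <= budget:
                cost += extra
                choice[i] = b
                break
    return choice
```

For example, with per-layer sensitivities `[0.9, 0.1, 0.5]`, latencies `{2: 1.0, 3: 1.5, 4: 2.0}`, and a budget of 4.5, the most sensitive layer gets 4-bit, the middle one 3-bit, and the least sensitive stays at 2-bit.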
Beyond Loss Guidance: Using PDE Residuals as Spectral Attention in Diffusion Neural Operators
Sawhney, Medha, Neog, Abhilash, Khurana, Mridul, Karpatne, Anuj
Diffusion-based solvers for partial differential equations (PDEs) are often bottlenecked by slow gradient-based test-time optimization routines that use PDE residuals for loss guidance. They additionally suffer from optimization instabilities and are unable to dynamically adapt their inference scheme in the presence of noisy PDE residuals. To address these limitations, we introduce PRISMA (PDE Residual Informed Spectral Modulation with Attention), a conditional diffusion neural operator that embeds PDE residuals directly into the model's architecture via attention mechanisms in the spectral domain, enabling gradient-descent-free inference. We show that PRISMA has competitive accuracy, at substantially lower inference cost, compared to previous methods across five benchmark PDEs, especially with noisy observations, while using 10x to 100x fewer denoising steps, leading to 15x to 250x faster inference.

Given the ubiquitous presence of partial differential equations (PDEs) in almost every scientific discipline, there is a rapidly growing literature on using neural networks for solving PDEs (Raissi et al., 2019a; Lu et al., 2019). This includes seminal works in operator learning methods such as the Fourier Neural Operator (FNO; Li et al., 2020) that learns resolution-independent mappings between function spaces of input parameters a and solution fields u. However, a major limitation of these methods is their reliance on complete and clean observations of either a or u, a condition rarely met in real-world applications where data is inherently noisy and sparse. The rise of generative models has inspired another class of methods for solving PDEs by modeling the joint distribution of a and u using diffusion-based backbones (Huang et al., 2024; Yao et al., 2025; Lim et al., 2023; Shu et al., 2023; Bastek et al., 2024; Jacobsen et al., 2025).
These methods offer two key advantages over operator learning methods: (i) they generate full posterior distributions of a and/or u, enabling principled uncertainty quantification crucial for ill-posed inverse problems, and (ii) they naturally accommodate sparse observations during inference using likelihood-based and PDE residual-based loss guidance, termed diffusion posterior sampling or test-time optimization.
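To make the cost being avoided concrete: residual-based loss guidance takes explicit gradient steps on the squared PDE residual at inference time. A toy sketch for a 1D Poisson problem (our illustration of the generic guidance step, not any specific paper's sampler):

```python
import numpy as np

def residual_guidance_step(u, f, eta=1e-3):
    """One PDE-residual guidance step on a toy 1D Poisson problem
    u'' = f: gradient descent on ||L u - f||^2, where L is a dense
    1D Laplacian stencil with unit grid spacing."""
    n = len(u)
    L = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1))
    r = L @ u - f               # PDE residual at the current iterate
    grad = 2.0 * L.T @ r        # gradient of the squared residual
    return u - eta * grad, np.linalg.norm(r)
```

In diffusion posterior sampling, a step like this is interleaved with every denoising step, which is why gradient-free conditioning on the residual can yield the large inference speedups the abstract reports.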
Non-Negative Matrix Factorization Using Non-Von Neumann Computers
Borle, Ajinkya, Nicholas, Charles, Chukwu, Uchenna, Miri, Mohammad-Ali, Chancellor, Nicholas
Non-negative matrix factorization (NMF) is a matrix decomposition problem with applications in unsupervised learning. The general form of this problem (along with many of its variants) is NP-hard in nature. In our work, we explore how this problem could be solved with an energy-based optimization method suitable for certain machines with non-von Neumann architectures. We used the Dirac-3, a device based on the entropy computing paradigm and made by Quantum Computing Inc., to evaluate our approach. Our formulations consist of (i) a quadratic unconstrained binary optimization model (QUBO, suitable for Ising machines) and (ii) a quartic formulation that allows for real-valued and integer variables (suitable for machines like the Dirac-3). Although current devices cannot solve large NMF problems, the results of our preliminary experiments are promising enough to warrant further research. For non-negative real matrices, we observed that a fusion approach of first using Dirac-3 and then feeding its results as the initial factor matrices to Scikit-learn's NMF procedure outperforms Scikit-learn's NMF procedure on its own with default parameters, in terms of the error in the reconstructed matrices. For our experiments on non-negative integer matrices, we compared the Dirac-3 device to Google's CP-SAT solver (inside the OR-Tools package) and found that for serial processing, Dirac-3 outperforms CP-SAT in a majority of the cases. We believe that future work in this area might be able to identify domains and variants of the problem where entropy computing (and other non-von Neumann architectures) could offer a clear advantage.
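The abstract does not spell out its QUBO construction, but the standard route can be sketched: with one factor $H$ held fixed and a binary row $w$ of $W$, $\|v - wH\|^2$ is quadratic in $w$, and since $w_i^2 = w_i$ the linear terms fold into the diagonal. A minimal sketch of that subproblem (our illustration, not the authors' full formulation):

```python
import numpy as np
from itertools import product

def row_qubo(v, H):
    """QUBO matrix for min_w ||v - w H||^2 over a binary row w,
    with H (k x d) fixed. Uses w_i^2 = w_i to absorb the linear
    part -2 (H v)_i into the diagonal; the constant v.v is dropped."""
    G = H @ H.T                        # Gram matrix of the rows of H
    Q = G.copy()
    Q[np.diag_indices_from(Q)] += -2.0 * (H @ v)
    return Q

def brute_force(Q):
    """Exhaustively minimize w Q w^T over binary w (tiny sizes only) --
    a stand-in here for an Ising machine or annealer."""
    k = Q.shape[0]
    return min((np.array(w) for w in product([0, 1], repeat=k)),
               key=lambda w: w @ Q @ w)
```

Alternating such row subproblems over $W$ and $H$ gives a binary NMF heuristic; allowing real or integer variables in both factors at once is what makes the objective quartic, matching the Dirac-3 formulation mentioned above.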
Reliable Selection of Heterogeneous Treatment Effect Estimators
The estimation of heterogeneous treatment effects (HTEs) has become a central topic across statistics, econometrics, and machine learning, with applications ranging from personalized medicine to policy evaluation [1, 2, 3]. A growing body of work has proposed flexible estimators to capture individual-level treatment heterogeneity, including tree-based methods [2], representation-learning approaches [4, 5], and meta-learners [6, 7]. Despite this abundance of methods, determining which estimator performs best for a given application remains an open and underexplored problem [8, 9]. A reliable selection mechanism is crucial for practitioners [10], as choosing suboptimal estimators can directly affect downstream decision-making [11]. Evaluating or comparing HTE estimators is inherently difficult because the ground truth is unobservable: for each individual, only one potential outcome is realized [12], while HTEs are defined as the difference between two. Due to the fundamental unobservability of the treatment effect, comparing two HTE estimators is already challenging, and the difficulty is further exacerbated when a collection of estimators is compared simultaneously. To our knowledge, most papers that compare multiple HTE estimators rely on ground-truth or simulated values and use them to compute metrics such as the Precision in Estimation of Heterogeneous Effect (PEHE) and the ATE [13, 14]. However, these evaluation metrics are subject to fundamental limitations: ground-truth values are unavailable in real-world observational studies, and simulated values depend critically on the chosen data-generating process and offer no formal statistical guarantees. In this paper, we develop a method for accurately selecting the best heterogeneous treatment effect estimator that operates without ground-truth information and provides provable error control.
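The PEHE metric mentioned above is just the RMSE between estimated and true individual effects; writing it out makes the limitation concrete, since the true effects $\tau$ it requires are exactly what real observational data never provides:

```python
import numpy as np

def pehe(tau_hat, tau_true):
    """Precision in Estimation of Heterogeneous Effect: RMSE between
    estimated and true individual treatment effects. Computable only
    when tau_true is known, i.e. on simulated data."""
    return np.sqrt(np.mean((tau_hat - tau_true) ** 2))
```

For instance, an estimator that is off by exactly 1 for every individual has a PEHE of 1, regardless of how the effects vary across individuals.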