Goto

Collaborating Authors

 rng


Covariance-aware sampling for Diffusion Models

arXiv.org Machine Learning

We present a covariance-aware sampler that improves the quality of pixel-space Diffusion Model (DM) sampling in the few-step regime. We hypothesize that in the few-step regime samplers fail because they rely solely on the predicted mean of the reverse distribution, while our solution explicitly models the reverse-process covariance. Our method combines Tweedie's formula to estimate the covariance with an efficient, structured Fourier-space decomposition of the covariance matrix. Implemented as an extension of DDIM, our method requires only a minimal overhead: one extra Jacobian-Vector Product (JVP) per step. We demonstrate that for pixel-based DMs, our method consistently produces superior samples compared to state-of-the-art second order samplers (Heun, DPM-Solver++) and the recent aDDIM sampler, at an identical number of function evaluations (NFE).



1f14ac136d55c34a18a04ce3db083599-Paper-Conference.pdf

Neural Information Processing Systems

Augmenting tactic-based interactive theorem provers with neural guidance has been the focus of increased attention in recent years [1, 2, 3, 4, 5]. The dominant approach uses imitation learning on corpora of formalized mathematics. However, despite recent efforts involving self-supervised pre-training [5] or data-augmentation [6], this approach is limited by the conspicuous scarcity of human-producedtrainingdata.


1663fba7b56da1e96bed6e30546a07b0-Supplemental-Conference.pdf

Neural Information Processing Systems

Thus,theassumption of the policy being conditionally-independent ofzฯ‰ givenziฮฑ corresponds well to the assumption of agents only using local information (rather than joint information) in MARL to inform their policy/decision-making. Note that we found that cyclically-annealing [82]theฮฒ term in our variational lower bound from0to the values specified in Table 5to help avoid KL-vanishing. A.2.4 ComputationalDetails For MARL trajectory data generation, we used an internal CPU cluster for both the 3-agent hillclimbing and 2-agent coordination domains, using TPUs for only the multiagent MuJoCo data generation. Given a characteristic of interest (e.g., the level of dispersion of agents), we define a training set consisting of joint latentszฯ‰ and class labelsy (e.g., classes corresponding to different intervals of team returns). Using these definitions, we can gauge the representational power ofzฯ‰ by learning a mapping g: ห†ฮฝc(zฯ‰) y. In practice, g is a simple model (e.g., shallow network or linear projection) so as to gauge the expressivity of the latent space.


An efficient probabilistic hardware architecture for diffusion-like models

arXiv.org Artificial Intelligence

The proliferation of probabilistic AI has prompted proposals for specialized stochastic computers. Despite promising efficiency gains, these proposals have failed to gain traction because they rely on fundamentally limited modeling techniques and exotic, unscalable hardware. In this work, we address these shortcomings by proposing an all-transistor probabilistic computer that implements powerful denoising models at the hardware level. A system-level analysis indicates that devices based on our architecture could achieve performance parity with GPUs on a simple image benchmark using approximately 10,000 times less energy.


Securing generative artificial intelligence with parallel magnetic tunnel junction true randomness

arXiv.org Artificial Intelligence

Deterministic pseudo random number generators (PRNGs) used in generative artificial intelligence (GAI) models produce predictable patterns vulnerable to exploitation by attackers. Conventional defences against the vulnerabilities often come with significant energy and latency overhead. Here, we embed hardware-generated true random bits from spin-transfer torque magnetic tunnel junctions (STT-MTJs) to address the challenges. A highly parallel, FPGA-assisted prototype computing system delivers megabit-per-second true random numbers, passing NIST randomness tests after in-situ operations with minimal overhead. Integrating the hardware random bits into a generative adversarial network (GAN) trained on CIFAR-10 reduces insecure outputs by up to 18.6 times compared to the low-quality random number generators (RNG) baseline. With nanosecond switching speed, high energy efficiency, and established scalability, our STT-MTJ-based system holds the potential to scale beyond 106 parallel cells, achieving gigabit-per-second throughput suitable for large language model sampling. This advancement highlights spintronic RNGs as practical security components for next-generation GAI systems.


Temporal Anchoring in Deepening Embedding Spaces: Event-Indexed Projections, Drift, Convergence, and an Internal Computational Architecture

arXiv.org Machine Learning

We develop an operator-theoretic framework for temporal anchoring in embedding spaces, modeled as drift maps interleaved with event-indexed blocks culminating in affine projections. We provide complete proofs for a variable-block contraction lemma (products of Lipschitz factors), a drift--projection convergence theorem with explicit uniform-gap envelopes, and ontological convergence under nested affine anchors with a robustness variant. We formalize an internal Manuscript Computer (MC) whose computations are defined purely by these operators and prove a rigorous finite-run equivalence theorem (with perturbation bounds). For attention layers, we give a self-contained proof that softmax is $1/2$-Lipschitz in $\ell_2$ and derive sufficient layer-contraction conditions (orthogonal/non-orthogonal heads). All floats are placed exactly where written; the manuscript uses only in-paper pseudocode and appendix figures.


A convergence law for continuous logic and continuous structures with finite domains

arXiv.org Artificial Intelligence

We consider continuous relational structures with finite domain $[n] := \{1, \ldots, n\}$ and a many valued logic, $CLA$, with values in the unit interval and which uses continuous connectives and continuous aggregation functions. $CLA$ subsumes first-order logic on ``conventional'' finite structures. To each relation symbol $R$ and identity constraint $ic$ on a tuple the length of which matches the arity of $R$ we associate a continuous probability density function $ฮผ_R^{ic} : [0, 1] \to [0, \infty)$. We also consider a probability distribution on the set $\mathbf{W}_n$ of continuous structures with domain $[n]$ which is such that for every relation symbol $R$, identity constraint $ic$, and tuple $\bar{a}$ satisfying $ic$, the distribution of the value of $R(\bar{a})$ is given by $ฮผ_R^{ic}$, independently of the values for other relation symbols or other tuples. In this setting we prove that every formula in $CLA$ is asymptotically equivalent to a formula without any aggregation function. This is used to prove a convergence law for $CLA$ which reads as follows for formulas without free variables: If $ฯ†\in CLA$ has no free variable and $I \subseteq [0, 1]$ is an interval, then there is $ฮฑ\in [0, 1]$ such that, as $n$ tends to infinity, the probability that the value of $ฯ†$ is in $I$ tends to $ฮฑ$.


Conditioning through indifference in quantum mechanics

arXiv.org Artificial Intelligence

We can learn (more) about the state a quantum system is in through measurements. We look at how to describe the uncertainty about a quantum system's state conditional on executing such measurements. We show that by exploiting the interplay between desirability, coherence and indifference, a general rule for conditioning can be derived. We then apply this rule to conditioning on measurement outcomes, and show how it generalises to conditioning on a set of measurement outcomes.


Model Integrity when Unlearning with T2I Diffusion Models

arXiv.org Artificial Intelligence

The rapid advancement of text-to-image Diffusion Models has led to their widespread public accessibility. However these models, trained on large internet datasets, can sometimes generate undesirable outputs. To mitigate this, approximate Machine Unlearning algorithms have been proposed to modify model weights to reduce the generation of specific types of images, characterized by samples from a ``forget distribution'', while preserving the model's ability to generate other images, characterized by samples from a ``retain distribution''. While these methods aim to minimize the influence of training data in the forget distribution without extensive additional computation, we point out that they can compromise the model's integrity by inadvertently affecting generation for images in the retain distribution. Recognizing the limitations of FID and CLIPScore in capturing these effects, we introduce a novel retention metric that directly assesses the perceptual difference between outputs generated by the original and the unlearned models. We then propose unlearning algorithms that demonstrate superior effectiveness in preserving model integrity compared to existing baselines. Given their straightforward implementation, these algorithms serve as valuable benchmarks for future advancements in approximate Machine Unlearning for Diffusion Models.