
An efficient probabilistic hardware architecture for diffusion-like models

Jelinčič, Andraž, Lockwood, Owen, Garlapati, Akhil, Schillinger, Peter, Chuang, Isaac, Verdon, Guillaume, McCourt, Trevor

arXiv.org Artificial Intelligence

The proliferation of probabilistic AI has prompted proposals for specialized stochastic computers. Despite promising efficiency gains, these proposals have failed to gain traction because they rely on fundamentally limited modeling techniques and exotic, unscalable hardware. In this work, we address these shortcomings by proposing an all-transistor probabilistic computer that implements powerful denoising models at the hardware level. A system-level analysis indicates that devices based on our architecture could achieve performance parity with GPUs on a simple image benchmark using approximately 10,000 times less energy.
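
To make "denoising models at the hardware level" concrete, the sketch below shows the kind of iterative stochastic update such a machine would execute natively: each binary unit is resampled from a conditional Bernoulli at every step. This is a minimal software illustration under our own assumptions, not the paper's hardware design.

    # Minimal sketch of one discrete denoising step: the class of
    # stochastic update a probabilistic computer could run natively.
    # All names and the coupling model here are illustrative, not the paper's.
    import numpy as np

    rng = np.random.default_rng(0)

    def denoise_step(x, W, b, temperature=1.0):
        """Resample each binary unit from its conditional Bernoulli."""
        logits = (W @ x + b) / temperature
        p = 1.0 / (1.0 + np.exp(-logits))          # per-unit flip probability
        return (rng.random(x.shape) < p).astype(x.dtype)

    # Start from pure noise and iteratively denoise.
    n = 32
    W = rng.normal(scale=0.1, size=(n, n)); W = (W + W.T) / 2  # symmetric couplings
    b = np.zeros(n)
    x = (rng.random(n) < 0.5).astype(float)
    for _ in range(100):
        x = denoise_step(x, W, b)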



Securing generative artificial intelligence with parallel magnetic tunnel junction true randomness

Bao, Youwei, Yang, Shuhan, Yang, Hyunsoo

arXiv.org Artificial Intelligence

Deterministic pseudo-random number generators (PRNGs) used in generative artificial intelligence (GAI) models produce predictable patterns that attackers can exploit. Conventional defences against these vulnerabilities often carry significant energy and latency overhead. Here, we embed hardware-generated true random bits from spin-transfer torque magnetic tunnel junctions (STT-MTJs) to address these challenges. A highly parallel, FPGA-assisted prototype computing system delivers megabit-per-second true random numbers that pass NIST randomness tests after in-situ operations with minimal overhead. Integrating the hardware random bits into a generative adversarial network (GAN) trained on CIFAR-10 reduces insecure outputs by up to 18.6 times compared to a low-quality random number generator (RNG) baseline. With nanosecond switching speed, high energy efficiency, and established scalability, our STT-MTJ-based system has the potential to scale beyond $10^6$ parallel cells, achieving gigabit-per-second throughput suitable for large language model sampling. This advance highlights spintronic RNGs as practical security components for next-generation GAI systems.
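
A minimal sketch of the integration point, assuming the MTJ bitstream is exposed as raw random bytes (os.urandom stands in for the FPGA-assisted source here); the bytes are mapped to uniform floats and then to Gaussian GAN latents via Box-Muller:

    # Hedged sketch: feeding hardware true-random bits into GAN latent
    # sampling. os.urandom is a stand-in for the STT-MTJ bitstream.
    import os
    import numpy as np

    def trng_uniform(n):
        """Map raw true-random bytes to uniform floats in [0, 1)."""
        raw = np.frombuffer(os.urandom(4 * n), dtype=np.uint32)
        return raw.astype(np.float64) / 2.0**32

    def trng_normal(batch, dim):
        """Gaussian latents via Box-Muller from true-random uniforms."""
        n = batch * dim
        m = (n + 1) // 2
        u1 = np.clip(trng_uniform(m), 1e-12, 1.0)   # avoid log(0)
        u2 = trng_uniform(m)
        r = np.sqrt(-2.0 * np.log(u1))
        z = np.concatenate([r * np.cos(2 * np.pi * u2),
                            r * np.sin(2 * np.pi * u2)])
        return z[:n].reshape(batch, dim)

    z = trng_normal(64, 128)   # latent batch; feed to the generator as G(z)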



Temporal Anchoring in Deepening Embedding Spaces: Event-Indexed Projections, Drift, Convergence, and an Internal Computational Architecture

Alpay, Faruk, Kilictas, Bugra, Alakkad, Hamdi

arXiv.org Machine Learning

We develop an operator-theoretic framework for temporal anchoring in embedding spaces, modeled as drift maps interleaved with event-indexed blocks culminating in affine projections. We provide complete proofs for a variable-block contraction lemma (products of Lipschitz factors), a drift-projection convergence theorem with explicit uniform-gap envelopes, and ontological convergence under nested affine anchors, with a robustness variant. We formalize an internal Manuscript Computer (MC) whose computations are defined purely by these operators and prove a rigorous finite-run equivalence theorem (with perturbation bounds). For attention layers, we give a self-contained proof that softmax is $1/2$-Lipschitz in $\ell_2$ and derive sufficient layer-contraction conditions (orthogonal and non-orthogonal heads). All floats appear exactly where they are written; the manuscript uses only in-paper pseudocode and appendix figures.
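
The softmax bound admits a compact statement; the Jacobian argument below is our condensed summary of the standard route, not the paper's self-contained proof:
\[
  \|\mathrm{softmax}(x) - \mathrm{softmax}(y)\|_2 \;\le\; \tfrac{1}{2}\,\|x - y\|_2 ,
\]
which holds because the Jacobian $J(x) = \mathrm{diag}(p) - p\,p^{\top}$, with $p = \mathrm{softmax}(x)$, satisfies $v^{\top} J(x)\, v = \operatorname{Var}_p(v) \le \tfrac{1}{2}$ for every unit vector $v$ (Popoviciu's inequality, using $(\max_i v_i - \min_i v_i)^2 \le 2\|v\|_2^2$), so $\|J(x)\|_2 \le \tfrac{1}{2}$, and integrating $J$ along the segment from $y$ to $x$ gives the bound.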


A convergence law for continuous logic and continuous structures with finite domains

Koponen, Vera

arXiv.org Artificial Intelligence

We consider continuous relational structures with finite domain $[n] := \{1, \ldots, n\}$ and a many-valued logic, $CLA$, with values in the unit interval, which uses continuous connectives and continuous aggregation functions. $CLA$ subsumes first-order logic on ``conventional'' finite structures. To each relation symbol $R$ and identity constraint $ic$ on a tuple whose length matches the arity of $R$, we associate a continuous probability density function $\mu_R^{ic} : [0, 1] \to [0, \infty)$. We also consider a probability distribution on the set $\mathbf{W}_n$ of continuous structures with domain $[n]$ such that, for every relation symbol $R$, identity constraint $ic$, and tuple $\bar{a}$ satisfying $ic$, the value of $R(\bar{a})$ is distributed according to $\mu_R^{ic}$, independently of the values for other relation symbols or other tuples. In this setting we prove that every formula in $CLA$ is asymptotically equivalent to a formula without any aggregation function. This is used to prove a convergence law for $CLA$, which reads as follows for formulas without free variables: if $\varphi \in CLA$ has no free variable and $I \subseteq [0, 1]$ is an interval, then there is $\alpha \in [0, 1]$ such that, as $n$ tends to infinity, the probability that the value of $\varphi$ lies in $I$ tends to $\alpha$.
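
In our notation (a compact restatement, not the paper's formulation), the convergence law for sentences says: for every $\varphi \in CLA$ without free variables and every interval $I \subseteq [0, 1]$ there is $\alpha \in [0, 1]$ with
\[
  \lim_{n \to \infty} \Pr_{\mathcal{A} \sim \mathbf{W}_n}\!\bigl[\varphi^{\mathcal{A}} \in I\bigr] \;=\; \alpha ,
\]
where $\varphi^{\mathcal{A}} \in [0, 1]$ denotes the value of $\varphi$ in the random continuous structure $\mathcal{A}$.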


Conditioning through indifference in quantum mechanics

De Vos, Keano, de Cooman, Gert

arXiv.org Artificial Intelligence

Measurements allow us to learn (more) about the state a quantum system is in. We examine how to describe the uncertainty about a quantum system's state conditional on executing such measurements. We show that, by exploiting the interplay between desirability, coherence and indifference, a general rule for conditioning can be derived. We then apply this rule to conditioning on measurement outcomes, and show how it generalises to conditioning on sets of measurement outcomes.


Model Integrity when Unlearning with T2I Diffusion Models

Schioppa, Andrea, Hoogeboom, Emiel, Heek, Jonathan

arXiv.org Artificial Intelligence

The rapid advancement of text-to-image Diffusion Models has made them widely accessible to the public. However, these models, trained on large internet datasets, can sometimes generate undesirable outputs. To mitigate this, approximate Machine Unlearning algorithms have been proposed to modify model weights so as to reduce the generation of specific types of images, characterized by samples from a ``forget distribution'', while preserving the model's ability to generate other images, characterized by samples from a ``retain distribution''. While these methods aim to minimize the influence of forget-distribution training data without extensive additional computation, we point out that they can compromise the model's integrity by inadvertently affecting generation for images in the retain distribution. Recognizing that FID and CLIPScore fail to capture these effects, we introduce a novel retention metric that directly assesses the perceptual difference between outputs generated by the original and the unlearned models. We then propose unlearning algorithms that demonstrate superior effectiveness in preserving model integrity compared to existing baselines. Given their straightforward implementation, these algorithms serve as valuable benchmarks for future advancements in approximate Machine Unlearning for Diffusion Models.
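
A hedged sketch of a retention-style metric: average perceptual distance between the two models' outputs on retain prompts. The abstract does not specify the exact metric or model API, so LPIPS is a stand-in perceptual distance and the .sample(...) calls are hypothetical:

    # Assumptions: `lpips` (pip install lpips) as the perceptual distance,
    # and a hypothetical .sample(prompt, generator=...) model interface.
    import torch
    import lpips

    perc = lpips.LPIPS(net="alex")

    @torch.no_grad()
    def retention_gap(model_orig, model_unlearned, retain_prompts, seed=0):
        dists = []
        for prompt in retain_prompts:
            g = torch.Generator().manual_seed(seed)  # same noise for both models
            x0 = model_orig.sample(prompt, generator=g)        # hypothetical API
            g = torch.Generator().manual_seed(seed)
            x1 = model_unlearned.sample(prompt, generator=g)   # hypothetical API
            dists.append(perc(x0, x1).item())
        return sum(dists) / len(dists)   # lower = integrity better preserved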


Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM

Ma, Haiyue, Liu, Jian, Krashinsky, Ronny

arXiv.org Artificial Intelligence

Dropout, when enabled, dramatically degrades the performance of Flash-Attention, which in turn increases the end-to-end training time of Large Language Models (LLMs). The main contributor to this degradation is the Random Number Generation (RNG) phase, which is traditionally fused into the Flash-Attention kernel. Because RNG and attention share the same hardware bottlenecks, RNG latency can hardly be hidden within the attention kernel. We propose overlapping RNG with previous GEMM layers in the network to hide RNG runtime and improve end-to-end performance. RNG and GEMM have distinct resource requirements and hardware bottlenecks, so they can run in parallel without compromising each other's performance. Our fine-grained performance model, cross-validated by silicon results, shows a 1.14x speedup on one transformer block (including multi-head attention and feed-forward layers) for Llama2, and up to 1.23x speedup across varying workload sizes, on GH100 GPUs with FP8 precision. Further, we extend our theoretical model to different RNG implementations and hardware architectures, and discuss the broadly applicable benefits of overlapping RNG with GEMM layers.
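
A minimal PyTorch sketch of the scheduling idea, not the paper's kernel-level implementation: the dropout mask is generated on a side CUDA stream while the preceding GEMM runs on the default stream, so RNG latency hides behind the matmul. Stream setup and sizes are our assumptions.

    import torch

    assert torch.cuda.is_available()
    rng_stream = torch.cuda.Stream()

    def gemm_with_hidden_rng(x, w, mask_shape, p=0.1):
        mask_holder = []
        with torch.cuda.stream(rng_stream):
            # RNG has different bottlenecks than GEMM, so it overlaps well.
            mask_holder.append(torch.rand(mask_shape, device="cuda") >= p)
        y = x @ w                                  # GEMM on the default stream
        torch.cuda.current_stream().wait_stream(rng_stream)
        return y, mask_holder[0]   # keep-mask ready for the attention kernel

    x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    w = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    y, mask = gemm_with_hidden_rng(x, w, (4096, 4096))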


RNG: Reducing Multi-level Noise and Multi-grained Semantic Gap for Joint Multimodal Aspect-Sentiment Analysis

Liu, Yaxin, Zhou, Yan, Li, Ziming, Zhang, Jinchuan, Shang, Yu, Zhang, Chenyang, Hu, Songlin

arXiv.org Artificial Intelligence

As an important multimodal sentiment analysis task, Joint Multimodal Aspect-Sentiment Analysis (JMASA), which aims to jointly extract aspect terms and their associated sentiment polarities from given text-image pairs, has attracted increasing attention. Existing works face two limitations: (1) multi-level modality noise, i.e., instance- and feature-level noise; and (2) a multi-grained semantic gap, i.e., coarse- and fine-grained gaps. Both issues can interfere with the accurate identification of aspect-sentiment pairs. To address these limitations, we propose a novel framework named RNG for JMASA. Specifically, to simultaneously reduce multi-level modality noise and the multi-grained semantic gap, we design three constraints: (1) a Global Relevance Constraint (GR-Con) based on text-image similarity, for instance-level noise reduction; (2) an Information Bottleneck Constraint (IB-Con) based on the Information Bottleneck (IB) principle, for feature-level noise reduction; and (3) a Semantic Consistency Constraint (SC-Con) based on mutual information maximization via contrastive learning, for multi-grained semantic gap reduction. Extensive experiments on two datasets validate our new state-of-the-art performance.
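
The abstract does not give the exact loss forms, so the sketch below instantiates each constraint with a standard stand-in: cosine similarity for GR-Con, a Gaussian KL for IB-Con, and InfoNCE for SC-Con. All names, shapes, and weightings are ours, not the paper's.

    import torch
    import torch.nn.functional as F

    def info_nce(a, b, tau=0.07):
        """Contrastive MI lower bound between two views (SC-Con stand-in)."""
        a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
        logits = a @ b.t() / tau
        labels = torch.arange(a.size(0), device=a.device)
        return F.cross_entropy(logits, labels)

    def rng_style_loss(txt, img, txt_fine, img_fine, mu, logvar, task_loss,
                       lam=(1.0, 0.1, 0.1)):
        # txt: (B, L, D) text tokens; img: (B, P, D) image patches;
        # txt_fine/img_fine: (B, D) fine-grained views; task_loss: (B,).
        # GR-Con: down-weight instances whose image is globally irrelevant.
        rel = F.cosine_similarity(txt.mean(1), img.mean(1)).clamp(min=0)
        gr = (rel * task_loss).mean()
        # IB-Con: KL( q(z|x) || N(0, I) ) compresses away feature-level noise.
        ib = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        # SC-Con: align coarse- and fine-grained views contrastively.
        sc = info_nce(txt_fine, img_fine)
        return lam[0] * gr + lam[1] * ib + lam[2] * sc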