A Unified Framework for Data-Free One-Step Sampling via Wasserstein Gradient Flows
We develop a unified theoretical framework for data-free one-step sampling from unnormalized target distributions based on Wasserstein gradient flows. For a broad class of standard f-divergence objectives, we show that the induced velocity field admits the universal form $\mathbf{V}(x)=w(r(x))\,\beta(x)$, where $\beta(x)=\nabla \log (p(x)/q(x))$ is shared across objectives and $w$ is determined solely by the choice of divergence. This decomposition shows that standard f-divergence drifts share the same asymptotic target distribution $p$ and differ primarily in how they redistribute transient repair effort across under-covered regions. To formalize this distinction, we derive a one-step regional-response theory for a soft under-coverage functional and obtain a compression--elasticity identity that links divergence choice to the geometry of mass transport into under-covered regions. We further extend the framework beyond the f-divergence family to the Log-Variance (LV) divergence, analyze how the reference distribution alters the resulting drift structure, and motivate a practical LV-inspired surrogate for data-free training. Based on this theory, we instantiate the framework with a KDE-based implementation and describe a complementary normalizing-flow route, enabling one-step inference after training. Experiments on multimodal Gaussian-mixture benchmarks are consistent with the theoretical predictions and demonstrate effective one-step sampling on these targets.
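The velocity-field decomposition stated in this abstract can be illustrated with a minimal sketch for 1D Gaussians, where the scores are available in closed form. This is not the paper's implementation: the function names are hypothetical, and $r(x)$ is assumed here to be the density ratio $p(x)/q(x)$, which the abstract does not define. The constant weight $w \equiv 1$ used as the default corresponds to the Wasserstein gradient flow of the reverse KL divergence $\mathrm{KL}(q\,\|\,p)$.

```python
import numpy as np

def gaussian_score(x, mean, std):
    """Score d/dx log N(x; mean, std^2) of a 1D Gaussian."""
    return -(x - mean) / std**2

def gaussian_logpdf(x, mean, std):
    """Log-density of a 1D Gaussian N(mean, std^2)."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def velocity(x, p, q, w=lambda r: np.ones_like(r)):
    """V(x) = w(r(x)) * beta(x), with beta(x) = d/dx log(p(x)/q(x)).

    p, q: (mean, std) tuples for the target and the current sampler density.
    w:    divergence-specific weight; the constant default is the reverse-KL case.
    r(x) is ASSUMED to be the density ratio p(x)/q(x) (not defined in the abstract).
    """
    beta = gaussian_score(x, *p) - gaussian_score(x, *q)
    r = np.exp(gaussian_logpdf(x, *p) - gaussian_logpdf(x, *q))
    return w(r) * beta

# Target p = N(0, 1), current sampler q = N(1, 1): particles should drift left.
x = np.linspace(-2.0, 2.0, 5)
v_kl = velocity(x, p=(0.0, 1.0), q=(1.0, 1.0))
# A hypothetical ratio-dependent weight, shown only to illustrate the interface.
v_weighted = velocity(x, p=(0.0, 1.0), q=(1.0, 1.0), w=lambda r: r)
```

With equal variances, $\beta(x)$ is constant, so any difference between `v_kl` and `v_weighted` comes from the weight $w$ alone; the direction of the drift is the same in both cases.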
Entropy-based Training Methods for Scalable Neural Implicit Sampler
Efficiently sampling from un-normalized target distributions is a fundamental problem in scientific computing and machine learning. Traditional approaches such as Markov Chain Monte Carlo (MCMC) guarantee asymptotically unbiased samples from such distributions but suffer from computational inefficiency, particularly when dealing with high-dimensional targets, as they require numerous iterations to generate a batch of samples. In this paper, we introduce an efficient and scalable neural implicit sampler that overcomes these limitations. The implicit sampler can generate large batches of samples with low computational costs by leveraging a neural transformation that directly maps easily sampled latent vectors to target samples without the need for iterative procedures. To train the neural implicit samplers, we introduce two novel methods: the KL training method and the Fisher training method.
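The idea of an implicit sampler, a neural transformation that maps latent vectors to samples in one forward pass, can be sketched as follows. This is only an illustration of the sampling interface under assumed choices: the two-layer architecture, the layer sizes, and the names `init_mlp` and `implicit_sampler` are hypothetical, the weights are random and untrained, and the KL and Fisher training methods from the abstract are not implemented here.

```python
import numpy as np

def init_mlp(rng, latent_dim, hidden_dim, out_dim):
    """Randomly initialise a two-layer MLP (hypothetical architecture)."""
    return {
        "W1": rng.standard_normal((latent_dim, hidden_dim)) / np.sqrt(latent_dim),
        "b1": np.zeros(hidden_dim),
        "W2": rng.standard_normal((hidden_dim, out_dim)) / np.sqrt(hidden_dim),
        "b2": np.zeros(out_dim),
    }

def implicit_sampler(params, z):
    """Map latent vectors z to samples with a single forward pass (no iteration)."""
    h = np.tanh(z @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

rng = np.random.default_rng(0)
params = init_mlp(rng, latent_dim=4, hidden_dim=16, out_dim=2)

# Draw a large batch: sample easy latents, then push them through the network.
z = rng.standard_normal((1000, 4))
samples = implicit_sampler(params, z)
```

The cost of a batch is one matrix-multiply pass per layer, which is the contrast with MCMC drawn in the abstract; the density of the resulting samples is not available in closed form, which is why the sampler is called implicit.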
Symmetry-induced Disentanglement on Graphs
Disentanglement has been formalized using a symmetry-centric notion for unstructured spaces; however, graphs have eluded a similarly rigorous treatment. We fill this gap with a new notion of conditional symmetry for disentanglement, and leverage tools from Lie algebras to encode graph properties into subgroups using suitable adaptations of generative models such as Variational Autoencoders.
Appendix: Variational Continual Bayesian Meta-Learning
In variational continual learning, the posterior distribution of interest is frequently intractable and approximation is required. We summarize the meta-training process of our VC-BML in Algorithm 1. Moreover, we evaluate FTML on the unseen tasks (i.e., tasks sampled from meta-test set) instead of the training tasks that the original FTML used. It would be unfair to adopt the original initialization procedure in OSML. BOMVI [10]: In our experiments, we use variational inference to approximate the posterior of meta-parameters. E.3.2 Settings As the latent variables in this paper are meta-parameters and task-specific parameters, the dimensionality of the latent space is actually determined by the number of parameters in the deep neural network. In particular, we define a CNN architecture and present its details in Table 1.
Supplementary Material For Stochastic Multiple Target Sampling Gradient Descent
By contrast, there is only one quadratic programming problem to solve in our proposed method, which significantly reduces time complexity, especially when the number of particles is high. The mean square error for each task and the average results are shown in Table 1. MT-SGD outperforms the second-best method, MOO-SVGD, with 0.2251 vs. However, on the one hand, computing U's entries can be accelerated in practice by calculating them in parallel since there is no interaction between them during the forward pass. All images are resized to 64 × 64 × 3. Due to space constraints, we report only the abbreviation of each task in the main paper; their full names are presented below.