Goto

Collaborating Authors

 tgt


FSI-Edit: Frequency and Stochasticity Injection for Flexible Diffusion-Based Image Editing

Neural Information Processing Systems

Latent Diffusion-based Text-to-Image (T2I) is a free image editing tool that typically reverses an image into noise, reconstructs it using its original text prompt, and then generates an edited version under a new target prompt. To preserve unaltered image content, features from the reconstruction are directly injected to replace selected features in the generation. However, this direct replacement often leads to feature incompatibility, compromising editing fidelity and limiting creative flexibility, particularly for non-rigid edits (e.g., structural or pose changes). In this paper, we aim to address these limitations by proposing FSI-Edit, a novel framework using frequency-and stochasticity-based feature injection for flexible image editing. First, FSI-Edit enhances feature consistency by injecting high-frequency components of reconstruction features into generation features, mitigating incompatibility while preserving the editing ability for major structures encoded in low-frequency information. Second, it introduces controlled noise into the replaced reconstruction features, expanding the generative space to enable diverse non-rigid edits beyond the original image's constraints. Experiments on non-rigid edits, e.g., addition, deletion, and pose manipulation, demonstrate that FSI-Edit outperforms existing baselines in target alignment, semantic fidelity and visual quality. Our work highlights the critical roles of frequency-aware design and stochasticity in overcoming rigidity in diffusion-based editing.


Generative Modeling by Value-Driven Transport

arXiv.org Machine Learning

We propose a new framework for generative modeling based on a discrete-time stochastic control formulation of measure transport. Adapting classic results from control theory, we formulate our problem as a linear program whose dual variables correspond to the \emph{optimal value function} of the control problem, which directly encodes the optimal control policy. Exploiting this LP formulation, we develop an efficient simulation-free primal-dual algorithm for computing approximately optimal value functions and the associated \emph{value-driven transport} (VDT) policies which approximate the true optimal policy. We show that well-trained VDT policies enjoy numerous favorable properties in comparison with other state-of-the-art methods based on flows, diffusions, or Schrรถdinger bridges: they lead to straight transport paths which can be simulated quickly and robustly, and can be enhanced in all the same ways as diffusion and flow-based models (e.g., conditional generation, classifier-free guidance, unpaired data-to-data translation are all easy to incorporate). We evaluate our methodology in a range of experiments, with results that indicate strong performance and good potential for scalability.


A Theoretical Framework for LLM Fine-tuning Using Early Stopping for Non-random Initialization

arXiv.org Machine Learning

In the era of large language models (LLMs), fine-tuning pretrained models has become ubiquitous. Yet the theoretical underpinning remains an open question. A central question is why only a few epochs of fine-tuning are typically sufficient to achieve strong performance on many different tasks. In this work, we approach this question by developing a statistical framework, combining rigorous early stopping theory with the attention-based Neural Tangent Kernel (NTK) for LLMs, offering new theoretical insights on fine-tuning practices. Specifically, we formally extend classical NTK theory [Jacot et al., 2018] to non-random (i.e., pretrained) initializations and provide a convergence guarantee for attention-based fine-tuning. One key insight provided by the theory is that the convergence rate with respect to sample size is closely linked to the eigenvalue decay rate of the empirical kernel matrix induced by the NTK. We also demonstrate how the framework can be used to explain task vectors for multiple tasks in LLMs. Finally, experiments with modern language models on real-world datasets provide empirical evidence supporting our theoretical insights.






A Appendix

Neural Information Processing Systems

In the appendix, we have the following results. In Appendix A.1, we summarize the main notations used in this paper. In Appendix A.2 - A.9, we show all the proofs of our theoretical results. In Appendix A.10, we present the overall training procedures (e.g., pseudo code) of our proposed DINO-INIT and DINO-TRAIN algorithms, as well as the limitations of our work. Assume that all the parameters of f() follows standard normal distribution, in the limits as the layer width d!1, the output function of the distribution-informed neural network f(x) in Eq (5) at initialization is iid centered Gaussian process, i.e., f() N 0, K Using the definition of the distribution kernel in Eq. (6), we have K It is shown [4] that the key difference between NNGP kernel and NTK is that NTK is generated by a fully-trained neural network, whereas NNGP kernel is produced by a weakly-trained neural network.