Adversarial Schrödinger Bridge Matching Nikita Gushchin Skoltech

Neural Information Processing Systems

The Schrödinger Bridge (SB) problem offers a powerful framework for combining optimal transport and diffusion models. A promising recent approach to solving the SB problem is the Iterative Markovian Fitting (IMF) procedure, which alternates between Markovian and reciprocal projections of continuous-time stochastic processes. However, the model built by the IMF procedure has long inference times because it relies on many steps of numerical solvers for stochastic differential equations. To address this limitation, we propose a novel Discrete-time IMF (D-IMF) procedure, in which learning of stochastic processes is replaced by learning just a few transition probabilities in discrete time. A key advantage is that, in practice, it can be implemented naturally using the Denoising Diffusion GAN (DD-GAN), an already well-established adversarial generative modeling technique. We show that our D-IMF procedure provides the same quality of unpaired domain translation as the IMF, using only a few generation steps instead of hundreds.
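To make the alternation concrete, below is a minimal NumPy sketch of the discrete-time reciprocal projection: endpoint pairs from the current coupling are kept, and intermediate points at a few chosen times are resampled from the Brownian bridge between them. The function names, toy coupling, and choice of times are illustrative assumptions, not the paper's code; the Markovian projection (fitting a DD-GAN-style few-step generator to these transitions) is only indicated in the comments.

```python
import numpy as np

def brownian_bridge_sample(x0, x1, t, eps=1.0, rng=None):
    """Sample x_t from the Brownian bridge between x0 (t=0) and x1 (t=1).

    For a reference process with volatility eps, the bridge marginal at time t
    is Gaussian with mean (1 - t) * x0 + t * x1 and variance eps * t * (1 - t).
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = (1.0 - t) * x0 + t * x1
    std = np.sqrt(eps * t * (1.0 - t))
    return mean + std * rng.standard_normal(x0.shape)

def reciprocal_projection(x0, x1, times, eps=1.0, rng=None):
    """Discrete-time reciprocal projection: given endpoint pairs (x0, x1) drawn
    from the current coupling, resample the intermediate points at the chosen
    times from the Brownian bridge conditioned on those endpoints."""
    return [brownian_bridge_sample(x0, x1, t, eps, rng) for t in times]

# Toy usage: endpoint pairs from two 2-D "domains" and three intermediate times.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((512, 2))            # samples from the source domain
x1 = x0 + np.array([4.0, 0.0])                # a trivial paired coupling, for illustration only
intermediates = reciprocal_projection(x0, x1, times=[0.25, 0.5, 0.75], rng=rng)
# A Markovian projection step would now fit a few-step transition model
# (e.g., an adversarial DD-GAN-style generator) to (x0, intermediates, x1),
# and the two projections would be alternated until convergence.
```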


DiffusionBlend: Learning 3D Image Prior through Position-aware Diffusion Score Blending for 3D Computed Tomography Reconstruction

Neural Information Processing Systems

Diffusion models face significant challenges when employed for large-scale medical image reconstruction in practice, such as 3D Computed Tomography (CT). Due to the demanding memory, time, and data requirements, it is difficult to train a diffusion model directly on the entire volume of high-dimensional data to obtain an efficient 3D diffusion prior. Existing works that apply diffusion priors to single 2D image slices with hand-crafted cross-slice regularization sacrifice z-axis consistency, which results in severe artifacts along the z-axis. In this work, we propose a novel framework that enables learning the 3D image prior through position-aware 3D-patch diffusion score blending for reconstructing large-scale 3D medical images. To the best of our knowledge, we are the first to utilize a 3D-patch diffusion prior for 3D medical image reconstruction. Extensive experiments on sparse-view and limited-angle CT reconstruction show that our DiffusionBlend method significantly outperforms previous methods and achieves state-of-the-art performance on real-world CT reconstruction problems with high-dimensional 3D images (i.e., 256×256×500). Our algorithm also offers better or comparable computational efficiency compared to previous state-of-the-art methods.
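As a rough illustration of the blending idea (not the paper's exact scheme), the sketch below averages scores predicted on overlapping 3D patches along the z-axis, passing each patch's position to the score function. Here `patch_score` stands in for a trained 3D-patch diffusion model, and the patching scheme and names are assumptions.

```python
import numpy as np

def blended_score(volume, patch_score, patch_depth=8, stride=4):
    """Approximate a full-volume diffusion score by blending scores of
    overlapping 3D patches taken along the z-axis.

    `patch_score(patch, z_start)` is assumed to return a score of the same
    shape as `patch`; passing the patch position (z_start) is what makes the
    blending position-aware in spirit. Overlapping predictions are averaged.
    """
    score = np.zeros_like(volume)
    weight = np.zeros_like(volume)
    z_max = volume.shape[0]
    for z in range(0, max(z_max - patch_depth, 0) + 1, stride):
        patch = volume[z:z + patch_depth]
        score[z:z + patch_depth] += patch_score(patch, z)
        weight[z:z + patch_depth] += 1.0
    return score / np.maximum(weight, 1.0)

# Toy usage with a dummy score function (stands in for a trained 3D-patch model).
dummy_score = lambda patch, z_start: -patch      # e.g., the score of N(0, I)
vol = np.random.default_rng(0).standard_normal((32, 16, 16))
s = blended_score(vol, dummy_score)
print(s.shape)                                   # (32, 16, 16)
```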


Scalable Bayesian dynamic covariance modeling with variational Wishart and inverse Wishart processes

Neural Information Processing Systems

We implement gradient-based variational inference routines for Wishart and inverse Wishart processes, which we apply as Bayesian models for the dynamic, heteroskedastic covariance matrix of a multivariate time series. The Wishart and inverse Wishart processes are constructed from i.i.d. Gaussian processes, and existing variational inference algorithms for Gaussian processes form the basis of our approach. These methods are easy to implement as a black box and scale favorably with the length of the time series; however, they fail in the case of the Wishart process, an issue we resolve with a simple modification: an additive white-noise parameterization of the model. This modification is also key to implementing a factored variant of the construction, allowing inference to additionally scale to high-dimensional covariance matrices. Through experimentation, we demonstrate that some (but not all) model variants outperform multivariate GARCH when forecasting the covariances of returns on financial instruments.
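A minimal NumPy sketch of the underlying generative construction, assuming the usual outer-product form: stack i.i.d. GP paths into a matrix F(t), set Sigma(t) = A F(t) F(t)^T A^T, and add a small diagonal term in the spirit of the additive white-noise parameterization mentioned above. The kernel, scale matrix, and function names are illustrative, and the variational inference itself is not shown.

```python
import numpy as np

def sample_gp(times, lengthscale=0.5, jitter=1e-6, rng=None):
    """Draw one sample path of a zero-mean GP with an RBF kernel."""
    rng = np.random.default_rng() if rng is None else rng
    d = times[:, None] - times[None, :]
    K = np.exp(-0.5 * (d / lengthscale) ** 2) + jitter * np.eye(len(times))
    return np.linalg.cholesky(K) @ rng.standard_normal(len(times))

def wishart_process_paths(times, dim, dof, A, noise=1e-3, rng=None):
    """Construct Sigma(t) = A F(t) F(t)^T A^T + noise * I from i.i.d. GP paths,
    where F(t) is a (dim x dof) matrix of GP values at time t. The additive
    diagonal term mirrors the white-noise parameterization discussed above."""
    rng = np.random.default_rng() if rng is None else rng
    F = np.stack([np.stack([sample_gp(times, rng=rng) for _ in range(dof)], axis=-1)
                  for _ in range(dim)], axis=-2)         # shape (T, dim, dof)
    return np.einsum('ij,tjk,tlk,ml->tim', A, F, F, A) + noise * np.eye(dim)

times = np.linspace(0.0, 1.0, 50)
dim, dof = 3, 5
A = np.eye(dim)                      # scale matrix (lower-triangular in practice)
Sigmas = wishart_process_paths(times, dim, dof, A, rng=np.random.default_rng(1))
print(Sigmas.shape)                  # (50, 3, 3); each slice is positive definite
```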


Dynamic allocation of limited memory resources in reinforcement learning - Appendix

Neural Information Processing Systems

A.1 Gradient of the log policy. In this section, we slightly abuse notation by omitting the explicit dependence on the state-action pair (s, a) for clarity, placing it in the subscript instead. We relax the limit β → ∞ in Eq. A.1 so as to differentiate the logarithm of the policy, log π, for finite β > 0 with respect to the relevant elements of the resource allocation vector σ(s, a).

A.2 Gradient of the cost. In this section, we show how to compute the gradient of the cost term. Since the covariance matrix is diagonal (with I denoting the identity matrix), we can take the gradient with respect to each element of the resource allocation vector σ individually; in other words, we can differentiate each memory's marginal normal distribution with respect to its own standard deviation. Combining Eqs. A.4, A.5, and A.6, we can write the analytically obtained gradient of the cost term with respect to the individual elements of the resource allocation vector σ(s, a).

A.3 Justification for our choice of the gradient of expected reward. A potential concern regarding our method of allocating resources may be our choice of the advantage function to compute the gradient of the expected rewards.
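As a small self-check of the quantity described above (the derivative of a memory's marginal normal log density with respect to its standard deviation), the sketch below compares the standard analytic form ((q − q̄)² − σ²)/σ³ against a finite difference. The variable names are placeholders and this is not the paper's code.

```python
import numpy as np

def log_normal_pdf(q, q_bar, sigma):
    """Log density of a univariate normal N(q_bar, sigma^2) evaluated at q."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * ((q - q_bar) / sigma) ** 2

def grad_log_normal_wrt_sigma(q, q_bar, sigma):
    """Analytic d/d sigma of the log density: ((q - q_bar)^2 - sigma^2) / sigma^3."""
    return ((q - q_bar) ** 2 - sigma ** 2) / sigma ** 3

# Finite-difference sanity check at an arbitrary point.
q, q_bar, sigma, h = 1.3, 0.7, 0.5, 1e-6
numeric = (log_normal_pdf(q, q_bar, sigma + h) - log_normal_pdf(q, q_bar, sigma - h)) / (2 * h)
print(np.isclose(numeric, grad_log_normal_wrt_sigma(q, q_bar, sigma)))  # True
```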


Dynamic allocation of limited memory resources in reinforcement learning Nisheet Patel

Neural Information Processing Systems

Biological brains are inherently limited in their capacity to process and store information, but are nevertheless capable of solving complex tasks with apparent ease. Intelligent behavior is related to these limitations, since resource constraints drive the need to generalize and assign importance differentially to features in the environment or memories of past experiences. Recently, there have been parallel efforts in reinforcement learning and neuroscience to understand strategies adopted by artificial and biological agents to circumvent limitations in information storage. However, the two threads have been largely separate. In this article, we propose a dynamical framework to maximize expected reward under constraints of limited resources, which we implement with a cost function that penalizes precise representations of action-values in memory, each of which may vary in its precision.


c4fac8fb3c9e17a2f4553a001f631975-AuthorFeedback.pdf

Neural Information Processing Systems

We thank reviewers R1, R2, R3, and R4 for their constructive and helpful feedback. We aim to explore these ideas in future work. Our work is meant to complement these previous studies, and we have modified the Related Work section to discuss how these proposals complement ours. In response to the comment that "this is not a paper searching for state-of-the-art results," we report a 2x and 1.3x improvement (Figure 1d) and have added these results to the revised manuscript.


A Game Theoretic Approach to Class-wise Selective Rationalization

Neural Information Processing Systems

Selection of input features, such as relevant pieces of text, has become a common technique for highlighting how complex neural predictors operate. The selection can be optimized post hoc for trained models or incorporated directly into the method itself (self-explaining). However, an overall selection does not properly capture the multi-faceted nature of useful rationales, such as pros and cons for decisions. To this end, we propose a new game-theoretic approach to class-dependent rationalization, where the method is specifically trained to highlight evidence supporting alternative conclusions. Each class involves three players set up competitively to find evidence for factual and counterfactual scenarios. We show theoretically, in a simplified scenario, how the game drives the solution towards meaningful class-dependent rationales. We evaluate the method on single- and multi-aspect sentiment classification tasks and demonstrate that it is able to identify both factual (justifying the ground-truth label) and counterfactual (countering the ground-truth label) rationales consistent with human rationalization.
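A heavily simplified PyTorch sketch of how such a three-player game could be wired up for one class: a factual generator selects (soft) rationales from texts carrying the class label, a counterfactual generator selects rationales from the remaining texts, and a discriminator tries to tell the two apart while the counterfactual generator plays adversarially. Module names, soft masks, and the toy data are assumptions; the paper uses discrete selection with sparsity and continuity constraints that are omitted here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, n_tokens, emb = 100, 20, 16

class Selector(nn.Module):
    """Scores each token; a (soft) mask over tokens defines the rationale."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.score = nn.Linear(emb, 1)
    def forward(self, tokens):
        e = self.embed(tokens)                    # (batch, n_tokens, emb)
        mask = torch.sigmoid(self.score(e))       # soft selection in [0, 1]
        return (mask * e).mean(dim=1)             # pooled rationale vector

factual_gen, counterfactual_gen = Selector(), Selector()
discriminator = nn.Linear(emb, 1)                 # factual vs. counterfactual logit

bce = nn.BCEWithLogitsLoss()
pos_text = torch.randint(0, vocab, (8, n_tokens))  # texts with the target label
neg_text = torch.randint(0, vocab, (8, n_tokens))  # texts with other labels

fact_r = factual_gen(pos_text)                     # factual rationale
cf_r = counterfactual_gen(neg_text)                # counterfactual rationale

# Discriminator: separate factual (1) from counterfactual (0) rationales.
# Backpropagating d_loss also trains the factual generator (cooperatively),
# while cf_loss trains the counterfactual generator adversarially.
d_loss = bce(discriminator(fact_r), torch.ones(8, 1)) + \
         bce(discriminator(cf_r.detach()), torch.zeros(8, 1))
cf_loss = bce(discriminator(cf_r), torch.ones(8, 1))
print(d_loss.item(), cf_loss.item())
```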


class-specific generators will converge to a set of degenerated solutions, where, rather than highlighting the informative

Neural Information Processing Systems

We thank all the reviewers. Comments 1 & 4: It turns out that RNP (the baseline in "Rationalizing Neural Predictions" proposed by Lei et al.) is not immune to this issue; in fact, even the original RNP suffers from the degeneration problem. The problem primarily results from the collaborative nature of the RNP framework. This is another major advantage of CAR, which we did not have enough space to elaborate on in the paper.


Multidimensional Fractional Programming for Normalized Cuts Yannan Chen 1 Beichen Huang 2

Neural Information Processing Systems

The Normalized Cut (NCut) problem is a fundamental and yet notoriously difficult one in unsupervised clustering. Because the NCut problem is fractionally structured, the fractional programming (FP) based approach has worked its way into a new frontier. However, conventional FP techniques are insufficient: the classic Dinkelbach's transform can only deal with a single ratio and hence is limited to two-class clustering, while the state-of-the-art quadratic transform accounts for multiple ratios but fails to convert the NCut problem to a tractable form. This work advocates a novel extension of the quadratic transform to the multidimensional-ratio case, thereby recasting the fractional 0-1 NCut problem as a bipartite matching problem, which can be readily solved in an iterative manner. Furthermore, we explore the connection between the proposed multidimensional FP method and minorization-maximization theory to verify convergence.
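For background on the alternating structure being extended, here is a toy NumPy sketch of the standard (scalar) quadratic transform on a small sum-of-ratios problem: the auxiliary variables are updated in closed form, then the surrogate is maximized over the original variable (here by grid search). The problem instance and names are illustrative assumptions; the paper's multidimensional matrix-ratio extension and the bipartite-matching subproblem are not reproduced.

```python
import numpy as np

# Toy sum-of-ratios problem: maximize sum_i A_i(x) / B_i(x) over x in [0, 1],
# with A_i(x) = c_i * x + d_i >= 0 and B_i(x) = e_i * x + f_i > 0.
rng = np.random.default_rng(0)
c, d = rng.uniform(0.5, 2.0, 4), rng.uniform(0.5, 2.0, 4)
e, f = rng.uniform(0.5, 2.0, 4), rng.uniform(0.5, 2.0, 4)
grid = np.linspace(0.0, 1.0, 1001)

def A(x): return c * x + d
def B(x): return e * x + f

x = 0.5                                        # initial point
for _ in range(20):
    y = np.sqrt(A(x)) / B(x)                   # closed-form auxiliary update
    # With y fixed, maximize the quadratic-transform surrogate over x:
    #   sum_i [ 2 * y_i * sqrt(A_i(x)) - y_i^2 * B_i(x) ]
    surrogate = (2 * y * np.sqrt(A(grid[:, None])) - y**2 * B(grid[:, None])).sum(axis=1)
    x = grid[np.argmax(surrogate)]

print(x, (A(x) / B(x)).sum())                  # objective is non-decreasing over iterations
```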


DiffHammer: Rethinking the Robustness of Diffusion-Based Adversarial Purification Kaibo Wang 1

Neural Information Processing Systems

Diffusion-based purification has demonstrated impressive robustness as an adversarial defense. However, concerns exist about whether this robustness arises from insufficient evaluation. Our research shows that EOT-based attacks face gradient dilemmas due to global gradient averaging, resulting in ineffective evaluations. Additionally, 1-evaluation underestimates resubmission risks in stochastic defenses. To address these issues, we propose an effective and efficient attack named DiffHammer.
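For reference, a minimal PyTorch sketch of the EOT-style global gradient averaging the abstract refers to: the attack gradient is estimated by averaging loss gradients over independent runs of a stochastic defense. The defense and classifier below are toy stand-ins, not DiffHammer or any real purification model.

```python
import torch

torch.manual_seed(0)

def stochastic_defense(x):
    """Stand-in for a stochastic purification step (e.g., noise then denoise)."""
    return x + 0.1 * torch.randn_like(x)

def classifier(x):
    """Stand-in classifier loss; a real attack would use the defended model."""
    return (x ** 2).sum()

def eot_gradient(x, n_samples=16):
    """EOT-style estimate: average the loss gradient over independent runs of
    the stochastic defense (the global averaging referred to above)."""
    grads = []
    for _ in range(n_samples):
        x_req = x.clone().detach().requires_grad_(True)
        loss = classifier(stochastic_defense(x_req))
        loss.backward()
        grads.append(x_req.grad.detach())
    return torch.stack(grads).mean(dim=0)

x = torch.randn(3, 8, 8)
g = eot_gradient(x)
x_adv = x + 0.01 * g.sign()     # one PGD-like ascent step using the averaged gradient
print(g.shape)                  # torch.Size([3, 8, 8])
```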