
Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques
Zitai Wang, Zhiyong Yang

Neural Information Processing Systems

Diffusion models are powerful generative models, and this capability can also be applied to discrimination: the inner activations of a pre-trained diffusion model can serve as features for discriminative tasks, namely, diffusion features. We discover that diffusion features have been hindered by a hidden yet universal phenomenon that we call content shift. Specifically, there are content differences between the features and the input image, such as the exact shape of a certain object. We trace the cause of content shift to an inherent characteristic of diffusion models, which suggests that this phenomenon is broadly present in diffusion features.
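As a rough illustration of how such inner activations can be extracted in practice, here is a minimal sketch (not the paper's pipeline): a forward hook on a diffusers-style U-Net caches an intermediate activation during one denoising call. The model id, block choice, timestep, and shapes are illustrative assumptions.

```python
# Minimal sketch: cache an intermediate U-Net activation ("diffusion feature")
# for a noised input latent at one denoising timestep.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
unet = pipe.unet.eval()

features = {}
def hook(module, inputs, output):
    features["mid"] = output  # cache the activation of the hooked block

# Register a forward hook on the mid block (one common, illustrative choice).
handle = unet.mid_block.register_forward_hook(hook)

with torch.no_grad():
    latents = torch.randn(1, 4, 64, 64)   # noised latent of the input image
    t = torch.tensor([50])                # illustrative denoising timestep
    cond = torch.zeros(1, 77, 768)        # placeholder text embedding (SD v1.5 dims)
    unet(latents, t, encoder_hidden_states=cond)

handle.remove()
feat = features["mid"]  # feature map usable by a discriminative head
```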



Supplementary Material to Linear Disentangled Representations and Unsupervised Action Estimation

Neural Information Processing Systems

When we predict post-action latent codes through a linear combination of representations, we lose the guarantee that the gradient will point towards this solution. Since REINFORCE applies a single representation exactly once, we are guaranteed that (if the policy is accurate and the latent structure is amenable) the gradient will point towards this solution. We find that the cyclic representation error $\|\hat{\alpha} - \alpha\| = 0.157$ is far worse than the 0.012 error of RGrVAE. Furthermore, the independence score is 0.830.
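A toy sketch of the distinction drawn above (all names and shapes are illustrative assumptions, not the paper's code): a soft mixture of action representations spreads gradient over all matrices, whereas a REINFORCE-style policy samples one representation and applies it once.

```python
# Contrast two ways of predicting the post-action latent code.
import torch

z = torch.randn(8)                              # pre-action latent code
reps = [torch.randn(8, 8) for _ in range(4)]    # one matrix representation per action

# (a) Linear combination: gradients flow into a soft mixture of all
# representations, so nothing forces credit onto a single action matrix.
w = torch.softmax(torch.randn(4), dim=0)
z_post_mix = sum(wi * R for wi, R in zip(w, reps)) @ z

# (b) REINFORCE-style: sample exactly one representation and apply it once,
# so the gradient signal is attributed to a single action matrix.
a = torch.distributions.Categorical(probs=w).sample()
z_post_single = reps[a] @ z
```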


Supplementary Material: Estimation of Conditional Moment Models

Neural Information Processing Systems

The most prevalent approach for estimating endogenous regression models with instruments is to assume a low-dimensional linear relationship, i.e., $h(x) = \langle \theta, x \rangle$. A 2SLS estimation method is then applied on these transformed feature spaces, and the coefficient of the final regression is taken as the estimate of $\theta$. The authors show asymptotic consistency of the resulting estimator, assuming that the approximation error goes to zero. Subsequently, they also estimate the function $m(z) = \mathbb{E}[y - h(x) \mid z]$ based on another growing sieve.

Though it may seem at first that the approach in that paper and ours are quite distinct, the population limit of our objective function coincides with theirs. To see this, consider the simplified version of our estimator presented in (6), where the function classes are already norm-constrained and no norm-based regularization is imposed. Moreover, for a moment consider the population version of this estimator, i.e., $\min_{h \in \mathcal{H}} \max_{f \in \mathcal{F}} \mathbb{E}[(y - h(x))\, f(z)] - \|f\|^2$. Thus, in the population limit and without norm regularization on the test function $f$, our criterion is equivalent to the minimum distance criterion analyzed in Chen and Pouzo [2012]. Another point of similarity is that we prove convergence of the estimator in terms of a pseudo-metric, the projected MSE defined in Section 4 of Chen and Pouzo [2012], and like that paper we require additional conditions to relate the pseudo-metric to the true MSE.

The present paper differs in a number of ways: (i) the finite-sample criterion is different; (ii) we prove our results using localized Rademacher analysis, which allows for weaker assumptions; (iii) we consider a broader range of estimation approaches than linear sieves, necessitating more of a focus on optimization. Digging into the second point, Chen and Pouzo [2012] take a more traditional parameter recovery approach, which requires several minimum eigenvalue conditions and several regularity conditions for their estimation rate to hold (see, e.g., the conditions therein). This is analogous to a mean squared error proof in an exogenous linear regression setting that requires the minimum eigenvalue of the feature covariance to be bounded away from zero. Moreover, such parameter recovery methods seem limited to the growing sieve approach, since only then does one have a clear finite-dimensional parameter vector to work with for each fixed $n$.
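To make the equivalence concrete, here is a short derivation sketch under simplifying assumptions: the test function $f$ ranges over all square-integrable functions of $z$ (rather than a norm-constrained class), and we take $\|f\|^2 = \mathbb{E}[f(z)^2]$. Writing $\rho = y - h(x)$ and applying the tower rule,

$$\max_{f} \; \mathbb{E}[\rho\, f(z)] - \mathbb{E}[f(z)^2] \;=\; \max_{f} \; \mathbb{E}\big[\mathbb{E}[\rho \mid z]\, f(z) - f(z)^2\big],$$

which is maximized pointwise at $f^*(z) = \tfrac{1}{2}\,\mathbb{E}[\rho \mid z]$, with value $\tfrac{1}{4}\,\mathbb{E}\big[\mathbb{E}[\rho \mid z]^2\big]$. Minimizing this quantity over $h$ is precisely the minimum distance criterion of Chen and Pouzo [2012].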



Global Convergence of Online Optimization for Nonlinear Model Predictive Control

Neural Information Processing Systems

We study a real-time iteration (RTI) scheme for solving the online optimization problems that arise in nonlinear optimal control. The proposed RTI scheme modifies the existing RTI-based model predictive control (MPC) algorithm by selecting the stepsize of each Newton step at each sampling time using a differentiable exact augmented Lagrangian. The scheme adaptively selects the penalty parameters of the augmented Lagrangian on the fly, and these are shown to stabilize after a certain time period. We prove under generic assumptions that, by incorporating stepsize selection instead of always taking a full Newton step (as most existing RTI schemes do), the scheme converges globally: for any initial point, the KKT residuals of the subproblems converge to zero. A key step is to show that the augmented Lagrangian keeps decreasing as the horizon moves forward. We demonstrate the global convergence behavior of the proposed RTI scheme in a numerical experiment.
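A schematic sketch of one such iteration follows. This is not the paper's implementation: all problem-specific callables, the Armijo constant, and the penalty update rule are illustrative placeholders; it only shows the shape of "one Newton step plus a merit-function line search" per sampling time.

```python
# Sketch of one real-time iteration: a single Newton step on the current
# subproblem, with the stepsize chosen by backtracking on an exact augmented
# Lagrangian merit function instead of always taking the full step.
def rti_step(w, lam, mu, newton_step, merit, merit_slope, kkt_residual):
    dw, dlam = newton_step(w, lam)             # direction from one Newton solve
    phi0 = merit(w, lam, mu)                   # exact augmented Lagrangian value
    slope = merit_slope(w, lam, dw, dlam, mu)  # directional derivative (< 0 for descent)

    alpha = 1.0                                # try the full Newton step first
    while merit(w + alpha * dw, lam + alpha * dlam, mu) > phi0 + 1e-4 * alpha * slope:
        alpha *= 0.5                           # backtrack (Armijo condition)

    w_new, lam_new = w + alpha * dw, lam + alpha * dlam
    # Penalty parameters are adapted on the fly; one simple illustrative test:
    # increase mu whenever the KKT residual failed to decrease.
    if kkt_residual(w_new, lam_new) > kkt_residual(w, lam):
        mu *= 2.0
    return w_new, lam_new, mu
```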



Supplementary material for Dynamic Causal Bayesian Optimisation

Neural Information Processing Systems

In this section we give the proof of Theorem 1 in the main text, in which the set W includes those variables that are parents of Y; exploiting Eq. (8), we rewrite Eq. (6), and we further expand Eq. (11) by noticing that in this case W = {Z, …}. We then give the proof of Proposition 3.1 in the main text. Finally, we provide additional experimental details for the experiments discussed in Section 4 of the main text. Notice how the location of the optimum changes significantly, both in terms of the optimal intervention set and the intervention value, when going from t = 0 to t = 1.


Dataset Distillation using Neural Feature Regression

Neural Information Processing Systems

Dataset distillation aims to learn a small synthetic dataset that preserves most of the information of the original dataset. It can be formulated as a bi-level meta-learning problem, where the outer loop optimizes the meta-dataset and the inner loop trains a model on the distilled data. Meta-gradient computation is one of the key challenges in this formulation, as differentiating through the inner-loop learning procedure introduces significant computation and memory costs. In this paper, we address these challenges using neural Feature Regression with Pooling (FRePo), achieving state-of-the-art performance with an order of magnitude less memory and two orders of magnitude faster training than previous methods. The proposed algorithm is analogous to truncated backpropagation through time with a pool of models to alleviate various types of overfitting in dataset distillation. FRePo significantly outperforms previous methods on CIFAR100, Tiny ImageNet, and ImageNet-1K. Furthermore, we show that high-quality distilled data can greatly improve various downstream applications, such as continual learning and membership inference defense. Please check out our webpage at https://sites.google.com/view/frepo.
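As a rough sketch of the feature-regression idea (not the authors' released code; the feature extractor, shapes, and ridge coefficient are illustrative assumptions), the outer step solves a kernel ridge regression on distilled-set features and backpropagates the real-batch loss into the distilled data:

```python
# Sketch of one FRePo-style outer step: kernel ridge regression on the
# features of the distilled set, evaluated on a batch of real data.
import torch

def frepo_outer_loss(feat, Xs, Ys, Xr, Yr, lam=1e-3):
    """feat: a feature extractor drawn from a model pool;
    (Xs, Ys): learnable distilled data/labels; (Xr, Yr): a real batch.
    Returns the loss whose gradient flows into Xs and Ys."""
    Fs = feat(Xs)                                   # distilled features, (Ns, d)
    Fr = feat(Xr)                                   # real-batch features, (Nr, d)
    K = Fs @ Fs.T                                   # distilled-distilled kernel, (Ns, Ns)
    eye = torch.eye(len(Xs), device=Fs.device)
    ridge = torch.linalg.solve(K + lam * eye, Ys)   # (K + lam*I)^{-1} Ys
    pred = Fr @ Fs.T @ ridge                        # kernel ridge predictions, (Nr, C)
    return torch.nn.functional.mse_loss(pred, Yr)

# Usage: loss = frepo_outer_loss(model.features, Xs, Ys, Xr, Yr); loss.backward();
# update Xs (and Ys) with any optimizer, periodically re-drawing models from a
# pool, which plays the role of the truncation/overfitting control noted above.
```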