Supplementary Material to Linear Disentangled Representations and Unsupervised Action Estimation
When we predict post-action latent codes through a linear combination of representations, we lose the guarantee that the gradient will point towards this solution. Since REINFORCE applies exactly one representation exactly once, we are guaranteed that (provided the policy is accurate and the latent structure is amenable) the gradient will point towards this solution. We find that the cyclic representation error ||α̂ − α|| = 0.157 is far worse than the 0.012 error of RGrVAE. Furthermore, the independence score is 0.830.
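For concreteness, the following is a minimal sketch (not the paper's implementation; the helper name and the toy rotation representation are assumptions) of measuring how far a predicted post-action code α̂ = ρ α lies from the observed post-action code:

import numpy as np

def representation_error(rho, alpha, alpha_post):
    """Distance between the predicted post-action code rho @ alpha and the observed one."""
    alpha_hat = rho @ alpha
    return np.linalg.norm(alpha_hat - alpha_post)

# Toy example: a 2D rotation standing in for the linear representation of one action.
theta = np.pi / 2
rho = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
alpha = np.array([1.0, 0.0])
alpha_post = np.array([0.0, 1.0])   # latent code observed after the action
print(representation_error(rho, alpha, alpha_post))   # 0.0 for a perfect representation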
Supplementary Material: Estimation of Conditional Moment Models
The most prevalent approach for estimating endogenous regression models with instruments is to assume low-dimensional linear relationships, i.e. h(x) = ⟨θ, x⟩, and to apply two-stage least squares (2SLS); the coefficient in the final regression is taken to be the estimate of θ. Non-parametric generalizations instead approximate h via a growing linear sieve of basis functions, and a 2SLS estimation method is then applied on these transformed feature spaces. The authors show asymptotic consistency of the resulting estimator, assuming that the approximation error goes to zero. Subsequently, they also estimate the function m(z) = E[y − h(x) | z] based on another growing sieve.

Though it may seem at first that the approach in that paper and ours are quite distinct, the population limit of our objective function coincides with theirs. To see this, consider the simplified version of our estimator presented in (6), where the function classes are already norm-constrained and no norm-based regularization is imposed. Moreover, for a moment consider the population version of this estimator, i.e.

min_{h ∈ H} max_{f ∈ F} E[(y − h(x)) f(z)] − ||f||².

Thus in the population limit and without norm regularization on the test function f, our criterion is equivalent to the minimum distance criterion analyzed in Chen and Pouzo [2012]. Another point of similarity is that we prove convergence of the estimator in terms of the same pseudo-metric, the projected MSE defined in Section 4 of Chen and Pouzo [2012], and like that paper we require additional conditions to relate the pseudo-metric to the true MSE.

The present paper differs in a number of ways: (i) the finite-sample criterion is different; (ii) we prove our results using localized Rademacher analysis, which allows for weaker assumptions; (iii) we consider a broader range of estimation approaches than linear sieves, necessitating more of a focus on optimization. Digging into the second point, Chen and Pouzo [2012] take a more traditional parameter-recovery approach, which requires several minimum eigenvalue conditions and regularity conditions to be satisfied for their estimation rate to hold. This is analogous to a mean squared error proof in an exogenous linear regression setting that requires the minimum eigenvalue of the feature covariance to be bounded away from zero. Moreover, such parameter-recovery methods seem limited to the growing sieve approach, since only then does one have a clear finite-dimensional parameter vector to work with for each fixed n.
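The equivalence can be made explicit with the following short derivation, a sketch that assumes the test-function class F is rich enough to contain the scaled conditional expectation. Writing u(z) := E[y − h(x) | z], the tower rule gives E[(y − h(x)) f(z)] = E[u(z) f(z)], so the inner maximization can be solved pointwise:

\[
\max_{f}\; \mathbb{E}\big[(y - h(x))\, f(z)\big] - \|f\|_2^2
  \;=\; \max_{f}\; \mathbb{E}\big[u(z) f(z) - f(z)^2\big]
  \;=\; \tfrac{1}{4}\,\mathbb{E}\big[u(z)^2\big],
\]

with the maximum attained at \( f^*(z) = u(z)/2 \). Minimizing over \( h \) therefore minimizes \( \|\mathbb{E}[y - h(x) \mid z]\|_2^2 \) up to a constant factor, which is exactly the minimum distance (projected MSE) criterion.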
Global Convergence of Online Optimization for Nonlinear Model Predictive Control
We study a real-time iteration (RTI) scheme for solving the online optimization problems that arise in nonlinear optimal control. The proposed RTI scheme modifies the existing RTI-based model predictive control (MPC) algorithm by selecting the stepsize of each Newton step at each sampling time using a differentiable exact augmented Lagrangian. The scheme adaptively selects the penalty parameters of the augmented Lagrangian on the fly, and these parameters are shown to stabilize after a certain number of time periods. We prove under generic assumptions that, by incorporating stepsize selection instead of always taking a full Newton step (as most existing RTI schemes do), the scheme converges globally: for any initial point, the KKT residuals of the subproblems converge to zero. A key step is to show that the augmented Lagrangian keeps decreasing as the horizon moves forward. We demonstrate the global convergence behavior of the proposed RTI scheme in a numerical experiment.
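As a rough illustration of the kind of stepsize selection described above (a sketch under assumed names, not the paper's algorithm: the merit function, its gradient, and the Newton direction are placeholders supplied by the caller), an Armijo backtracking search on an exact augmented Lagrangian merit function might look like:

import numpy as np

def backtracking_stepsize(merit, grad_merit, w, direction,
                          alpha0=1.0, beta=0.5, c=1e-4, max_iter=30):
    """Armijo backtracking on a differentiable merit function.

    merit      : callable, merit(w) -> float, e.g. an exact augmented Lagrangian
    grad_merit : callable, grad_merit(w) -> ndarray
    w          : current primal-dual iterate
    direction  : Newton direction for the subproblem at this sampling time
    """
    phi0 = merit(w)
    slope = grad_merit(w) @ direction          # directional derivative along the Newton step
    alpha = alpha0
    for _ in range(max_iter):
        if merit(w + alpha * direction) <= phi0 + c * alpha * slope:
            return alpha                       # sufficient decrease achieved
        alpha *= beta                          # otherwise shrink the step
    return alpha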
Supplementary material for Dynamic Causal Bayesian Optimisation
In this section we give the proof of Theorem 1 in the main text. Here W denotes the set of variables that are parents of Y. Exploiting Eq. (8), we can rewrite Eq. (6), and the resulting Eq. (11) can be further expanded by noting the form of W in this case. We then give the proof of Proposition 3.1 in the main text. This section also contains additional experimental details associated with the experiments discussed in Section 4 of the main text. Notice how the location of the optimum changes significantly, both in terms of the optimal set and the intervention value, when going from t = 0 to t = 1.
Dataset Distillation using Neural Feature Regression
Dataset distillation aims to learn a small synthetic dataset that preserves most of the information from the original dataset. Dataset distillation can be formulated as a bi-level meta-learning problem where the outer loop optimizes the meta-dataset and the inner loop trains a model on the distilled data. Meta-gradient computation is one of the key challenges in this formulation, as differentiating through the inner-loop learning procedure introduces significant computation and memory costs. In this paper, we address these challenges using neural Feature Regression with Pooling (FRePo), achieving state-of-the-art performance with an order of magnitude less memory and two orders of magnitude faster training than previous methods. The proposed algorithm is analogous to truncated backpropagation through time with a pool of models to alleviate various types of overfitting in dataset distillation. FRePo significantly outperforms the previous methods on CIFAR100, Tiny ImageNet, and ImageNet-1K. Furthermore, we show that high-quality distilled data can greatly improve various downstream applications, such as continual learning and membership inference defense. Please check out our webpage at https://sites.google.com/view/frepo.
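The feature-regression shortcut can be sketched as follows (an illustrative simplification, not the official FRePo code; the function name, shapes, and regularization constant are assumptions). Features of the distilled set come from a frozen backbone drawn from a pool of models, the linear head is solved in closed form by kernel ridge regression, and the resulting loss on a real batch is what gets differentiated with respect to the distilled data in an autodiff framework:

import numpy as np

def frepo_style_loss(feat_distilled, y_distilled, feat_real, y_real, reg=1e-3):
    """Forward pass of a feature-regression meta-objective (sketch only)."""
    k_ss = feat_distilled @ feat_distilled.T      # distilled-distilled kernel
    k_rs = feat_real @ feat_distilled.T           # real-distilled kernel
    n = k_ss.shape[0]
    weights = np.linalg.solve(k_ss + reg * np.eye(n), y_distilled)   # closed-form head
    preds = k_rs @ weights                        # predictions on the real batch
    return 0.5 * np.mean((preds - y_real) ** 2)

# Toy shapes: 10 distilled points, 32 real points, 64-dim features, 5 classes.
rng = np.random.default_rng(0)
loss = frepo_style_loss(rng.normal(size=(10, 64)), rng.normal(size=(10, 5)),
                        rng.normal(size=(32, 64)), rng.normal(size=(32, 5)))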
Self-Supervised Few-Shot Learning on Point Clouds
Visualization of ball covers

The cover-tree approach of using balls to group the points of a point cloud is visualized in Figure 1. The visualization shows balls, drawn as transparent spheres, at different scales and packing densities in a cover-tree. Fig 1a represents the top level (root) of the cover-tree, which covers the point cloud with a single ball at level i. Fig 1b and Fig 1c show the balls at lower levels, with smaller radii, as the tree is descended. Thus, we learn local features using balls at various levels with different packing densities.

A.1 3D Object Classification Training

This section provides the implementation details of our proposed self-supervised network.
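As a rough sketch of how points can be grouped into balls at one level of such a cover (an illustrative greedy construction, not the paper's implementation; the radius values are assumptions), one can repeatedly pick an uncovered point as a center and assign every point within the level's radius to that ball:

import numpy as np

def ball_cover(points, radius):
    """Greedily cover a point cloud with balls of a fixed radius.

    Returns a list of (center_index, member_indices) pairs; every point ends up
    in at least one ball.  Descending a cover-tree corresponds to repeating this
    with a smaller radius inside each ball."""
    remaining = set(range(len(points)))
    balls = []
    while remaining:
        c = remaining.pop()                                   # pick an uncovered point as a center
        dists = np.linalg.norm(points - points[c], axis=1)
        members = np.where(dists <= radius)[0]
        balls.append((c, members))
        remaining -= set(members.tolist())
    return balls

# Toy point cloud: 200 random 3D points covered at two scales.
pts = np.random.default_rng(1).uniform(size=(200, 3))
coarse = ball_cover(pts, radius=0.5)    # few large balls (upper level)
fine = ball_cover(pts, radius=0.15)     # many small balls (lower level)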
Multiview Human Body Reconstruction from Uncalibrated Cameras
We present a new method to reconstruct 3D human body pose and shape by fusing visual features from multiview images captured by uncalibrated cameras. Existing multiview approaches often use spatial camera calibration (intrinsic and extrinsic parameters) to geometrically align and fuse visual features. Despite remarkable performance, the requirement of camera calibration restricts their applicability to real-world scenarios, e.g., reconstruction from social videos with wide-baseline cameras. We address this challenge by leveraging the commonly observed human body as a semantic calibration target, which eliminates the requirement of camera calibration. Specifically, we map per-pixel image features to a canonical body surface coordinate system agnostic to views and poses using dense keypoints (correspondences). This feature mapping allows us to semantically, instead of geometrically, align and fuse visual features from multiview images. We learn a self-attention mechanism to reason about the confidence of visual features across and within views. With fused visual features, a regressor is learned to predict the parameters of a body model. We demonstrate that our calibration-free multiview fusion method reliably reconstructs 3D body pose and shape, outperforming state-of-the-art single-view methods with post-hoc multiview fusion, particularly in the presence of non-trivial occlusion, and showing comparable accuracy to multiview methods that require calibration.
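A highly simplified sketch of the fusion step described above (illustrative only; the canonical-surface binning, feature dimensions, and confidence weights are assumptions, and the real method operates on dense per-pixel correspondences with learned attention): features from each view are scattered into a shared canonical surface grid using their predicted surface coordinates, then combined across views with confidence weights.

import numpy as np

def fuse_views(uv_coords, feats, confidences, grid=32, dim=16):
    """Scatter per-pixel features from several views onto a canonical surface
    grid and average them with per-pixel confidence weights.

    uv_coords   : list of (N_v, 2) arrays in [0, 1), canonical surface coordinates
    feats       : list of (N_v, dim) arrays, per-pixel image features
    confidences : list of (N_v,) arrays, e.g. produced by a self-attention module
    """
    fused = np.zeros((grid, grid, dim))
    weight = np.zeros((grid, grid, 1))
    for uv, f, c in zip(uv_coords, feats, confidences):
        cells = np.minimum((uv * grid).astype(int), grid - 1)
        for (u, v), fi, ci in zip(cells, f, c):
            fused[u, v] += ci * fi
            weight[u, v] += ci
    return fused / np.maximum(weight, 1e-8)       # confidence-weighted average per cell

# Toy example with two views of 500 pixels each.
rng = np.random.default_rng(2)
uvs = [rng.uniform(size=(500, 2)) for _ in range(2)]
fs = [rng.normal(size=(500, 16)) for _ in range(2)]
cs = [rng.uniform(size=500) for _ in range(2)]
canonical_features = fuse_views(uvs, fs, cs)      # (32, 32, 16) fused feature map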