
 supplementary


Supplementary to Smooth Bilevel Programming for Sparse Regularization
Clarice Poon, Gabriel Peyré

A Pseudocode for gradient descent implementation

Neural Information Processing Systems

Note that the gradient $g_t$ of $f$ at the current iterate is computed either in line 5 or in line 9 of the algorithm, and these computations can be reused in any gradient-based algorithm (e.g. …). Note also that this is simply gradient descent on a smooth function, so standard rules for choosing the stepsize $\gamma_t$ apply, such as the Barzilai-Borwein stepsize [Barzilai and Borwein, 1988].

Algorithm 1: Gradient descent implementation of Ncvx-Pro for solving Lasso.
1 initialization: $v_0 \in \mathbb{R}^n$ (with no zero entries), stepsize $\gamma_t > 0$;
Result: $\beta_t$
2 while not converged do
3     if $n \leq m$ and $\lambda > 0$ then
4         $u_t = \mathrm{diag}(v_t) X^\top X \mathrm{diag}(v_t) + \lambda \mathrm{Id}$ …

To show that i) implies ii), recall that a convex, proper and lower semicontinuous function $\phi$ can be written in terms of its convex conjugate (which has domain $\mathbb{R}^d$), namely $\phi = \phi^{**}$. For the expression of $\psi$ when $R$ is a norm, from the above we know that ψ = ( ϕ) ( z), and recall that for any norm, $R(\beta) = \max_{R^\circ(w) \leq 1} \langle w, \beta \rangle$, where $R^\circ$ denotes the dual norm.

We derive some properties of the function $h$:

Lemma 1.
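As a rough illustration of the loop in Algorithm 1, the following NumPy sketch runs plain gradient descent on the smooth function $f$ of $v$, covering only the $n \leq m$, $\lambda > 0$ branch. It assumes, since the excerpt above does not spell it out, that the Lasso is written as $\min_\beta \tfrac12\|X\beta - y\|^2 + \lambda\|\beta\|_1$ and that $f(v) = \min_u \tfrac{\lambda}{2}(\|u\|^2 + \|v\|^2) + \tfrac12\|X(u \odot v) - y\|^2$ with $\beta = u \odot v$, which is consistent with the matrix in line 4. The gradient $\lambda v + u \odot X^\top(X(u \odot v) - y)$ then follows from the envelope theorem. The function name `ncvx_pro_lasso_gd`, the fixed stepsize and the iteration count are hypothetical choices, not the authors' code.

```python
import numpy as np


def ncvx_pro_lasso_gd(X, y, lam, stepsize=0.1, n_iter=2000, v0=None):
    """Plain gradient descent on f(v) for the Lasso (hypothetical helper name).

    Assumed scaling (not taken from the source):
        min_beta 0.5 * ||X beta - y||^2 + lam * ||beta||_1
    rewritten with beta = u * v as
        f(v) = min_u 0.5 * lam * (||u||^2 + ||v||^2) + 0.5 * ||X (u * v) - y||^2.
    Only the n <= m, lam > 0 branch (an n x n linear solve) is sketched.
    """
    n = X.shape[1]
    v = np.ones(n) if v0 is None else np.array(v0, dtype=float)
    XtX, Xty = X.T @ X, X.T @ y

    def inner_u(v):
        # Solve (diag(v) X^T X diag(v) + lam * Id) u = diag(v) X^T y
        A = v[:, None] * XtX * v[None, :] + lam * np.eye(n)
        return np.linalg.solve(A, v * Xty)

    for _ in range(n_iter):
        u = inner_u(v)
        # Gradient of f at v (envelope theorem): lam * v + u * X^T (X (u * v) - y)
        g = lam * v + u * (XtX @ (u * v) - Xty)
        # Fixed stepsize; a Barzilai-Borwein rule could be swapped in here
        v = v - stepsize * g
    u = inner_u(v)
    return u * v  # beta = u * v
```

With an orthogonal design ($X = \mathrm{Id}$) the Lasso solution is soft thresholding of $y$, so `ncvx_pro_lasso_gd(np.eye(3), np.array([3.0, -2.0, 0.5]), lam=1.0)` should return approximately `[2, -1, 0]`. A coordinate with $v_i = 0$ stays at zero under this update (its $u_i$ and gradient entry are both zero), which is why the initialization requires $v_0$ to have no zero entries.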


Supplementary for Emergence of Shape Bias in Convolutional Neural Networks through Activation Sparsity

1 Further results on the impact of sparsity on the shape bias benchmark

Neural Information Processing Systems

We generalize Section 4.2 of the main text to the ResNet-50 and ViT-B architectures (Figure 1). For ResNet-50, we use the sparsity operation proposed in Section 3.1, with the same sparsity definition as for AlexNet and VGG. For ViT-B, we likewise apply the spatial Top-K operation described in the general response: we reshape the intermediate activation from [n, h * w, d] to [n, d, h * w] and apply the Top-K selection over dimension 2 before the activation is passed through the multi-head attention. Here n denotes the batch size, and h and w are the height and width of the latent tensor after reshaping it to 2D; for ViT-B with patch size 16 on 224x224 images, h = w = 14. We observe an increase in shape bias for both ResNet-50 and ViT-B, further closing the gap between humans and existing models.
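The ViT-B variant described above can be sketched in NumPy as follows. Beyond what the text states, the sketch assumes that Top-K keeps the k largest activation values per channel over the h * w spatial positions, sets the remaining entries to zero, and operates on patch tokens only (any class token already removed). The function name `spatial_topk` and the value of k are hypothetical.

```python
import numpy as np


def spatial_topk(tokens, k):
    """Keep the k largest spatial responses per channel and zero the rest.

    tokens: array of shape [n, h*w, d] (batch, spatial positions, channels).
    Assumptions (not from the source): selection is by raw activation value,
    non-selected entries are set to zero, and any class token has already
    been removed from the sequence.
    """
    x = np.transpose(tokens, (0, 2, 1))  # [n, h*w, d] -> [n, d, h*w]
    # Indices of the k largest entries along the spatial axis (dimension 2)
    idx = np.argpartition(x, -k, axis=2)[..., -k:]
    mask = np.zeros(x.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=2)
    sparse = np.where(mask, x, 0.0)
    return np.transpose(sparse, (0, 2, 1))  # back to [n, h*w, d]
```

For ViT-B/16 on 224x224 inputs the token axis has h * w = 196 entries, so calling `spatial_topk(tokens, k)` with `tokens` of shape [n, 196, 768] keeps k of the 196 spatial responses in each of the 768 channels.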


On Path Integration of Grid Cells: Group Representation and Isotropic Scaling

Neural Information Processing Systems

Understanding how grid cells perform path integration remains a fundamental problem. In this paper, we conduct a theoretical analysis of a general representation model of path integration by grid cells, in which the 2D self-position is encoded as a higher-dimensional vector and the 2D self-motion is represented by a general transformation of that vector. We identify two conditions on the transformation. One is a group representation condition, which is necessary for path integration. The other is an isotropic scaling condition, which ensures a locally conformal embedding, so that errors in the vector representation translate conformally to errors in the 2D self-position. We then investigate the simplest transformation, namely the linear transformation, uncover its explicit algebraic and geometric structure as a matrix Lie group of rotations, and explore the connection between the isotropic scaling condition and a special class of hexagonal grid patterns. Finally, with our optimization-based approach, we learn hexagonal grid patterns that share properties with grid cells in the rodent brain. The learned model is capable of accurate long-distance path integration.
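A toy version of the linear model can make both conditions tangible. The sketch below assumes a hand-constructed, block-diagonal transformation $M(\Delta x)$ in which each 2x2 block rotates by $\langle a_k, \Delta x \rangle$ for three frequency vectors $a_k$ placed 120° apart; this is one illustrative instance of a rotation-group representation, not the paper's learned model, and the helper names `hex_directions`, `transform` and `path_integrate` are hypothetical. Because rotation angles add, $M(\Delta x_1) M(\Delta x_2) = M(\Delta x_1 + \Delta x_2)$ (the group condition), so integrating step by step matches one transform by the total displacement. With equal block norms, $\|M(\Delta x)v - v\|^2 \approx \sum_k \langle a_k, \Delta x\rangle^2 = \tfrac32 |a|^2 \|\Delta x\|^2$ for small $\Delta x$, independent of direction, which is a simple case of isotropic scaling.

```python
import numpy as np


def hex_directions(scale=1.0):
    """Three frequency vectors 120 degrees apart (hypothetical hexagonal choice)."""
    angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
    return scale * np.stack([np.cos(angles), np.sin(angles)], axis=1)  # [3, 2]


def transform(freqs, dx):
    """Block-diagonal M(dx): one 2x2 rotation by <a_k, dx> per frequency a_k."""
    K = freqs.shape[0]
    M = np.zeros((2 * K, 2 * K))
    for k, theta in enumerate(freqs @ dx):
        c, s = np.cos(theta), np.sin(theta)
        M[2 * k:2 * k + 2, 2 * k:2 * k + 2] = [[c, -s], [s, c]]
    return M


def path_integrate(freqs, v0, steps):
    """Update the position code step by step: v <- M(dx) v for each self-motion dx."""
    v = v0.copy()
    for dx in steps:
        v = transform(freqs, dx) @ v
    return v
```

For example, with `A = hex_directions(2.0)` and `v0 = np.tile([1.0, 0.0], 3)`, integrating 50 small random steps gives the same code as `transform(A, steps.sum(axis=0)) @ v0`, and the displacement norm of the code for a tiny step barely changes as the step direction rotates.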





Supplementary: Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation

Neural Information Processing Systems

The raw video frames are forwarded through a person detector [15] to obtain person-focused image sequences. Note that the detector-pruned video sequences may not have smooth pixel transitions; however, they retain smooth pose transitions in the view-variant, root-relative coordinate system. In our work, the shared latent pose can be seen as a parametric form for representing plausible 3D poses, and the image-to-latent model is trained to regress the latent pose parameters, with the latent serving as an intermediate 3D pose representation.
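To make the notion of a root-relative pose concrete, the following sketch subtracts a root joint from every joint in each frame. The array layout [frames, joints, 3], the choice of joint 0 as the root and the name `to_root_relative` are hypothetical conventions, not taken from the source. Because a per-frame global translation of the skeleton cancels out, the root-relative trajectory does not depend on where the person is located, which is one way to see why pose transitions can stay smooth even when the detector crops do not.

```python
import numpy as np


def to_root_relative(joints_3d, root_index=0):
    """Express each frame's joints relative to a root joint (hypothetical index 0).

    joints_3d: array of shape [T, J, 3] (frames, joints, xyz).
    """
    root = joints_3d[:, root_index:root_index + 1, :]  # [T, 1, 3], kept for broadcasting
    return joints_3d - root
```

Adding an arbitrary offset of shape [T, 1, 3] to `joints_3d` leaves `to_root_relative(joints_3d)` unchanged, and the root joint itself always maps to the origin.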




Supplementary for Frederik Technical frwa@dtu.dk

Neural Information Processing Systems

Moreover, wehighlight unitsphere, theequivalence (31) holds. D.1 Equi Thecontrasti distribution,y = 0 is attractiveP L (f (xe),f (xa)) = 1 2 kf (xe) f (xa)k2 = logc ) logP (f (xe)|xa, ) whileaney= <0isrelated P L (f (xe),f (xa)) = 1 2 kf (xe) f (xa)k2 = 2 + 1 2 kf (xe)+ f (xa)k2 = 2 + logc ) logP (f (xe)|xa, ).