 landmark


How Data Augmentation Shapes Neural Representations

arXiv.org Machine Learning

Data augmentation is widely recognized for improving generalization in deep networks, yet its impact on the geometry of learned representations remains poorly understood. In this work, we characterize how different data augmentation strategies reshape internal representations in neural networks. Using tools from shape analysis, we embed network hidden representations into a metric space where distance is invariant to scaling, translation, rotation and reflection. We show that increasing augmentation strength leads to well-behaved trajectories in this space, and that different augmentation types steer representations in distinct directions. Moreover, we investigate how neural representation shapes are distorted along data augmentation trajectories, and show that insights from neural geometry can predict which representations provide the most improvement when ensembling models. Our results reveal shared geometric patterns across architectures and seeds, and suggest that analyzing shape-space trajectories offers a principled tool for understanding and comparing data augmentation methods.
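One common metric with the invariances described above is the angular Procrustes distance from shape analysis. The sketch below illustrates that idea and is not the authors' implementation. The function name `procrustes_shape_distance` is hypothetical, and the code assumes both representations are matrices of shape (inputs, units) computed on the same inputs with the same number of units.

```python
import numpy as np

def procrustes_shape_distance(X, Y):
    """Angular Procrustes distance between two (n_inputs, n_units) matrices.

    Assumption: X and Y have identical shapes and their rows correspond
    to the same inputs.
    """
    # Remove translation: center each unit (column) across inputs.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Remove scale: normalize to unit Frobenius norm.
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    # Remove rotation and reflection: the maximum of trace(Q^T X^T Y)
    # over orthogonal Q equals the sum of singular values of X^T Y.
    singular_values = np.linalg.svd(X.T @ Y, compute_uv=False)
    cosine = np.clip(singular_values.sum(), -1.0, 1.0)
    return np.arccos(cosine)
```

Two representations that differ only by a shift, a positive rescaling, and an orthogonal transform have distance close to zero under this definition, up to floating-point error.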




Supplementary material for Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems

Neural Information Processing Systems

All the source code can be found at our project website https://sites.google.com/view/ In order to prove Theorem 1, we introduce the following lemma, which uses Assumption 1. Lemma 1. The proof is largely based on [2]. Let $\mathcal{H}^d = \mathcal{H} \times \cdots \times \mathcal{H}$ be a vector-valued RKHS, and let $F[f]$ be a functional of $f$. Pure Task Expansion Results on MPE: VACL contains entity progression in the result of Figure 1. To specifically study the performance of task expansion, we exclude the entity progression module from VACL and compare with baselines in Simple-Spread with n = 4 and Push-Ball with n = 2. For a fair comparison, we also provide additional experiments that combine GoalGAN and AMIGo with the initial knowledge of easy tasks.





Appendix 1 Interpretation using rank-1 Nyström approximation

Neural Information Processing Systems

The bound in Equation 5 of the main paper can be interpreted using a rank-1 Nyström approximation for $f(x_t, x_{t'})$. Holding $w$ fixed and maximizing over $q$ on the right-hand side of Equation 5 gives $q = f(w,w)^{\dagger} \sum_t y_t f(x_t, w)$, where $f(w,w)^{\dagger}$ denotes the pseudo-inverse of $f(w,w)$. The weight vector $w$ used in the Nyström approximation, often called a "landmark", is typically set either to a randomly chosen input or by a more sophisticated scheme such as KMeans. In our case, we directly optimize the landmarks via Equation 6 in the main paper. To our knowledge, the only other work to do this is Fu [2014]. The code used in the main training loop of our algorithm is shown in Figure 1.
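For readers unfamiliar with the Nyström construction, the following sketch shows a generic rank-1 approximation of a kernel matrix from a single landmark. It is not the training loop from the paper's Figure 1. The Gaussian kernel, the `gamma` parameter, and the function names `rbf_kernel` and `rank1_nystrom` are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between rows of A and rows of B (illustrative choice)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def rank1_nystrom(X, w, kernel=rbf_kernel):
    """Approximate kernel(X, X) by k(X, w) k(w, w)^+ k(w, X) for one landmark w."""
    w = w[None, :]                      # treat the landmark as a 1-row matrix
    k_xw = kernel(X, w)                 # shape (T, 1)
    k_ww_pinv = np.linalg.pinv(kernel(w, w))  # 1x1 pseudo-inverse
    return k_xw @ k_ww_pinv @ k_xw.T    # shape (T, T), rank at most 1
```

When the landmark coincides with one of the inputs, the approximation reproduces that input's row of the kernel matrix exactly. This is one reason a randomly chosen input is a common baseline for the landmark.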


Kernel similarity matching with Hebbian Networks

Neural Information Processing Systems

Recent works have derived neural networks with online correlation-based learning rules to perform kernel similarity matching. These works applied existing linear similarity matching algorithms to nonlinear features generated with random Fourier methods. In this paper we attempt to perform kernel similarity matching by directly learning the nonlinear features. Our algorithm proceeds by deriving and then minimizing an upper bound for the sum of squared errors between output and input kernel similarities. The construction of our upper bound leads to online correlation-based learning rules which can be implemented with a one-layer recurrent neural network. In addition to generating high-dimensional linearly separable representations, we show that our upper bound naturally yields representations which are sparse and selective for specific input patterns. We compare the approximation quality of our method to the neural random Fourier method and to variants of the popular but non-biological "Nyström" method for approximating the kernel matrix. Our method appears to be comparable to or better than randomly sampled Nyström methods when the outputs are relatively low dimensional (although still potentially higher dimensional than the inputs), but less faithful when the outputs are very high dimensional.
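The two baseline families mentioned above can be sketched generically as follows. This is not the paper's method or its experimental setup. It assumes a Gaussian kernel exp(-gamma * ||x - y||^2), and all function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def random_fourier_features(X, n_features, gamma, rng):
    """Random Fourier features Z such that Z @ Z.T approximates the Gaussian kernel."""
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density, N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def nystrom_approx(X, landmarks, gamma):
    """Nystrom approximation K_xl K_ll^+ K_lx from a set of landmark points."""
    k_xl = rbf_kernel(X, landmarks, gamma)
    k_ll = rbf_kernel(landmarks, landmarks, gamma)
    return k_xl @ np.linalg.pinv(k_ll) @ k_xl.T

def relative_error(K, K_hat):
    """Relative Frobenius-norm error between a kernel matrix and its approximation."""
    return np.linalg.norm(K - K_hat) / np.linalg.norm(K)
```

A randomly sampled Nyström baseline would pass a random subset of the rows of `X` as `landmarks`. Comparing `relative_error` for both approximations at a matched output dimension gives a rough analogue of the comparison described in the abstract.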


Accelerate Vector Diffusion Maps by Landmarks

arXiv.org Machine Learning

We propose a landmark-constrained algorithm, LA-VDM (Landmark Accelerated Vector Diffusion Maps), to accelerate the Vector Diffusion Maps (VDM) framework built upon the Graph Connection Laplacian (GCL), which captures pairwise connection relationships within complex datasets. LA-VDM introduces a novel two-stage normalization that effectively addresses nonuniform sampling densities in both the data and the landmark sets. Under a manifold model with the frame bundle structure, we show that parallel transport can be accurately recovered from a point cloud using landmark-constrained diffusion, and hence that LA-VDM converges asymptotically to the connection Laplacian. The performance and accuracy of LA-VDM are demonstrated through experiments on simulated datasets and an application to nonlocal image denoising.