Goto

Collaborating Authors

 eigenvalue spectrum


Model Reprogramming Demystified: A Neural Tangent Kernel Perspective

arXiv.org Artificial Intelligence

Model Reprogramming (MR) is a resource-efficient framework that adapts large pre-trained models to new tasks with minimal additional parameters and data, offering a promising solution to the challenges of training large models for diverse tasks. Despite its empirical success across various domains such as computer vision and time-series forecasting, the theoretical foundations of MR remain underexplored. In this paper, we present a comprehensive theoretical analysis of MR through the lens of the Neural Tangent Kernel (NTK) framework. We demonstrate that the success of MR is governed by the eigenvalue spectrum of the NTK matrix on the target dataset and establish the critical role of the source model's effectiveness in determining reprogramming outcomes. Our contributions include a novel theoretical framework for MR, insights into the relationship between source and target models, and extensive experiments validating our findings.


How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings

arXiv.org Artificial Intelligence

Neural networks that map between low dimensional spaces are ubiquitous in computer graphics and scientific computing; however, in their naive implementation, they are unable to learn high frequency information. We present a comprehensive analysis comparing the two most common techniques for mitigating this spectral bias: Fourier feature encodings (FFE) and multigrid parametric encodings (MPE). FFEs are seen as the standard for low dimensional mappings, but MPEs often outperform them and learn representations with higher resolution and finer detail. FFE's roots in the Fourier transform, make it susceptible to aliasing if pushed too far, while MPEs, which use a learned grid structure, have no such limitation. To understand the difference in performance, we use the neural tangent kernel (NTK) to evaluate these encodings through the lens of an analogous kernel regression. By finding a lower bound on the smallest eigenvalue of the NTK, we prove that MPEs improve a network's performance through the structure of their grid and not their learnable embedding. This mechanism is fundamentally different from FFEs, which rely solely on their embedding space to improve performance. Results are empirically validated on a 2D image regression task using images taken from 100 synonym sets of ImageNet and 3D implicit surface regression on objects from the Stanford graphics dataset. Using peak signal-to-noise ratio (PSNR) and multiscale structural similarity (MS-SSIM) to evaluate how well fine details are learned, we show that the MPE increases the minimum eigenvalue by 8 orders of magnitude over the baseline and 2 orders of magnitude over the FFE. The increase in spectrum corresponds to a 15 dB (PSNR) / 0.65 (MS-SSIM) increase over baseline and a 12 dB (PSNR) / 0.33 (MS-SSIM) increase over the FFE.


Optimal Initialization Strategies for Range-Only Trajectory Estimation

arXiv.org Artificial Intelligence

Range-only (RO) pose estimation involves determining a robot's pose over time by measuring the distance between multiple devices on the robot, known as tags, and devices installed in the environment, known as anchors. The nonconvex nature of the range measurement model results in a cost function with possible local minima. In the absence of a good initialization, commonly used iterative solvers can get stuck in these local minima resulting in poor trajectory estimation accuracy. In this work, we propose convex relaxations to the original nonconvex problem based on semidefinite programs (SDPs). Specifically, we formulate computationally tractable SDP relaxations to obtain accurate initial pose and trajectory estimates for RO trajectory estimation under static and dynamic (i.e., constant-velocity motion) conditions. Through simulation and real experiments, we demonstrate that our proposed initialization strategies estimate the initial state accurately compared to iterative local solvers. Additionally, the proposed relaxations recover global minima under moderate range measurement noise levels.


Improving Delay Based Reservoir Computing via Eigenvalue Analysis

arXiv.org Machine Learning

We analyze the reservoir computation capability of the Lang-Kobayashi system by comparing the numerically computed recall capabilities and the eigenvalue spectrum. We show that these two quantities are deeply connected, and thus the reservoir computing performance is predictable by analyzing the eigenvalue spectrum. Our results suggest that any dynamical system used as a reservoir can be analyzed in this way as long as the reservoir perturbations are sufficiently small. Optimal performance is found for a system with the eigenvalues having real parts close to zero and off-resonant imaginary parts.


Analysis of overfitting in the regularized Cox model

arXiv.org Machine Learning

The Cox proportional hazards model is ubiquitous in the analysis of time-to-event data. However, when the data dimension p is comparable to the sample size $N$, maximum likelihood estimates for its regression parameters are known to be biased or break down entirely due to overfitting. This prompted the introduction of the so-called regularized Cox model. In this paper we use the replica method from statistical physics to investigate the relationship between the true and inferred regression parameters in regularized multivariate Cox regression with L2 regularization, in the regime where both p and N are large but with p/N ~ O(1). We thereby generalize a recent study from maximum likelihood to maximum a posteriori inference. We also establish a relationship between the optimal regularization parameter and p/N, allowing for straightforward overfitting corrections in time-to-event analysis.


Limiting Form of the Sample Covariance Eigenspectrum in PCA and Kernel PCA

Neural Information Processing Systems

We derive the limiting form of the eigenvalue spectrum for sample covariance matrices produced from non-isotropic data. For the analysis of standard PCA we study the case where the data has increased variance along a small number of symmetry-breaking directions. The spectrum depends on the strength of the symmetry-breaking signals and on a parameter α which is the ratio of sample size to data dimension. Results are derived in the limit of large data dimension while keeping α fixed. As α increases there are transitions in which delta functions emerge from the upper end of the bulk spectrum, corresponding to the symmetry-breaking directions in the data, and we calculate the bias in the corresponding eigenvalues. For kernel PCA the covariance matrix in feature space may contain symmetry-breaking structure even when the data components are independently distributed with equal variance. We show examples of phase-transition behaviour analogous to the PCA results in this case.


Limiting Form of the Sample Covariance Eigenspectrum in PCA and Kernel PCA

Neural Information Processing Systems

We derive the limiting form of the eigenvalue spectrum for sample covariance matrices produced from non-isotropic data. For the analysis of standard PCA we study the case where the data has increased variance along a small number of symmetry-breaking directions. The spectrum depends on the strength of the symmetry-breaking signals and on a parameter α which is the ratio of sample size to data dimension. Results are derived in the limit of large data dimension while keeping α fixed. As α increases there are transitions in which delta functions emerge from the upper end of the bulk spectrum, corresponding to the symmetry-breaking directions in the data, and we calculate the bias in the corresponding eigenvalues. For kernel PCA the covariance matrix in feature space may contain symmetry-breaking structure even when the data components are independently distributed with equal variance. We show examples of phase-transition behaviour analogous to the PCA results in this case.



The Effect of Correlated Input Data on the Dynamics of Learning

Neural Information Processing Systems

The convergence properties of the gradient descent algorithm in the case of the linear perceptron may be obtained from the response function. We derive a general expression for the response function and apply it to the case of data with simple input correlations. It is found that correlations severely may slow down learning. This explains the success of PCA as a method for reducing training time. Motivated by this finding we furthermore propose to transform the input data by removing the mean across input variables as well as examples to decrease correlations. Numerical findings for a medical classification problem are in fine agreement with the theoretical results.


The Effect of Correlated Input Data on the Dynamics of Learning

Neural Information Processing Systems

The convergence properties of the gradient descent algorithm in the case of the linear perceptron may be obtained from the response function. We derive a general expression for the response function and apply it to the case of data with simple input correlations. It is found that correlations severely may slow down learning. This explains the success of PCA as a method for reducing training time. Motivated by this finding we furthermore propose to transform the input data by removing the mean across input variables as well as examples to decrease correlations. Numerical findings for a medical classification problem are in fine agreement with the theoretical results. 1 INTRODUCTION Learning and generalization are important areas of research within the field of neural networks.Although good generalization is the ultimate goal in feed-forward networks (perceptrons), it is of practical importance to understand the mechanism which control the amount of time required for learning, i. e. the dynamics of learning. Thisis of course particularly important in the case of a large data set. An exact analysis of this mechanism is possible for the linear perceptron and as usual it is hoped that the results to some extend may be carried over to explain the behaviour of nonlinear perceptrons.