Degrees of freedom for off-the-grid sparse estimation
Clarice Poon, Gabriel Peyré†

November 12, 2019

Abstract

A central question in modern machine learning and imaging sciences is to quantify the number of effective parameters of vastly over-parameterized models. The degrees of freedom is a mathematically convenient way to define this number of parameters. Its computation and properties are well understood when dealing with discretized linear models, possibly regularized using sparsity. In this paper, we argue that this way of thinking breaks down when dealing with models having very large parameter spaces. In this case it makes more sense to consider "off-the-grid" approaches, using a continuous parameter space. This type of approach is the one favoured when training multi-layer perceptrons, and is also becoming popular to solve super-resolution problems in imaging. Training these off-the-grid models with a sparsity-inducing prior can be achieved by solving a convex optimization problem over the space of measures, which is often called the Beurling Lasso (Blasso), and is the continuous counterpart of the celebrated Lasso parameter selection method. In previous works [41, 19], the degrees of freedom for the Lasso was shown to coincide with the size of the smallest solution support. Our main contribution is a proof of a continuous counterpart to this result for the Blasso. While in dimension d, each of the k atoms in the recovered measure carries d+1 parameters (d for the position and 1 for the weight), a surprising implication of our new formula is that the degrees of freedom for these off-the-grid models is in general strictly smaller than (d+1)k. Our findings thus suggest that discretized methods actually vastly overestimate the number of intrinsic continuous degrees of freedom. Our second contribution is a detailed study of the case of sampling Fourier coefficients in 1D, which corresponds to a super-resolution problem. We show that our formula for the degrees of freedom is valid outside of a set of observations of measure zero, which in turn justifies its use to compute an unbiased estimator of the prediction risk using the Stein Unbiased Risk Estimator (SURE). We also report numerical results for both the case of Fourier sampling and the learning of a multilayer perceptron with a single hidden layer.
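In the discrete setting the abstract refers to, the Lasso degrees of freedom equals the size of the solution support, and that count plugs directly into SURE as an unbiased estimate of the prediction risk. A minimal sketch of this discrete baseline, assuming an orthogonal (identity) design so that the Lasso reduces to soft-thresholding; all function names here are illustrative, not from the paper:

```python
import numpy as np

def soft_threshold(y, lam):
    # With an identity design, the Lasso solution is entrywise soft-thresholding.
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def lasso_df(beta_hat):
    # Degrees of freedom of the Lasso = size of the solution support
    # (the discrete result of [41, 19] cited in the abstract).
    return int(np.count_nonzero(beta_hat))

def sure(y, mu_hat, df, sigma):
    # Stein Unbiased Risk Estimator of the prediction risk:
    # SURE = ||y - mu_hat||^2 - n*sigma^2 + 2*sigma^2*df.
    n = y.size
    return float(np.sum((y - mu_hat) ** 2) - n * sigma**2 + 2 * sigma**2 * df)

rng = np.random.default_rng(0)
n, sigma, lam = 200, 0.5, 1.0
mu = np.zeros(n)
mu[:5] = 3.0                              # sparse ground truth with 5 spikes
y = mu + sigma * rng.standard_normal(n)   # noisy observation
mu_hat = soft_threshold(y, lam)
df = lasso_df(mu_hat)                     # plug-in degrees of freedom
risk_est = sure(y, mu_hat, df, sigma)     # unbiased estimate of the risk
```

The paper's point is that for the continuous (Blasso) model this plug-in count would be at most (d+1)k, yet the true degrees of freedom is generically strictly smaller, so a discretized computation like the one above overestimates it.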