Structured Sparsity and Generalization
Maurer, Andreas, Pontil, Massimiliano
We study a class of regularization methods used to learn a linear function from a finite set of examples. The regularizer is expressed as an infimum convolution which involves a set M of linear transformations (see equation (1) below). As we shall see, this regularizer generalizes, depending on the choice of the set M, the regularizers used by several learning algorithms, such as ridge regression, the Lasso, the group Lasso [22], multiple kernel learning [10], the group Lasso with overlap [6], and the regularizers in [16]. We give a bound on the Rademacher average of the linear function class associated with this regularizer. The result matches existing bounds in the above mentioned cases but also admits a novel, dimension free interpretation.
Sep-2-2011