We propose a general framework for regularization based on group majorization. In this framework, a group is defined to act on the parameter space and an orbit is fixed; to control complexity, the model parameters are confined to lie in the convex hull of this orbit (the orbitope). Common regularizers are recovered as particular cases, and a connection is revealed between the recent sorted 1 -norm and the hyperoctahedral group. We derive the properties a group must satisfy for being amenable to optimization with conditional and projected gradient algorithms. Finally, we suggest a continuation strategy for orbit exploration, presenting simulation results for the symmetric and hyperoctahedral groups.
We discuss the framework of Transductive Support Vector Machine (TSVM) from the perspective of the regularization strength induced by the unlabeled data. In this framework, SVM and TSVM can be regarded as a learning machine without regularization and one with full regularization from the unlabeled data, respectively. Therefore, to supplement this framework of the regularization strength, it is necessary to introduce data-dependant partial regularization. To this end, we reformulate TSVM into a form with controllable regularization strength, which includes SVM and TSVM as special cases. Furthermore, we introduce a method of adaptive regularization that is data dependant and is based on the smoothness assumption.
Feed-forward neural networks can be understood as a combination of an intermediate representation and a linear hypothesis. While most previous works aim to diversify the representations, we explore the complementary direction by performing an adaptive and data-dependent regularization motivated by the empirical Bayes method. Specifically, we propose to construct a matrix-variate normal prior (on weights) whose covariance matrix has a Kronecker product structure. This structure is designed to capture the correlations in neurons through backpropagation. Under the assumption of this Kronecker factorization, the prior encourages neurons to borrow statistical strength from one another.
In the StackGAN paper, they introduce a color-consistency regularization term "to keep samples generated from the same input at different generators more consistent in color and thus to improve the quality of the generated images". Where mean and variance are of pixels of an image, and lambdas are constants. However, this term does not equal to zero when calculated with real data. Shrinking the original image with max pooling changes the mean, and average pooling changes the variance. TL;DR: Why is there a regularization term that doesn't equal zero?
We argue that regularizing terms in standard regression methods not only help against overfitting finite data, but sometimes also help in getting better causal models. We first consider a multi-dimensional variable linearly influencing a target variable with some multi-dimensional unobserved common cause, where the confounding effect can be decreased by keeping the penalizing term in Ridge and Lasso regression even in the population limit. The reason is a close analogy between overfitting and confounding observed for our toy model. In the case of overfitting, we can choose regularization constants via cross validation, but here we choose the regularization constant by first estimating the strength of confounding, which yielded reasonable results for simulated and real data. Further, we show a'causal generalization bound' which states (subject to our particular model of confounding) that the error made by interpreting any non-linear regression as causal model can be bounded from above whenever functions are taken from a not too rich class.