
Regularizing Towards Permutation Invariance In Recurrent Models

Neural Information Processing Systems

In many machine learning problems the output should not depend on the order of the inputs. Such "permutation invariant" functions have been studied extensively in recent years. Here we argue that temporal architectures such as RNNs are highly relevant for such problems, despite their inherent dependence on order. We show that RNNs can be regularized towards permutation invariance, and that this can result in more compact models than non-recursive architectures. Existing solutions (e.g., DeepSets) mostly restrict the learning problem to hypothesis classes that are permutation invariant by design. Our approach of enforcing permutation invariance via regularization instead gives rise to learning functions that are semi-permutation-invariant.
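The abstract does not spell out the form of the penalty; a minimal numpy sketch of one plausible variant is below. The function names are illustrative assumptions, and comparing against the reversed order stands in for what a full method would likely do by averaging over random permutations:

```python
import numpy as np

def rnn_final_state(W_h, W_x, xs):
    """Run a plain tanh RNN over the sequence xs; return the final hidden state."""
    h = np.zeros(W_h.shape[0])
    for x in xs:
        h = np.tanh(W_h @ h + W_x @ x)
    return h

def permutation_regularizer(W_h, W_x, xs):
    """Squared gap between the RNN's final states on the original and the
    reversed input order. Driving this penalty to zero pushes the model
    toward order independence."""
    h_fwd = rnn_final_state(W_h, W_x, xs)
    h_rev = rnn_final_state(W_h, W_x, xs[::-1])
    return float(np.sum((h_fwd - h_rev) ** 2))
```

The penalty is exactly zero when the sequence elements are identical (any reordering yields the same sequence) and positive for a generic RNN on distinct inputs.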


Regularizing by the Variance of the Activations' Sample-Variances

Neural Information Processing Systems

Normalization techniques play an important role in supporting efficient and often more effective training of deep neural networks. While conventional methods explicitly normalize the activations, we suggest adding a loss term instead. This new loss term encourages the variance of the activations to be stable, varying little from one random mini-batch to the next. As we prove, this encourages the activations to be distributed around a few distinct modes. We also show that if the inputs come from a mixture of two Gaussians, the new loss either joins the two together or separates them optimally in the LDA sense, depending on the prior probabilities. Finally, we link the new regularization term to the batchnorm method, which provides batchnorm with a regularization perspective. Our experiments demonstrate an improvement in accuracy over the batchnorm technique for both CNNs and fully connected networks.
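The title describes the loss directly: the variance, across mini-batches, of each batch's sample variance of the activations. A minimal numpy sketch for a single unit (the function name is an illustrative assumption, not the paper's API):

```python
import numpy as np

def activation_variance_loss(batches):
    """Variance, across mini-batches, of each batch's sample variance of one
    unit's activations. The loss is zero exactly when every mini-batch shows
    the same spread, i.e. the unit's activation variance is stable."""
    sample_vars = np.array([np.var(b) for b in batches])
    return float(np.var(sample_vars))
```

In training, this scalar would presumably be summed over units and added to the task loss with a tuning coefficient; no explicit normalization of the activations is performed.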




Regularizing Towards Soft Equivariance Under Mixed Symmetries

Kim, Hyunsu, Lee, Hyungi, Yang, Hongseok, Lee, Juho

arXiv.org Artificial Intelligence

Datasets often have intrinsic symmetries, and particular deep-learning models, called equivariant or invariant models, have been developed to exploit them. However, if some or all of these symmetries are only approximate, as frequently happens in practice, these models may be suboptimal due to the architectural restrictions imposed on them. We tackle this issue of approximate symmetries in a setup where symmetries are mixed, i.e., there are symmetries of multiple different types, and the degree of approximation varies across these types. Instead of proposing a new architectural restriction as in most previous approaches, we present a regularizer-based method for building a model for a dataset with mixed approximate symmetries. The key component of our method is what we call the equivariance regularizer for a given type of symmetry, which measures how equivariant a model is with respect to symmetries of that type. Our method is trained with these regularizers, one per symmetry type, and the strength of each regularizer is automatically tuned during training, leading to the discovery of the approximation levels of candidate symmetry types without explicit supervision. Using synthetic function approximation and motion forecasting tasks, we demonstrate that our method achieves better accuracy than prior approaches while correctly discovering the approximate symmetry levels.
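An equivariance regularizer of the kind the abstract describes typically measures the gap between f(g·x) and g·f(x) over group elements g. A minimal numpy sketch for 2-D rotations (the setup and function names are illustrative assumptions; the paper handles general mixed symmetry types):

```python
import numpy as np

def rotation(theta):
    """2-D rotation matrix for angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def equivariance_regularizer(f, xs, thetas):
    """Average squared gap between f(g @ x) and g @ f(x) over sampled rotations
    g and inputs x. Zero iff f commutes with every sampled rotation on the
    sampled inputs; larger values mean the model is less equivariant."""
    total = 0.0
    for theta in thetas:
        R = rotation(theta)
        for x in xs:
            total += np.sum((f(R @ x) - R @ f(x)) ** 2)
    return total / (len(thetas) * len(xs))
```

The identity map is exactly equivariant and scores zero, while a nonzero constant map is not and scores positive; in the paper's setting, a learned weight on each such regularizer would reflect how approximate that symmetry type is.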


A general method for regularizing tensor decomposition methods via pseudo-data

Gottesman, Omer, Pan, Weiwei, Doshi-Velez, Finale

arXiv.org Machine Learning

Tensor decomposition methods (TDMs) have recently gained popularity as ways of performing inference for latent variable models [Anandkumar et al., 2014]. The interest in these methods is motivated by the fact that they come with theoretical global convergence guarantees in the limit of infinite data [Anandkumar et al., 2012, Arora et al., 2013]. However, a main limitation of these methods is that they lack natural means of regularization, or of encouraging desired properties on the model parameters, when the amount of data is limited. Previous works attempted to alleviate this drawback by modifying existing tensor decomposition methods to incorporate specific constraints, such as sparsity [Sun et al., 2015], or modeling assumptions, such as the existence of anchor words [Arora et al., 2013, Nguyen et al., 2014]. All of these works develop bespoke algorithms tailored to those constraints or assumptions. Furthermore, many of these methods impose hard constraints on the learned model, which may be detrimental as the size of the data grows; framed in the context of Bayesian intuition, when we have a lot of data, we want our methods to allow the evidence to overwhelm our priors. We introduce an alternative approach which can be applied to encourage any (differentiable) desired structure or properties on the model parameters, and which will only encourage this "prior" information when the data is insufficient. Specifically, we adopt the common view of Bayesian priors as representing "pseudo-observations" of artificial data which bias our learned model parameters towards our prior belief [Bishop, 2006]. We apply the tensor decomposition method of Anandkumar et al.
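The pseudo-observation view can be illustrated on the simplest moment statistic a TDM consumes. The sketch below (an assumption for illustration; the paper works with the higher-order moment tensors of Anandkumar et al.'s method) blends the empirical second-moment matrix of n real points with that of m pseudo-points, so the pseudo-data's weight m/(n+m) vanishes as real data accumulates:

```python
import numpy as np

def blended_moments(data, pseudo_data):
    """Second-moment matrix of the real data augmented with pseudo-observations.
    With n real rows and m pseudo rows, the pseudo-data contributes weight
    m/(n+m), so the evidence overwhelms the 'prior' as n grows."""
    stacked = np.vstack([data, pseudo_data])
    return stacked.T @ stacked / len(stacked)
```

With one real point and one pseudo-point the two contribute equally; with 99 real points the pseudo-data's influence shrinks to 1%, matching the Bayesian intuition in the paragraph above.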