Csató, Lehel
Pruning CNN's with linear filter ensembles
Sándor, Csanád, Pável, Szabolcs, Csató, Lehel
Despite the promising results of convolutional neural networks (CNNs), deploying them on resource-limited devices remains a challenge, mainly due to their large memory and computation requirements. To tackle these problems, pruning can be applied to reduce the network size and the number of floating point operations (FLOPs). In contrast to the \emph{filter norm} method -- used in network pruning under the assumption that the smaller the norm, the less important the associated component -- we develop a novel filter importance norm that incorporates the loss caused by eliminating a component from the CNN. To estimate the importance of a set of architectural components, we measure the CNN performance as different components are removed. The result is a collection of filter ensembles -- filter masks -- together with their associated performance values. We rank the filters with a linear and additive model and remove the least important ones, such that the drop in network accuracy is minimal. We evaluate our method on a fully connected network as well as on the ResNet architecture trained on the CIFAR-10 dataset. Using our pruning method, we removed $60\%$ of the parameters and $64\%$ of the FLOPs from the ResNet with an accuracy drop of less than $0.6\%$.
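A minimal sketch of the ranking step described in the abstract, assuming the network has already been evaluated under random binary filter masks; the names, sizes, and synthetic accuracies are illustrative, not taken from the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
n_filters, n_ensembles = 64, 500

# masks[e, i] == 1 means filter i is kept in ensemble e, 0 means removed.
masks = rng.integers(0, 2, size=(n_ensembles, n_filters)).astype(float)
# Measured network accuracy for each masked configuration (synthetic here).
accuracies = masks @ rng.uniform(0.0, 0.01, n_filters) + 0.5

# Linear, additive model: accuracy ~ masks @ w + b, solved by least squares.
X = np.hstack([masks, np.ones((n_ensembles, 1))])
coef, *_ = np.linalg.lstsq(X, accuracies, rcond=None)
importance = coef[:-1]                 # per-filter contribution estimates

# Remove the filters whose estimated contribution to accuracy is smallest.
n_prune = int(0.6 * n_filters)
to_prune = np.argsort(importance)[:n_prune]
print("filters to remove:", to_prune)
```

The linear model makes the ranking cheap: each coefficient approximates how much keeping that filter contributes to accuracy, averaged over the sampled ensembles.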
TAP Gibbs Free Energy, Belief Propagation and Sparsity
Csató, Lehel, Opper, Manfred, Winther, Ole
The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbitrary single-site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization of Minka's expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem.
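As a point of reference, the model class named in the abstract can be written as follows; the notation is assumed for illustration, not quoted from the paper:

```latex
% Densely connected model with quadratic interactions J and
% arbitrary single-site factors \rho_i (notation illustrative):
\[
  p(\mathbf{x}) \;=\; \frac{1}{Z}\,
  \exp\!\Big(\tfrac{1}{2}\,\mathbf{x}^{\top} J\,\mathbf{x}\Big)
  \prod_{i=1}^{N} \rho_i(x_i)
\]
% The adaptive TAP approach approximates the free energy -\ln Z;
% minimizing it sequentially, one site at a time, yields EP-style
% updates of the single-site moments.
```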
Sparse Representation for Gaussian Process Models
Csató, Lehel, Opper, Manfred
We develop an approach for a sparse representation of Gaussian Process (GP) models in order to overcome the limitations of GPs on large data sets. The method combines a Bayesian online algorithm with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.
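A minimal sketch of the sequential subsample ("basis vector") construction, assuming an RBF kernel and a simple projection-residual novelty criterion; the threshold and names are illustrative and this omits the paper's online posterior updates:

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # Squared-exponential kernel between row vectors of a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def fit_sparse_gp(X, y, tol=1e-2, noise=1e-2):
    basis = [0]                                   # start with the first point
    for t in range(1, len(X)):
        B = X[basis]
        Kbb = rbf(B, B) + 1e-8 * np.eye(len(basis))
        kxb = rbf(X[t:t+1], B)[0]
        # Novelty: residual of projecting k(x_t, .) onto the current basis;
        # rbf(x, x) = 1 for this kernel.
        gamma = 1.0 - kxb @ np.linalg.solve(Kbb, kxb)
        if gamma > tol:                           # keep only informative points
            basis.append(t)
    B = X[basis]
    alpha = np.linalg.solve(rbf(B, B) + noise * np.eye(len(basis)), y[basis])
    return basis, lambda Xs: rbf(Xs, B) @ alpha   # prediction uses basis only

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])
basis, predict = fit_sparse_gp(X, y)
print(f"kept {len(basis)} of {len(X)} points")
```

The key property mirrored here is that the prediction depends only on the retained subsample, so the model size is decoupled from the size of the full data set.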
Efficient Approaches to Gaussian Process Classification
Csató, Lehel, Fokoué, Ernest, Opper, Manfred, Schottky, Bernhard, Winther, Ole
We present three simple approximations for the calculation of the posterior mean in Gaussian Process classification. The first two methods are related to mean field ideas known in Statistical Physics. The third approach is based on a Bayesian online approach which was motivated by recent results in the Statistical Mechanics of Neural Networks. We present simulation results showing: 1. that the mean field Bayesian evidence may be used for hyperparameter tuning and 2. that the online approach may achieve a low training error fast.
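A minimal sketch in the style of the Bayesian online approach, assuming an RBF kernel and a probit likelihood, and simplifying by updating only the posterior mean weights with the prior variance; this illustrates the flavor of the method, not the paper's exact update equations:

```python
import numpy as np
from math import erf, sqrt, exp, pi

def Phi(z):                      # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):                      # standard normal pdf
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def rbf(a, b, ell=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def online_gp_classify(X, y):
    """One pass over (x_t, y_t) pairs, updating the posterior mean weights."""
    n = len(X)
    K = rbf(X, X)
    alpha = np.zeros(n)          # posterior mean: m(x) = sum_i alpha_i k(x, x_i)
    var = K.diagonal()           # prior variance stands in for the predictive one
    for t in range(n):
        m = K[t] @ alpha                     # predictive mean at x_t
        z = y[t] * m / sqrt(1.0 + var[t])
        # Gradient of the log averaged probit likelihood w.r.t. the mean:
        q = y[t] * phi(z) / (Phi(z) * sqrt(1.0 + var[t]))
        alpha[t] += q                        # absorb the update into alpha_t
    return alpha

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(100, 2))
y = np.sign(X[:, 0] + X[:, 1])
alpha = online_gp_classify(X, y)
pred = np.sign(rbf(X, X) @ alpha)
print("training accuracy:", (pred == y).mean())
```

Each example is processed once, which is what makes this style of approximation attractive when a low training error is needed quickly.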