On Compression Principle and Bayesian Optimization for Neural Networks