Author response: 'Stationary Activations for Uncertainty Calibration in Deep Learning', NeurIPS #5154
We thank the anonymous reviewers for their enthusiasm and detailed comments on the manuscript. We start by addressing R2's concerns, as they gave the lowest score. The link between the Matérn kernel and the proposed activation functions is made explicit in the paper. The choice of kernel/activation function is up to the modelling task and the expert; we merely provide a building block. The Matérn is a widely used prior and worth adding to the neural-network tool set.
Review for NeurIPS paper: Stationary Activations for Uncertainty Calibration in Deep Learning
The overall motivation is not clear. It is true that Matérn kernels are good at capturing sharp transitions, as shown in Fig. 1, but there are many other methods that achieve similar, if not better, results. For instance, one can learn kernels [1,2], use deep kernels [2], use spectral mixture kernels [3], use "neural-network kernels" [4], etc. Comparisons with [1]-[5] would provide further insight. Please also report MSE and AUC in addition to accuracy.
Meta-review for NeurIPS paper: Stationary Activations for Uncertainty Calibration in Deep Learning
The authors propose activation functions derived from the stationary Matérn family of kernels, which is widely used in Gaussian process (GP) models and supports uncertainty quantification. Reviewers found the paper to be well motivated and clearly described in the context of previous related work. It could be further improved by expanding the discussion to other GP kernels similar to the Matérn, and by explaining the reason for this specific choice in that larger context. The empirical results were adequate but could be improved. Figures 1 and 2 need further elaboration and analysis to explain the anomalies pointed out by Reviewer #4.
Stationary Activations for Uncertainty Calibration in Deep Learning
We introduce a new family of non-linear neural network activation functions that mimic the properties induced by the widely used Matérn family of kernels in Gaussian process (GP) models. We show an explicit link to the corresponding GP models in the case where the network consists of one infinitely wide hidden layer. In the limit of infinite smoothness the Matérn family converges to the RBF kernel, and in this case we recover RBF activations. Matérn activation functions inherit appealing properties from their counterparts in GP models, and we demonstrate that the local stationarity property, together with limited mean-square differentiability, yields both good performance and well-calibrated uncertainty in Bayesian deep learning tasks. In particular, local stationarity helps calibrate out-of-distribution (OOD) uncertainty.
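To make the infinite-smoothness limit in the abstract concrete, here is a minimal sketch (not code from the paper; function names and parameter choices are illustrative assumptions) that evaluates the standard Matérn covariance k(r) = σ² 2^{1-ν}/Γ(ν) (√(2ν) r/ℓ)^ν K_ν(√(2ν) r/ℓ) and checks numerically that it approaches the RBF kernel σ² exp(-r²/(2ℓ²)) as the smoothness ν grows:

```python
import numpy as np
from scipy.special import gamma, kv  # kv is the modified Bessel function K_nu

def matern_kernel(r, nu, lengthscale=1.0, variance=1.0):
    """Standard Matérn covariance k(r) for pairwise distances r >= 0."""
    # Clip r away from 0: kv(nu, 0) is singular, but the limit k(0) = variance.
    z = np.sqrt(2.0 * nu) * np.maximum(np.asarray(r, float), 1e-12) / lengthscale
    return variance * (2.0 ** (1.0 - nu) / gamma(nu)) * z ** nu * kv(nu, z)

def rbf_kernel(r, lengthscale=1.0, variance=1.0):
    """RBF (squared exponential) covariance, the nu -> infinity limit."""
    return variance * np.exp(-0.5 * (np.asarray(r, float) / lengthscale) ** 2)

r = np.linspace(0.0, 3.0, 301)
for nu in (0.5, 1.5, 2.5, 10.0):  # half-integer orders plus a large nu
    err = np.max(np.abs(matern_kernel(r, nu) - rbf_kernel(r)))
    print(f"nu = {nu:4.1f}: max |Matern - RBF| = {err:.3f}")
# The gap shrinks as nu increases, matching the infinite-smoothness
# limit stated in the abstract.
```

The small half-integer orders (ν = 1/2, 3/2, 5/2) are the cases with limited mean-square differentiability that the abstract credits for good OOD calibration; large ν recovers the very smooth RBF behaviour.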