The accurate estimation of covariance matrices is essential for many signal processing and machine learning algorithms. In high dimensional settings the sample covariance is known to perform poorly, hence regularization strategies such as analytic shrinkage of Ledoit/Wolf are applied. In the standard setting, i.i.d. data is assumed, however, in practice, time series typically exhibit strong autocorrelation structure, which introduces a pronounced estimation bias. Recent work by Sancetta has extended the shrinkage framework beyond i.i.d. data. We contribute in this work by showing that the Sancetta estimator, while being consistent in the high-dimensional limit, suffers from a high bias in finite sample sizes. We propose an alternative estimator, which is (1) unbiased, (2) less sensitive to hyperparameter choice and (3) yields superior performance in simulations on toy data and on a real world data set from an EEG-based Brain-Computer-Interfacing experiment.
An experimental drug for the most severe forms of multiple sclerosis has slowed brain shrinkage by nearly a half. There are dozens of therapies approved for the relapsing form of MS, a disease of the nervous system, in which people can be symptom-free for months before another attack. But there are very few for people suffering from more severe forms of the disease – known as primary progressive and secondary progressive MS and in which there is rarely any respite from disabling symptoms.
A mean function in a reproducing kernel Hilbert space (RKHS), or a kernel mean, is central to kernel methods in that it is used by many classical algorithms such as kernel principal component analysis, and it also forms the core inference step of modern kernel methods that rely on embedding probability distributions in RKHSs. Given a finite sample, an empirical average has been used commonly as a standard estimator of the true kernel mean. Despite a widespread use of this estimator, we show that it can be improved thanks to the well-known Stein phenomenon. We propose a new family of estimators called kernel mean shrinkage estimators (KMSEs), which benefit from both theoretical justifications and good empirical performance. The results demonstrate that the proposed estimators outperform the standard one, especially in a "large d, small n" paradigm.
Such a representation is closely related to redundancy reductionand independent component analysis, and has some neurophysiological plausibility. In this paper, we show how sparse coding can be used for denoising. Using maximum likelihood estimation of nongaussian variables corrupted by gaussian noise, we show how to apply a shrinkage nonlinearity on the components of sparse coding so as to reduce noise. Furthermore, we show how to choose the optimal sparse coding basis for denoising. Our method is closely related to the method of wavelet shrinkage, but has the important benefit over wavelet methods that both the features and the shrinkage parameters are estimated directly from the data. 1 Introduction A fundamental problem in neural network research is to find a suitable representation forthe data.