Total Variation Classes Beyond 1d: Minimax Rates, and the Limitations of Linear Smoothers

Neural Information Processing Systems 

We consider the problem of estimating a function defined over n locations on a d-dimensional grid (having all side lengths equal to n^{1/d}). When the function is constrained to have discrete total variation bounded by C_n, we derive the minimax optimal (squared) \ell_2 estimation error rate, parameterized by n and C_n. Total variation denoising, also known as the fused lasso, is seen to be rate optimal. Several simpler estimators exist, such as Laplacian smoothing and Laplacian eigenmaps. A natural question is: can these simpler estimators perform just as well?
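To make the simpler estimators concrete, here is a minimal sketch of Laplacian smoothing on a 2d grid: \hat\theta = (I + \lambda L)^{-1} y, where L is the grid graph Laplacian. The grid construction, the solver, and the tuning parameter \lambda below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def path_laplacian(m):
    # Laplacian of the 1d path graph on m nodes, built as D^T D from
    # the (m-1) x m first-difference (edge incidence) matrix D.
    D = sp.diags([-np.ones(m - 1), np.ones(m - 1)], [0, 1], shape=(m - 1, m))
    return D.T @ D

def laplacian_smooth(y, lam):
    # Laplacian smoothing on an m x m grid (y is the flattened signal):
    # solve (I + lam * L) theta = y, with L the 2d grid Laplacian formed
    # via Kronecker sums of the 1d path Laplacian.
    m = int(np.sqrt(y.size))
    L1 = path_laplacian(m)
    L = sp.kron(sp.eye(m), L1) + sp.kron(L1, sp.eye(m))
    A = (sp.eye(m * m) + lam * L).tocsc()
    return spsolve(A, y)

# Example: smooth a noisy observation of a constant signal on a 4 x 4 grid.
y = np.ones(16) + 0.1 * np.random.default_rng(0).standard_normal(16)
theta_hat = laplacian_smooth(y, lam=1.0)
```

Since this estimator is linear in y (a fixed matrix applied to the data), it falls in the class of linear smoothers whose limitations the paper studies.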