Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control