Less is More: Accurate Speech Recognition & Translation without Web-Scale Data