Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers