Smaller, Faster, Cheaper: Architectural Designs for Efficient Machine Learning