Pareto Frontiers in Deep Feature Learning: Data, Compute, Width, and Luck
Sham Kakade