Are We Scaling the Right Thing? A System Perspective on Test-Time Scaling