From Principles to Practice: A Systematic Study of LLM Serving on Multi-core NPUs