Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision