Self-Refinement of Language Models from External Proxy Metrics Feedback