Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences