A Theoretical Analysis of Nash Learning from Human Feedback under General KL-Regularized Preference