On The Global Convergence Of Online RLHF With Neural Parametrization