On The Global Convergence Of Online RLHF With Neural Parametrization
