Solving Bayesian inverse problems with diffusion priors and off-policy RL