Logarithmic Smoothing for Adaptive PAC-Bayesian Off-Policy Learning