Logistic $Q$-Learning