Convex Q Learning in a Stochastic Environment: Extended Version