Q-Learning for Continuous Actions with Cross-Entropy Guided Policies