Cycles and collusion in congestion games under Q-learning