An Analysis of Action-Value Temporal-Difference Methods That Learn State Values