Cost-optimal Sequential Testing via Doubly Robust Q-learning