An initial investigation on optimizing tandem speaker verification and countermeasure systems using reinforcement learning