Learning a Continue-Thinking Token for Enhanced Test-Time Scaling