Large language models can learn and generalize steganographic chain-of-thought under process supervision