Textless Direct Speech-to-Speech Translation with Discrete Speech Representation