A Transformer Model for Segmentation, Classification, and Caller Identification of Marmoset Vocalization