Unsupervised Speech Representation Learning for Behavior Modeling using Triplet Enhanced Contextualized Networks