From ViT Features to Training-free Video Object Segmentation via Streaming-data Mixture Models