Foreground Clustering for Joint Segmentation and Localization in Videos and Images