A flexible model for training action localization with varying levels of supervision