Extraction and Summarization of Explicit Video Content using Multi-Modal Deep Learning