Multi-Modal Video Feature Extraction for Popularity Prediction