Learning to Describe Video with Weak Supervision by Exploiting Negative Sentential Information