Bridging the Data Provenance Gap Across Text, Speech and Video