A Multi-lingual Dataset of Classified Paragraphs from Open Access Scientific Publications