An Analysis of Word2Vec for the Italian Language

Di Gennaro, Giovanni; Buonanno, Amedeo; Di Girolamo, Antonio; Ospedale, Armando; Palmieri, Francesco A. N.; Fedele, Gianfranco

arXiv.org Machine Learning 

Word representation is fundamental to NLP tasks: it is by encoding the semantic closeness between words that a machine can be taught to understand text. Although word embeddings are now widespread, few results exist for languages other than English. This work analyses the semantic capacity of the Word2Vec algorithm and produces an embedding for the Italian language. It explores parameter settings such as the number of epochs, the size of the context window, and the number of negatively backpropagated samples.

Keywords: Word2Vec, Word Embedding, NLP

1 Introduction

To make human language comprehensible to a computer, some form of word encoding is essential. The simplest approach is one-hot encoding, in which each word is represented by a sparse vector whose dimension equals the vocabulary size.
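The one-hot encoding described above can be illustrated with a minimal sketch. The toy Italian sentence and the helper names (`build_vocab`, `one_hot`, `dot`) are assumptions made for illustration and do not come from the paper.

```python
def build_vocab(tokens):
    """Map each distinct token to an index, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def one_hot(word, vocab):
    """Return a vector of length len(vocab) with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy corpus: 7 tokens, 5 distinct words.
tokens = "il gatto dorme e il cane dorme".split()
vocab = build_vocab(tokens)

gatto = one_hot("gatto", vocab)
cane = one_hot("cane", vocab)

print(len(vocab))        # vector dimension equals the vocabulary size
print(gatto)             # exactly one non-zero entry
print(dot(gatto, cane))  # distinct words always give 0
```

In this sketch the dot product of any two distinct one-hot vectors is zero, so the encoding by itself says nothing about how close two words are in meaning. A dense embedding such as the one Word2Vec learns is meant to supply that missing information.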
