An Analysis of Word2Vec for the Italian Language

Di Gennaro, Giovanni, Buonanno, Amedeo, Di Girolamo, Antonio, Ospedale, Armando, Palmieri, Francesco A. N., Fedele, Gianfranco

Jan-25-2020–arXiv.org Machine Learning

Word representation is fundamental in NLP tasks, because teaching a machine to understand text depends precisely on encoding the semantic closeness between words. Despite the spread of word embedding concepts, achievements in linguistic contexts other than English remain few. In this work, an embedding for the Italian language is produced by analysing the semantic capacity of the Word2Vec algorithm. Parameter settings such as the number of epochs, the size of the context window, and the number of negatively backpropagated samples are explored.

Keywords: Word2Vec, Word Embedding, NLP

1 Introduction

To make human language comprehensible to a computer, some form of word encoding is essential. The simplest approach is one-hot encoding, where each word is represented by a sparse vector whose dimension equals the vocabulary size.
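One-hot encoding can be illustrated with a minimal sketch. The toy Italian sentence and the helper names `build_vocabulary` and `one_hot` are assumptions made for illustration only; they do not come from the paper.

```python
def build_vocabulary(tokens):
    """Map each distinct word to an index, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def one_hot(word, vocab):
    """Return a vector of vocabulary size with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


# Hypothetical toy corpus (not from the paper).
tokens = "il gatto dorme sul divano e il cane dorme".split()
vocab = build_vocabulary(tokens)

gatto = one_hot("gatto", vocab)
cane = one_hot("cane", vocab)

# Dot product between the one-hot vectors of two different words.
dot = sum(a * b for a, b in zip(gatto, cane))
```

In this sketch the vector length grows with the vocabulary, and the dot product of any two distinct words is zero, so the encoding by itself carries no notion of semantic closeness.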



Country:
- Europe
  - Italy > Campania (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (1.00)
  - Natural Language > Text Processing (0.66)
