Neural Machine Translation without Embeddings

Aug-21-2020–arXiv.org Machine Learning

Many NLP models follow the embed-contextualize-predict paradigm, in which each sequence token is represented as a dense vector via an embedding matrix, and fed into a contextualization component that aggregates the information from the entire sequence in order to make a prediction. Could NLP models work without the embedding component? To that end, we omit the input and output embeddings from a standard machine translation model, and represent text as a sequence of bytes via UTF-8 encoding, using a constant 256-dimension one-hot representation for each byte. Experiments on 10 language pairs show that removing the embedding matrix consistently improves the performance of byte-to-byte models, often outperforms character-to-character models, and sometimes even produces better translations than standard subword models.
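
The input representation described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names one_hot_bytes and decode_one_hot are hypothetical, and the sketch ignores any special symbols (such as padding or end-of-sequence markers) that a real translation model would need.

```python
# Hypothetical helper names; a sketch of byte-level one-hot input, not the paper's code.

def one_hot_bytes(text: str) -> list[list[int]]:
    """Encode text as UTF-8 and map each byte to a constant 256-dimension one-hot vector."""
    vectors = []
    for byte in text.encode("utf-8"):  # each byte is an int in 0..255
        row = [0] * 256
        row[byte] = 1                  # the only non-zero entry marks the byte value
        vectors.append(row)
    return vectors


def decode_one_hot(vectors: list[list[int]]) -> str:
    """Invert one_hot_bytes: recover the byte values, then decode them as UTF-8."""
    return bytes(row.index(1) for row in vectors).decode("utf-8")


if __name__ == "__main__":
    encoded = one_hot_bytes("né")
    # Non-ASCII characters occupy more than one byte, so the sequence is
    # longer than the character count.
    print(len(encoded), "vectors of size", len(encoded[0]))
    print(decode_one_hot(encoded))
```

Because the one-hot vectors are fixed rather than learned, this representation has no embedding parameters. The trade-off is sequence length: any character outside ASCII expands to two or more bytes.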

artificial intelligence, computational linguistic, natural language

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
  - New South Wales > Sydney (0.04)
- North America > United States
  - Minnesota > Hennepin County > Minneapolis (0.14)
- Europe
  - Spain (0.04)
  - Czechia > Prague (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.05)
- Asia
  - Vietnam > Hanoi
    - Hanoi (0.04)
  - Middle East > Israel
    - Tel Aviv District > Tel Aviv (0.04)

Genre:
- Research Report (0.83)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
