Neural Machine Translation without Embeddings
Many NLP models follow the embed-contextualize-predict paradigm: each token in a sequence is mapped to a dense vector through an embedding matrix and then fed into a contextualization component, which aggregates information from the entire sequence in order to make a prediction. Could NLP models work without the embedding component? To find out, we omit the input and output embeddings from a standard machine translation model and represent text as a sequence of bytes via UTF-8 encoding, using a constant 256-dimensional one-hot representation for each byte. Experiments on 10 language pairs show that removing the embedding matrix consistently improves the performance of byte-to-byte models; the resulting embeddingless models often outperform character-to-character models and sometimes even produce better translations than standard subword models.
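The byte-level input representation described in the abstract can be illustrated with a minimal sketch. It assumes only what the abstract states (UTF-8 bytes, one fixed 256-dimensional one-hot vector per byte); the function names `utf8_one_hot` and `one_hot_to_text` are hypothetical and are not taken from the authors' code. The translation model itself is not shown.

```python
from typing import List


def utf8_one_hot(text: str) -> List[List[int]]:
    """Encode text as UTF-8 and map each byte to a 256-dim one-hot vector.

    Hypothetical helper for illustration; no learned embedding is involved.
    """
    vectors = []
    for byte in text.encode("utf-8"):  # each byte is an int in [0, 255]
        row = [0] * 256
        row[byte] = 1
        vectors.append(row)
    return vectors


def one_hot_to_text(vectors: List[List[int]]) -> str:
    """Invert utf8_one_hot: recover the byte values, then decode as UTF-8."""
    return bytes(row.index(1) for row in vectors).decode("utf-8")


# "é" occupies two bytes in UTF-8 (0xC3, 0xA9) and "!" occupies one (0x21),
# so the two-character string becomes a sequence of three one-hot vectors.
encoded = utf8_one_hot("é!")
byte_values = [row.index(1) for row in encoded]
```

Because the vocabulary is fixed at 256 byte values, the representation is the same for every language, and a non-ASCII character simply becomes a longer sequence of vectors.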
Aug-21-2020
- Country:
- Oceania > Australia
- Victoria > Melbourne (0.04)
- New South Wales > Sydney (0.04)
- North America > United States
- Minnesota > Hennepin County > Minneapolis (0.14)
- Europe
- Asia
- Vietnam > Hanoi
- Hanoi (0.04)
- Middle East > Israel
- Tel Aviv District > Tel Aviv (0.04)
- Genre:
- Research Report (0.83)