Reviews: Attention Is All You Need

Neural Information Processing Systems 

The paper presents a new encoder-decoder architecture for sequence-to-sequence modeling that is based solely on (multi-layered) attention networks combined with standard feed-forward networks, as opposed to the common scheme of using recurrent or convolutional neural networks. The paper claims two main advantages for this architecture: (1) reduced training time due to the reduced complexity of the architecture, and (2) new state-of-the-art results on standard WMT data sets, outperforming previous work by about 1 BLEU point.

Strengths:
- The paper argues well that (1) can be achieved by avoiding recurrent or convolutional layers, and the complexity analysis in Table 1 strengthens this argument.
- The main strengths of the paper are that it proposes an entirely novel architecture without recurrence or convolutions, and that it advances the state of the art.

Weaknesses:
- While the general architecture of the model is described well and is illustrated by figures, architectural details lack mathematical definitions, multi-head attention being one example (a sketch of the missing definitions is given below).
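For concreteness, the definitions that would address this weakness could be written as follows, using the paper's notation with $d_k$ the key dimension and $h$ the number of attention heads (a sketch reconstructed from the paper's description, not a quotation of its equations):

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O},
\qquad \mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V})
```

Here $W_i^{Q}, W_i^{K}, W_i^{V}$ and $W^{O}$ are learned projection matrices; each head attends in its own projected subspace before the results are concatenated and projected back.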
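To make the complexity argument of Table 1 concrete: the core scaled dot-product attention is a single pair of matrix products over the sequence, costing $O(n^2 \cdot d)$ per layer and requiring no sequential recurrence. A minimal NumPy sketch (function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Illustrative single-head attention. Q, K: (n, d_k); V: (n, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n, n) attention logits
    scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax over keys
    return weights @ V                              # (n, d_v) weighted sums of values

# Example: 5 positions, dimension 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)         # shape (5, 8)
```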