Reviews: Kernel-Based Approaches for Sequence Modeling: Connections to Neural Methods
Sections 3 & 4 feel mostly independent of kernels; they read as a train-of-thought discussion of how to motivate and build useful RNN architectures. They fail as a clear description of the actual proposed models: it takes quite a bit of re-reading to work out what the RKM-LSTM is, for example.

For the language modelling experiments, I feel Table 3 should probably acknowledge what the state of the art is for WikiText-2 and PTB, as there is an insinuation that the proposed models are SOTA (which they are not). For example, [24] is cited, yet the numbers reported in that paper are better than those presented in Table 3: for [24], PTB is 60.0 valid, 57.3 test; WT2 is 68.6 valid, 65.8 test. And there are more recent variants of the AWD-LSTM, e.g.