
 reformer


Fast Transformers with Clustered Attention Supplementary Material

Neural Information Processing Systems

Figure 1: Flow chart demonstrating the computation for clustered attention. For more details refer to Section 1.1 of this supplementary or Section 3.2 of the main paper. Work done at Idiap. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada. We then present the flow chart demonstrating the same. This is followed by taking the weighted average of the 3 corresponding values.


We are encouraged that reviewers find our paper clear and well written (R1, R2, R3) and our method to be theoretically sound

Neural Information Processing Systems

We would like to thank the reviewers for their helpful comments and their thorough evaluation of our work. Reversible layers are a technique introduced by Gomez et al. (2017) and are orthogonal to our approach. In contrast, clustered attention places no such restriction. We will also add Set Transformers to the related work section. Is speech favorable to clustering? We would like to mention that our NLP approximation experiment for the GLUE and SQuAD tasks in Section 4.3 speaks to NLP/vision tasks in the long-context setting, as suggested.




47d40767c7e9df50249ebfd9c7cfff77-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their valuable comments! "Unclear if the proposed method is better than only using LSH." Thank you for the suggestions. ALSH significantly outperforms both E2LSH and the Reformer LSH scheme on SMYRF-BERT base (see also Table 2).


Mamba Outpaces Reformer in Stock Prediction with Sentiments from Top Ten LLMs

Kadiyala, Lokesh Antony, Mirzaeinia, Amir

arXiv.org Artificial Intelligence

The stock market is extremely difficult to predict in the short term due to high market volatility, changes caused by news, and the non-linear nature of financial time series. This research proposes a novel framework for improving minute-level prediction accuracy using semantic sentiment scores from ten different top large language models (LLMs) combined with minute-interval intraday stock price data. We systematically constructed a time-aligned dataset of AAPL news articles and 1-minute Apple Inc. (AAPL) stock prices for the dates of April 4 to May 2, 2025. Sentiment analysis was performed with the DeepSeek-V3, GPT variants, LLaMA, Claude, Gemini, Qwen, and Mistral models through their APIs. Each article received sentiment scores from all ten LLMs, which were scaled to a [0, 1] range and combined with prices and technical indicators such as RSI, ROC, and Bollinger Band Width. Two state-of-the-art sequence models, Reformer and Mamba, were trained separately on the dataset using the sentiment scores produced by each LLM as input. Hyperparameters were optimized with Optuna, and models were evaluated over a 3-day evaluation period. Mean squared error (MSE) was the evaluation metric, and it should be noted that Mamba was not only faster but also more accurate than Reformer for every one of the ten LLMs tested. Mamba performed best with LLaMA 3.3--70B, with the lowest error of 0.137. While Reformer could capture broader trends within the data, the model appeared to over-smooth the sudden changes signaled by the LLMs. This study highlights the potential of integrating LLM-based semantic analysis with efficient temporal modeling to enhance real-time financial forecasting.
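The feature pipeline described in the abstract (min-max scaling of LLM sentiment scores to [0, 1], plus RSI, ROC, and Bollinger Band Width on the price series) can be sketched as below. The abstract does not specify the indicator periods or the exact scaling method, so the period defaults here follow common conventions and should be read as assumptions, not as the authors' implementation.

```python
import numpy as np

def rsi(prices, period=14):
    """Relative Strength Index over the last `period` price changes."""
    deltas = np.diff(prices)
    gains = np.clip(deltas[-period:], 0, None)
    losses = np.clip(-deltas[-period:], 0, None)
    avg_gain, avg_loss = gains.mean(), losses.mean()
    if avg_loss == 0:          # no down-moves in the window
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

def roc(prices, period=12):
    """Rate of Change: percent change over `period` steps."""
    return 100.0 * (prices[-1] - prices[-1 - period]) / prices[-1 - period]

def bollinger_band_width(prices, period=20, k=2.0):
    """Distance between upper and lower bands, relative to the moving average."""
    window = np.asarray(prices[-period:], dtype=float)
    ma, sd = window.mean(), window.std()
    return 2.0 * k * sd / ma

def scale_sentiment(scores):
    """Min-max scale raw LLM sentiment scores to the [0, 1] range."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if hi == lo:               # constant scores: map everything to 0
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)

# Example: build one feature row from a price window and per-LLM sentiment.
prices = np.linspace(190.0, 195.0, 30)              # hypothetical 1-minute closes
sentiments = [-0.8, -0.1, 0.0, 0.3, 0.5, 0.9]       # hypothetical raw LLM scores
features = np.concatenate([
    scale_sentiment(sentiments),
    [rsi(prices), roc(prices), bollinger_band_width(prices)],
])
```

Each minute's feature row would then be fed to the sequence model; a strictly rising price window like the one above drives RSI to its ceiling of 100, which is why momentum indicators are usually paired with volatility measures such as the band width.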





