The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Sharma, Pratyusha, Ash, Jordan T., Misra, Dipendra
–arXiv.org Artificial Intelligence
Since their original release, Transformer-based LLMs have been shown to be remarkably proficient on a wide array of important machine learning tasks. Their underlying Transformer architecture has become state-of-the-art for modeling and reasoning about natural language, and has shown promise in domains such as computer vision [Dosovitskiy et al., 2020] and reinforcement learning [Chen et al., 2021] as well. Contemporary instantiations of Transformer architectures are infamously large, typically requiring tremendous compute resources for both training and inference. This is by design, as Transformers trained with more parameters or data are demonstrably more capable than their slimmer predecessors, often by a significant margin [Brown et al., 2020, Touvron et al., 2023]. Still, a growing body of work suggests that Transformer-based models, and neural networks more generally, do not require all fitted parameters to retain their learned hypotheses.
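As a rough illustration of the rank reduction the title refers to (a sketch of the general idea, not the paper's exact procedure), a single weight matrix can be replaced by its best low-rank approximation via truncated SVD. The matrix shape, rank, and function name below are chosen purely for illustration.

```python
import torch

def low_rank_approx(W: torch.Tensor, k: int) -> torch.Tensor:
    """Best rank-k approximation of W (in Frobenius norm) via truncated SVD."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

# Hypothetical example: approximate an MLP weight matrix with a low-rank version.
W = torch.randn(4096, 11008)          # shape chosen only for illustration
W_reduced = low_rank_approx(W, k=64)  # keep only the top-64 singular directions
print(W.shape, W_reduced.shape)       # same shape, but rank at most 64
```

The reduced matrix has the same shape as the original but far fewer effective degrees of freedom, which is one way a model could "not require all fitted parameters" while retaining its learned behavior.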
Dec-20-2023