The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Sharma, Pratyusha, Ash, Jordan T., Misra, Dipendra
–arXiv.org Artificial Intelligence
Since their original release, Transformer-based LLMs have been shown to be remarkably proficient on a wide array of important machine learning tasks. Their underlying Transformer architecture has become state-of-the-art for modeling and reasoning about natural language, and has shown promise in domains such as computer vision [Dosovitskiy et al., 2020] and reinforcement learning [Chen et al., 2021] as well. Contemporary instantiations of Transformer architectures are infamously large, typically requiring tremendous compute resources for both training and inference. This is by design, as Transformers trained with more parameters or data are demonstrably more capable than their slimmer predecessors, often by a significant margin [Brown et al., 2020, Touvron et al., 2023]. Still, a growing body of work suggests that Transformer-based models, and neural networks more generally, do not require all fitted parameters to retain their learned hypotheses.
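As a rough illustration of the rank reduction the title refers to (a sketch of the general idea, not the paper's exact procedure), a single weight matrix can be replaced by its best low-rank approximation via truncated SVD. The matrix shape, rank, and function name below are chosen purely for illustration.

```python
import torch

def low_rank_approx(W: torch.Tensor, k: int) -> torch.Tensor:
    """Best rank-k approximation of W (in Frobenius norm) via truncated SVD."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

# Hypothetical example: approximate an MLP weight matrix with a low-rank version.
W = torch.randn(4096, 11008)          # shape chosen only for illustration
W_reduced = low_rank_approx(W, k=64)  # keep only the top-64 singular directions
print(W.shape, W_reduced.shape)       # same shape, but rank at most 64
```

The reduced matrix has the same shape as the original but far fewer effective degrees of freedom, which is one way a model could "not require all fitted parameters" while retaining its learned behavior.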
Dec-20-2023