Once Read is Enough: Domain-specific Pretraining-free Language Models with Cluster-guided Sparse Experts for Long-tail Domain Knowledge

Neural Information Processing Systems 

To unveil the hidden mechanisms under LM's incapability to learn long-tail domain-specific knowledge, we investigate the behaviors of a GPT on the Wikipedia dataset.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found