Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models