A Unified Causal View of Instruction Tuning
Lu Chen, Wei Huang, Ruqing Zhang, Wei Chen, Jiafeng Guo, Xueqi Cheng
arXiv.org Artificial Intelligence
Instruction tuning on a mixture of tasks has improved zero-shot capabilities in natural language processing (NLP). Nevertheless, existing methods often learn features that capture correlations between instruction-formatted samples and target labels rather than causal relationships. Such a correlation, termed a "spurious correlation" in statistics, may change drastically in a new task, which makes the learned features misleading. To this end, we develop a meta Structural Causal Model (meta-SCM) that integrates different NLP tasks under a single causal structure of the data. Specifically, the meta-SCM introduces multiple latent factors that represent properties of the source context, only some of which causally influence the target labels of a specific task. The key idea is to learn the task-required causal factors and use only those to make predictions for a given task. Theoretically, we prove that the causal factors can be identified without mixing in information from the other factors. Guided by this identifiability result, we propose a Structural Instruction Tuning (SIT) method that learns task-required causal representations mimicking the causal factors of each task. The utility of our approach is verified by improved zero-shot performance on a range of unseen datasets and tasks.
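The contrast between spurious and causal features can be illustrated with a toy simulation. The sketch below is not the SIT method. It is a minimal illustration under several assumptions: the latent factors are taken as already identified (observed directly, with no encoder), each task's parent set (`TASK_PARENTS`) is hypothetical, and the spurious correlation is simulated by making a non-causal factor track the label in training only. A linear classifier that uses every factor latches onto the spurious one and degrades on the shifted test data. A classifier restricted to the task's causal factors does not.

```python
import numpy as np

# Hypothetical toy "meta-SCM": K latent factors shared across tasks; each task's
# label is caused by a task-specific subset of them (its parent set).
# Assumption: the factors are already identified, so we observe z directly.
K = 4
TASK_PARENTS = {"task_a": [0], "task_b": [1]}  # illustrative parent sets only


def sample(task, n, rng, spurious):
    """Draw latent factors z and labels y in {-1, +1} for one task.

    If `spurious` is True, the non-causal factor 2 is overwritten to track the
    label, mimicking a correlation that holds in training but not in a new task.
    """
    z = rng.normal(size=(n, K))
    parents = TASK_PARENTS[task]
    score = z[:, parents].sum(axis=1) + 0.5 * rng.normal(size=n)
    y = np.where(score > 0, 1.0, -1.0)
    if spurious:
        z[:, 2] = y + 0.1 * rng.normal(size=n)
    return z, y


def design(z, cols):
    """Select feature columns `cols` and append an intercept column."""
    return np.column_stack([z[:, cols], np.ones(len(z))])


def fit_linear(z, y, cols):
    """Least-squares linear classifier using only the factors in `cols`."""
    w, *_ = np.linalg.lstsq(design(z, cols), y, rcond=None)
    return w


def accuracy(w, z, y, cols):
    """Fraction of samples whose predicted sign matches the label."""
    return float(np.mean(np.sign(design(z, cols) @ w) == y))


rng = np.random.default_rng(0)
task = "task_a"
z_tr, y_tr = sample(task, 5000, rng, spurious=True)   # training: factor 2 is spuriously predictive
z_te, y_te = sample(task, 5000, rng, spurious=False)  # shifted test: that correlation is gone

all_cols = list(range(K))
causal_cols = TASK_PARENTS[task]

w_all = fit_linear(z_tr, y_tr, all_cols)
w_causal = fit_linear(z_tr, y_tr, causal_cols)

acc_all = accuracy(w_all, z_te, y_te, all_cols)
acc_causal = accuracy(w_causal, z_te, y_te, causal_cols)
print(f"all factors:    {acc_all:.2f}")
print(f"causal factors: {acc_causal:.2f}")
```

In this setup the all-factor model places nearly all of its weight on factor 2 and drops to roughly chance accuracy once that correlation disappears. The causal-only model keeps its accuracy because its inputs are the label's actual causes. The paper's approach targets the same effect with learned representations, not hand-picked columns.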
Feb-9-2024