Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models

Jun-18-2026, 09:19:18 GMT–Neural Information Processing Systems

For an LLM to respond correctly to an instruction, it must understand both the semantics and the domain (i.e., subject area) of a given task-instruction pair. However, syntax can also convey implicit information. Recent work shows that syntactic templates, i.e., frequent sequences of Part-of-Speech (PoS) tags, are prevalent in training data and often appear in model outputs. In this work, we characterize syntactic templates, domain, and semantics in task-instruction pairs. We identify cases of spurious correlations between syntax and domain, where models learn to associate a domain with syntax during training; this association can sometimes override prompt semantics.
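As a rough illustration of what "frequent sequences of PoS tags" could mean in practice, the sketch below counts PoS-tag n-grams over a tiny corpus and keeps those that recur. It is not the authors' method: the function names (`pos_ngrams`, `frequent_templates`), the n-gram length, the frequency threshold, and the hand-tagged toy sentences are all assumptions made for this example.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Yield every contiguous run of n PoS tags from one sentence."""
    for i in range(len(tags) - n + 1):
        yield tuple(tags[i:i + n])


def frequent_templates(tagged_sentences, n=3, min_count=2):
    """Return PoS n-grams that occur at least min_count times in the corpus."""
    counts = Counter()
    for sentence in tagged_sentences:
        # Discard the words and keep only the tag sequence.
        tags = [tag for _token, tag in sentence]
        counts.update(pos_ngrams(tags, n))
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Hypothetical, hand-tagged toy corpus (Universal PoS tag names).
corpus = [
    [("Where", "ADV"), ("is", "AUX"), ("Paris", "PROPN"), ("located", "VERB")],
    [("Where", "ADV"), ("is", "AUX"), ("Tokyo", "PROPN"), ("located", "VERB")],
    [("Summarize", "VERB"), ("the", "DET"), ("report", "NOUN")],
]

templates = frequent_templates(corpus, n=3, min_count=2)
for gram, count in sorted(templates.items()):
    print(" ".join(gram), count)
```

In this toy corpus the two location questions share a tag sequence while their wording differs, which is the kind of surface pattern a model could, under the abstract's hypothesis, come to associate with a domain rather than with the meaning of the prompt.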

large language model, machine learning, natural language

Country:
- Europe (1.00)
- North America > United States
  - Massachusetts (0.28)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.67)

Industry:
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- Banking & Finance > Insurance (0.93)
- Information Technology > Security & Privacy (0.67)
- Materials > Chemicals
  - Industrial Gases (0.93)
- Health & Medicine
  - Therapeutic Area (1.00)
  - Health Care Providers & Services (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Large Language Model (0.92)
