Transformer-Enabled Diachronic Analysis of Vedic Sanskrit: Neural Methods for Quantifying Types of Language Change
Hariharan, Ananth; Mortensen, David
arXiv.org Artificial Intelligence
This study demonstrates how hybrid neural-symbolic methods can yield significant new insights into the evolution of a morphologically rich, low-resource language. By quantitatively analyzing over 2,000 years of Sanskrit, we challenge the naive assumption that linguistic change is simplification. Our approach addresses data scarcity through weak supervision, using more than 100 high-precision regex patterns to generate pseudo-labels for fine-tuning a multilingual BERT model. We then fuse the symbolic and neural outputs via a novel confidence-weighted ensemble, creating a system that is both scalable and interpretable. Applied to a 1.47-million-word diachronic corpus, the ensemble achieves an overall feature detection rate of 52.4%. Our findings reveal that Sanskrit's overall morphological complexity does not decrease but is instead dynamically redistributed: while earlier verbal features show cyclical patterns of decline, complexity shifts to other domains, as evidenced by a dramatic expansion in compounding and the emergence of new philosophical terminology. Critically, our system produces well-calibrated uncertainty estimates, with confidence strongly correlating with accuracy (Pearson r = 0.92) and a low expected calibration error (ECE = 0.043), bolstering the reliability of these findings for computational philology.
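Two of the ideas above, fusing rule-based and neural detections according to their confidence and scoring calibration with ECE, can be illustrated with a small sketch. The fusion rule here is a hypothetical per-example confidence-weighted average, not the authors' ensemble, whose exact form the abstract does not specify. The ECE function follows the standard equal-width-bin definition. The function names and the 10-bin default are assumptions made for illustration.

```python
from typing import List, Sequence


def confidence_weighted_fusion(
    rule_probs: Sequence[float],
    rule_conf: Sequence[float],
    neural_probs: Sequence[float],
    neural_conf: Sequence[float],
) -> List[float]:
    """Hypothetical fusion: weight each detector's probability by its confidence.

    Weights are normalised per example; if both confidences are zero,
    fall back to a plain average of the two probabilities.
    """
    fused = []
    for pr, cr, pn, cn in zip(rule_probs, rule_conf, neural_probs, neural_conf):
        total = cr + cn
        if total == 0:
            fused.append((pr + pn) / 2)
        else:
            fused.append((cr * pr + cn * pn) / total)
    return fused


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: sum over bins of (|B| / N) * |accuracy(B) - mean_conf(B)|."""
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; conf == 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    fused = confidence_weighted_fusion([0.9, 0.2], [0.8, 0.4], [0.6, 0.7], [0.2, 0.6])
    print([round(p, 2) for p in fused])  # [0.84, 0.5]
    print(round(expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, True, True, False]), 3))  # 0.1
```

Normalising the weights per example keeps the fused score in [0, 1] whenever both inputs are probabilities; a learned weighting, or confidences calibrated on held-out data, would be natural alternatives.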
Dec-8-2025