Looks can be Deceptive: Distinguishing Repetition Disfluency from Reduplication
Ahmad, Arif; Khyathi, Mothika Gayathri; Bhattacharyya, Pushpak
arXiv.org Artificial Intelligence
Reduplication and repetition, though similar in form, serve distinct linguistic purposes. Reduplication is a deliberate morphological process used to express grammatical, semantic, or pragmatic nuances, while repetition is often unintentional and indicative of disfluency. This paper presents the first large-scale study of reduplication and repetition in speech using computational linguistics. We introduce IndicRedRep, a new publicly available dataset containing Hindi, Telugu, and Marathi text annotated with reduplication and repetition at the word level. We evaluate transformer-based models for multi-class reduplication and repetition token classification, utilizing the Reparandum-Interregnum-Repair structure to distinguish between the two phenomena. Our models achieve macro F1 scores of up to 85.62% in Hindi, 83.95% in Telugu, and 84.82% in Marathi for reduplication-repetition classification.
Jul-10-2024
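The title's point, that reduplication and repetition can look identical on the surface, can be shown with a small sketch. It uses a hypothetical three-tag word-level scheme (`O`, `RED`, `REP`), which is not the IndicRedRep annotation scheme, and a romanized Hindi sentence chosen only for illustration. It is not the authors' model. A naive tagger that labels every adjacent duplicate word as reduplication is scored with macro F1, the unweighted mean of per-class F1 and the metric the abstract reports. The abstract does not say which classes enter the authors' macro average, so averaging over all three tags here is an assumption.

```python
from collections import Counter

# Hypothetical word-level tag set (NOT the IndicRedRep label scheme):
#   "RED" = part of a reduplicated form, "REP" = reparandum of a repetition
#   disfluency (the copy a fluent speaker would drop), "O" = everything else.
LABELS = ["O", "RED", "REP"]


def surface_tagger(tokens):
    """Naive baseline: tag every adjacent identical pair as reduplication.

    It looks only at surface form, so it cannot tell a deliberate
    reduplication from a disfluent repetition.
    """
    tags = ["O"] * len(tokens)
    for i in range(1, len(tokens)):
        if tokens[i] == tokens[i - 1]:
            tags[i - 1] = tags[i] = "RED"
    return tags


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1, with F1 = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[g] += 1  # gold class gets a false negative
    scores = []
    for lab in labels:
        denom = 2 * tp[lab] + fp[lab] + fn[lab]
        scores.append(2 * tp[lab] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Romanized Hindi, roughly "I, I went home slowly": "main main" is a
# disfluent repetition, "dhire dhire" ("slowly") is a reduplication.
tokens = ["main", "main", "dhire", "dhire", "ghar", "gaya"]
gold = ["REP", "O", "RED", "RED", "O", "O"]

pred = surface_tagger(tokens)
print(pred)  # ['RED', 'RED', 'RED', 'RED', 'O', 'O']
print(round(macro_f1(gold, pred, LABELS), 4))  # 0.4889
```

Because `main main` and `dhire dhire` share the same surface pattern, the surface-only tagger mislabels the disfluency and scores an F1 of 0 on `REP`. Separating the two requires cues beyond the repeated form itself, such as the Reparandum-Interregnum-Repair structure the abstract mentions.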