Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit
arXiv.org Artificial Intelligence
The primary focus of this thesis is to make Sanskrit manuscripts more accessible to end users through natural language technologies. The morphological richness, compounding, free word order, and low-resource nature of Sanskrit pose significant challenges for developing deep learning solutions. We identify four tasks that are fundamental to robust NLP technology for Sanskrit: word segmentation, dependency parsing, compound type identification, and poetry analysis. The first task, Sanskrit Word Segmentation (SWS), is a prerequisite for all downstream applications, but it is challenging because of sandhi, which modifies characters at word boundaries. Similarly, existing dependency parsing approaches struggle with morphologically rich, low-resource languages like Sanskrit. Compound type identification is also challenging, because the semantic relation between a compound's components depends on context. These difficulties lead to sub-optimal performance in downstream NLP applications such as question answering and machine translation. Finally, Sanskrit poetry has not been extensively studied in computational linguistics. In addressing these challenges, this thesis makes the following contributions: (1) linguistically-informed neural architectures for these tasks; (2) a demonstration of the interpretability and multilingual extension of the proposed systems; (3) state-of-the-art performance from these systems; and (4) SanskritShala, a web-based neural toolkit that provides real-time analysis of input text for various NLP tasks. Overall, this thesis contributes to making Sanskrit manuscripts more accessible by developing robust NLP technology and releasing resources, datasets, and a web-based toolkit.
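Why sandhi makes segmentation hard can be seen in a toy example. The sketch below is not the thesis's (neural) method; it hard-codes a few textbook vowel-sandhi rules in IAST transliteration (na + asti → nāsti, deva + indra → devendra), and the names `SANDHI_RULES`, `join` and `split` are hypothetical. It frames segmentation as an inverse problem: the word boundary is hidden inside a fused vowel, so a splitter has to guess the original vowels and check each candidate against a lexicon.

```python
# Hypothetical toy model, not the thesis's approach: a handful of IAST
# vowel-sandhi rules plus a brute-force splitter that inverts them.

# (final vowel of left word, initial vowel of right word) -> fused surface vowel
SANDHI_RULES = {
    ("a", "a"): "ā", ("a", "ā"): "ā", ("ā", "a"): "ā", ("ā", "ā"): "ā",  # long-vowel merger
    ("a", "i"): "e", ("ā", "i"): "e",                                      # a/ā + i -> e
    ("a", "u"): "o", ("ā", "u"): "o",                                      # a/ā + u -> o
}


def join(left: str, right: str) -> str:
    """Apply a vowel-sandhi rule at the boundary, else concatenate unchanged."""
    key = (left[-1], right[0])
    if key in SANDHI_RULES:
        return left[:-1] + SANDHI_RULES[key] + right[1:]
    return left + right


def split(surface: str, lexicon: set[str]) -> list[tuple[str, str]]:
    """Return every (left, right) lexicon pair whose sandhi join yields `surface`."""
    candidates = []
    for i in range(1, len(surface)):
        # Hypothesis 1: no sound change at the boundary.
        pairs = [(surface[:i], surface[i:])]
        # Hypothesis 2: surface[i] is a fused vowel; restore both original vowels.
        for (l_vowel, r_vowel), fused in SANDHI_RULES.items():
            if surface[i] == fused:
                pairs.append((surface[:i] + l_vowel, r_vowel + surface[i + 1:]))
        # Keep only splits made of known words that re-join to the surface form.
        for left, right in pairs:
            if left in lexicon and right in lexicon and join(left, right) == surface:
                candidates.append((left, right))
    return candidates


print(split("nāsti", {"na", "asti"}))
print(split("devendra", {"deva", "indra"}))
```

Because a single surface vowel such as ā can come from several vowel pairs, a splitter like this generally returns more than one candidate once the lexicon is realistic, and it ignores consonant and visarga sandhi entirely. Ranking competing candidates in context is the part a learned segmentation model has to handle.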
Aug-17-2023