GiesKaNe: Bridging Past and Present in Grammatical Theory and Practical Application
arXiv.org Artificial Intelligence
This article explores the requirements for corpus compilation within the GiesKaNe project (University of Giessen and Kassel, Syntactic Basic Structures of New High German). The project is defined by three central characteristics: it is a reference corpus, a historical corpus, and a syntactically deeply annotated treebank. As a historical corpus, GiesKaNe aims to establish connections with both historical and contemporary corpora, ensuring its relevance across temporal and linguistic contexts. The compilation process strikes a balance between innovation and adherence to standards, addressing both internal project goals and the broader interests of the research community. The methodological complexity of such a project is managed through a complementary interplay of human expertise and machine-assisted processes. The article discusses foundational topics such as tokenization, normalization, sentence definition, tagging, parsing, and inter-annotator agreement, alongside advanced considerations. These include comparisons between grammatical models, annotation schemas, and established de facto annotation standards, as well as the integration of human and machine collaboration. Notably, a novel method for machine-assisted classification of texts along the continuum of conceptual orality and literacy is proposed, offering new perspectives on text selection. Furthermore, the article introduces an approach to deriving de facto standard annotations from existing ones, mediating between standardization and innovation. In describing the workflow, the article demonstrates that even ambitious projects like GiesKaNe can be implemented effectively using existing research infrastructure, without specialized annotation tools. Instead, it shows that the workflow can be built on the strategic use of a simple spreadsheet combined with the capabilities of the existing infrastructure.
Feb-7-2025
- Country:
- Africa > Middle East
- Morocco (0.04)
- Asia
- China > Beijing
- Beijing (0.04)
- Japan
- Honshū > Kansai
- Osaka Prefecture > Osaka (0.04)
- Kyūshū & Okinawa > Kyūshū
- Miyazaki Prefecture > Miyazaki (0.04)
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Russia (0.04)
- Singapore (0.04)
- Europe
- Czechia > Prague (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Sweden
- Kronoberg County > Växjö (0.04)
- Vaestra Goetaland > Gothenburg (0.04)
- Östergötland County > Linköping (0.04)
- Norway > Eastern Norway
- Oslo (0.04)
- Romania > Nord-Est Development Region
- Iași County > Iași (0.04)
- Middle East
- Malta (0.04)
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- Russia (0.04)
- Italy
- Piedmont > Turin Province
- Turin (0.04)
- Tuscany > Florence (0.04)
- France
- Provence-Alpes-Côte d'Azur > Bouches-du-Rhône
- Marseille (0.04)
- Île-de-France > Paris
- Paris (0.04)
- Slovenia (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Greece > Attica
- Athens (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Greater Manchester > Manchester (0.04)
- West Midlands > Birmingham (0.04)
- Bulgaria > Sofia City Province
- Sofia (0.04)
- Netherlands
- North Holland > Amsterdam (0.04)
- South Holland > Dordrecht (0.04)
- Spain
- Germany
- Baden-Württemberg
- Karlsruhe Region > Heidelberg (0.04)
- Stuttgart Region > Stuttgart (0.04)
- Tübingen Region > Tübingen (0.14)
- Berlin (0.04)
- Brandenburg > Potsdam (0.04)
- Hesse > Darmstadt Region
- Wiesbaden (0.04)
- North Rhine-Westphalia > Cologne Region
- Cologne (0.04)
- Saxony > Leipzig (0.04)
- Iceland > Capital Region
- Reykjavik (0.04)
- Poland > Greater Poland Province
- Poznań (0.04)
- Switzerland > Zürich
- Zürich (0.04)
- North America
- Canada (0.04)
- Dominican Republic (0.04)
- United States
- California
- Los Angeles County > Los Angeles (0.14)
- Santa Clara County > Palo Alto (0.04)
- Ventura County > Thousand Oaks (0.04)
- District of Columbia > Washington (0.04)
- Indiana (0.04)
- Maryland > Baltimore (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- New Mexico > Santa Fe County
- Santa Fe (0.04)
- Ohio > Franklin County
- Columbus (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Oceania > Australia
- New South Wales > Sydney (0.04)
- Victoria > Melbourne (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Industry:
- Health & Medicine (0.67)
- Technology: