corpus-based approach
Moly\'e: A Corpus-based Approach to Language Contact in Colonial France
Dent, Rasul, Janès, Juliette, Clérice, Thibault, Suarez, Pedro Ortiz, Sagot, Benoît
Whether or not several Creole languages which developed during the early modern period can be considered genetic descendants of European languages has been the subject of intense debate. This is in large part due to the absence of evidence of intermediate forms. This work introduces a new open corpus, the Moly\'e corpus, which combines stereotypical representations of three kinds of language variation in Europe with early attestations of French-based Creole languages across a period of 400 years. It is intended to facilitate future research on the continuity between contact situations in Europe and Creolophone (former) colonies.
- Africa > Mauritius (0.05)
- South America > French Guiana (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- (17 more...)
Domain Bridge: Generative model-based domain forensic for black-box models
Zhang, Jiyi, Fang, Han, Chang, Ee-Chien
In forensic investigations of machine learning models, techniques that determine a model's data domain play an essential role, with prior work relying on large-scale corpora like ImageNet to approximate the target model's domain. Although such methods are effective in finding broad domains, they often struggle in identifying finer-grained classes within those domains. In this paper, we introduce an enhanced approach to determine not just the general data domain (e.g., human face) but also its specific attributes (e.g., wearing glasses). Our approach uses an image embedding model as the encoder and a generative model as the decoder. Beginning with a coarse-grained description, the decoder generates a set of images, which are then presented to the unknown target model. Successful classifications by the model guide the encoder to refine the description, which in turn, are used to produce a more specific set of images in the subsequent iteration. This iterative refinement narrows down the exact class of interest. A key strength of our approach lies in leveraging the expansive dataset, LAION-5B, on which the generative model Stable Diffusion is trained. This enlarges our search space beyond traditional corpora, such as ImageNet. Empirical results showcase our method's performance in identifying specific attributes of a model's input domain, paving the way for more detailed forensic analyses of deep learning models.
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Transportation > Air (0.65)
Harmonic Navigator: A Gesture-Driven, Corpus-Based Approach to Music Analysis, Composition, and Performance
Manaris, Bill (College of Charleston) | Johnson, David (College of Charleston) | Vassilandonakis, Yiorgos (College of Charleston)
We present a novel, real-time system for exploring harmonic spaces of musical styles, to generate music in collaboration with human performers utilizing gesture devices (such as the Kinect) together with MIDI and OSC instruments / controllers. This corpus-based environment incorporates statistical and evolutionary components for exploring potential flows through harmonic spaces, utilizing power-law (Zipf-based) metrics for fitness evaluation. It supports visual exploration and navigation of harmonic transition probabilities through interactive gesture control. These probabilities are computed from musical corpora (in MIDI format). Herein we utilize the Classical Music Archives 14,000+ MIDI corpus, among others. The user interface supports real-time exploration of the balance between predictability and surprise for musical composition and performance, and may be used in a variety of musical contexts and applications.
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Corpus-Based Approaches to Semantic Interpretation in NLP
In recent years, there has been a flurry of research into empirical, corpus-based learning approaches to natural language processing (NLP). The success of these approaches has stimulated research in using empirical learning techniques in other facets of NLP, including semantic analysis -- uncovering the meaning of an utterance. This article is an introduction to some of the emerging research in the application of corpus-based learning techniques to problems in semantic interpretation. In particular, we focus on two important problems in semantic interpretation, namely, word-sense disambiguation and semantic parsing.