Content-Dependent Versus Content-Independent Features for Gender and Age Range Identification in Different Types of Texts
Kurdi, M. Zakaria (University of Lynchburg)
This paper compares content-dependent and content-independent features for identifying the age range and gender of the authors of short texts. Eight content-dependent features based on profiles of word n-grams are used. In addition, ninety-eight content-independent features are used, covering all linguistic aspects of the texts from phonology to discourse. These features were extracted from three corpora of different sizes and types. Experiments were also conducted using four different machine learning algorithms combined with these features. The results show that content-dependent features perform better for gender identification on all three corpora, whereas content-independent features perform better for age range identification.
May-15-2019
- Country:
  - North America
    - United States
      - South Carolina > Charleston County > Charleston (0.04)
      - Oregon > Multnomah County > Portland (0.04)
    - Mexico > Quintana Roo > Cancún (0.04)
  - Europe
    - Slovenia (0.04)
    - United Kingdom > England > Oxfordshire > Oxford (0.04)
- Genre:
- Research Report > New Finding (0.88)
- Technology: