"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.
In this paper, we focus on the classification of books using short descriptive texts (cover blurbs) and additional metadata. Building upon BERT, a deep neural language model, we demonstrate how to combine text representations with metadata and knowledge graph embeddings, which encode author information. Compared to the standard BERT approach we achieve considerably better results for the classification task. For a more coarse-grained classification using eight labels we achieve an F1- score of 87.20, while a detailed classification using 343 labels yields an F1-score of 64.70. We make the source code and trained models of our experiments publicly available.
Did you know J.K. Rowling was accused of stealing the word'muggles', The Da Vinci Code's author, Dan Brown, was sued for non-literal copyright infringement and that it was speculated that the Hitler Diaries of 1983 was written by Adolf Hitler himself while evidences supported otherwise? Many such'literary' quandaries are inspected by expert linguists as analysing and categorising discourses is fairly complex, domain-specific and highly multi-dimensional. One of latest research areas in Natural Language Processing is Authorship Analysis which is trying to leverage the computational power of big-data and artificial intelligence combined with linguistics and cognitive psychology to encode automatic classification of texts, identification of author profiles and resolution of authorship conflicts. This article is an attempt to introduce the concept of authorship analysis, its application areas and the major sub-tasks associated with it. The art and science of discriminating between writing styles of authors by identifying the characteristics of the persona of the authors and examining articles authored by them is called Authorship Analysis.
The art and science of discriminating between writing styles of authors by identifying the characteristics of the persona of the authors and examining articles authored by them is called Authorship Analysis. It aims to determine characteristics of an individual like age, gender, native language and personality traits based on "available information" pertaining to that individual. In this article, "available information" refers to textual data only in the context of authorship analysis, however, information in this context could go beyond textual format as it might also involve usage of multi-modal observations. Multi-modal observations capture characteristic features such as voice, intonation, gestures, body posture and other physical behavioral aspects of an individual. A combination of all these characteristics reflects the persona of an individual and consequently helps in profiling that individual.
Takagi-Sugeno-Kang (TSK) fuzzy systems are flexible and interpretable machine learning models; however, they may not be easily applicable to big data problems, especially when the size and the dimensionality of the data are both large. This paper proposes a mini-batch gradient descent (MBGD) based algorithm to efficiently and effectively train TSK fuzzy systems for big data classification problems. It integrates three novel techniques: 1) uniform regularization (UR), which is a regularization term added to the loss function to make sure the rules have similar average firing levels, and hence better generalization performance; 2) random percentile initialization (RPI), which initializes the membership function parameters efficiently and reliably; and, 3) batch normalization (BN), which extends BN from deep neural networks to TSK fuzzy systems to speedup the convergence and improve generalization. Experiments on nine datasets from various application domains, with varying size and feature dimensionality, demonstrated that each of UR, RPI and BN has its own unique advantages, and integrating all three together can achieve the best classification performance.
Yarbus' claim to decode the observer's task from eye movements has received mixed reactions. In this paper, we have supported the hypothesis that it is possible to decode the task. We conducted an exploratory analysis on the dataset by projecting features and data points into a scatter plot to visualize the nuance properties for each task. Following this analysis, we eliminated highly correlated features before training an SVM and Ada Boosting classifier to predict the tasks from this filtered eye movements data. We achieve an accuracy of 95.4% on this task classification problem and hence, support the hypothesis that task classification is possible from a user's eye movement data.
This work proposes the Bregman-Tweedie classification model and analyzes the domain structure of the extended exponential function, an extension of the classic generalized exponential function with additional scaling parameter, and related high-level mathematical structures, such as the Bregman-Tweedie loss function and the Bregman-Tweedie divergence. The base function of this divergence is the convex function of Legendre type induced from the extended exponential function. The Bregman-Tweedie loss function of the proposed classification model is the regular Legendre transformation of the Bregman-Tweedie divergence. This loss function is a polynomial parameterized function between unhinge loss and the logistic loss function. Actually, we have two sub-models of the Bregman-Tweedie classification model; H-Bregman with hinge-like loss function and L-Bregman with logisticlike loss function. Although the proposed classification model is nonconvex and unbounded, empirically, we have observed that the H-Bregman and L-Bregman outperform, in terms of the Friedman ranking, logistic regression and SVM and show reasonable performance in terms of the classification accuracy in the category of the binary linear classification problem. Keywords: Extended exponential function, convex function of Legendre type, Bregman-Tweedie divergence, Bregman-Tweedie classification model, hinge loss, logistic loss.
Recent studies demonstrate that machine learning algorithms can discriminate based on classes like race and gender. In this work, we present an approach to evaluate bias present in automated facial analysis algorithms and datasets with respect to phenotypic subgroups. Using the dermatologist approved Fitzpatrick Skin Type classification system, we characterize the gender and skin type distribution of two facial analysis benchmarks, IJB-A and Adience. We find that these datasets are overwhelmingly composed of lighter-skinned subjects (79.6% for IJB-A and 86.2% for Adience) and introduce a new facial analysis dataset which is balanced by gender and skin type. We evaluate 3 commercial gender classification systems using our dataset and show that darker-skinned females are the most misclassified group (with error rates of up to 34.7%).
In this post, I want to show how to apply BERT to a simple text classification problem. I assume that you're more or less familiar with what BERT is on a high level, and focus more on the practical side by showing you how to utilize it in your work. Roughly speaking, BERT is a model that knows to represent text. You give it some sequence as an input, it then looks left and right several times and produces a vector representation for each word as the output. In their paper, the authors describe two ways to work with BERT, one as with "feature extraction" mechanism.
I assume that you're more or less familiar with what BERT is on a high level, and focus more on the practical side by showing you how to utilize it in your work. Roughly speaking, BERT is a model that knows to represent text. You give it some sequence as an input, it then looks left and right several times and produces a vector representation for each word as the output. In their paper, the authors describe two ways to work with BERT, one as with "feature extraction" mechanism. That is, we use the final output of BERT as an input to another model.
We propose a new approach to address the text classification problems when learning with partial labels is beneficial. Instead of offering each training sample a set of candidate labels, we assign negative-oriented labels to the ambiguous training examples if they are unlikely fall into certain classes. We construct our new maximum likelihood estimators with self-correction property, and prove that under some conditions, our estimators converge faster. Also we discuss the advantages of applying one of our estimator to a fully supervised learning problem. The proposed method has potential applicability in many areas, such as crowdsourcing, natural language processing and medical image analysis.