Incorporating semantic features from the WordNet lexical database is among one of the many approaches that have been tried to improve the predictive performance of text classification models. The intuition behind this is that keywords in the training set alone may not be extensive enough to enable generation of a universal model for a category, but if we incorporate the word relationships in WordNet, a more accurate model may be possible. Other researchers have previously evaluated the effectiveness of incorporating WordNet synonyms, hypernyms, and hyponyms into text classification models. Generally, they have found that improvements in accuracy using features derived from these relationships are dependent upon the nature of the text corpora from which the document collections are extracted. In this paper, we not only reconsider the role of WordNet synonyms, hypernyms, and hyponyms in text classification models, we also consider the role of WordNet meronyms and holonyms. Incorporating these WordNet relationships into a Coordinate Matching classifier, a Naive Bayes classifier, and a Support Vector Machine classifier, we evaluate our approach on six document collections extracted from the Reuters-21578, USENET, and Digi-Trad text corpora. Experimental results show that none of the WordNet relationships were effective at increasing the accuracy of the Naive Bayes classifier. Synonyms, hypernyms, and holonyms were effective at increasing the accuracy of the Coordinate Matching classifier, and hypernyms were effective at increasing the accuracy of the SVM classifier.
Conversations in social media often contain the use of irony or sarcasm, when the users say the opposite of what they really mean. Irony markers are the meta-communicative clues that inform the reader that an utterance is ironic. We propose a thorough analysis of theoretically grounded irony markers in two social media platforms: Twitter and Reddit. Classification and frequency analysis shows that for Twitter, typographic markers such as emoticons and emojis are the most discriminative markers to recognize ironic utterances, while for Reddit the morphological markers (e.g., interjections, tag questions) are the most discriminative.
Mining opinions and sentiment from social networking sites is a popular application for social media systems. Common approaches use a machine learning system with a bag of words feature set. We present Delta TFIDF, an intuitive general purpose technique to efficiently weight word scores before classification. Delta TFIDF is easy to compute, implement, and understand. We use Support Vector Machines to show that Delta TFIDF significantly improves accuracy for sentiment analysis problems using three well known data sets.
A discriminative large margin classifier based approach to anaphora resolution for neuroscience abstracts is presented. The system employs both syntactic and semantic features. A support vector machine based word sense disambiguation method combining evidence from three methods, that use WordNet and Wikipedia, is also introduced and used for semantic features. The support vector machine anaphora resolution classifier with probabilistic outputs achieved almost four-fold improvement in accuracy over the baseline method.