AITopics | Text Classification

Collaborating Authors

Text Classification

"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.

News Overviews Instructional Materials AI-Alerts Classics

Concept Labeling: Building Text Classifiers with Minimal Supervision

Chenthamarakshan, Vijil (IBM T J Watson Research Center Yorktown Heights) | Melville, Prem (IBM T J Watson Research Center Yorktown Heights) | Sindhwani, Vikas (IBM T J Watson Research Center Yorktown Heights) | Lawrence, Richard D (IBM T J Watson Research Center Yorktown Heights)

AAAI ConferencesJul-19-2011

The rapid construction of supervised text classification models is becoming a pervasive need across many modern applications. To reduce human-labeling bottlenecks, many new statistical paradigms (e.g., active, semi-supervised, transfer and multi-task learning) have been vigorously pursued in recent literature with varying degrees of empirical success. Concurrently, the emergence of Web 2.0 platforms in the last decade has enabled a world-wide, collaborative human effort to construct a massive ontology of concepts with very rich, detailed and accurate descriptions. In this paper we propose a new framework to extract supervisory information from such ontologies and complement it with a shift in human effort from direct labeling of examples in the domain of interest to the much more efficient identification of concept-class associations. Through empirical studies on text categorization problems using the Wikipedia ontology, we show that this shift allows very high-quality models to be immediately induced at virtually no cost.

category, classifier, ontology, (15 more...)

AAAI Conferences

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia (0.04)
Africa (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.51)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.49)
(3 more...)

Add feedback

TweetTrader.net: Leveraging Crowd Wisdom in a Stock Microblogging Forum

Sprenger, Timm Oliver (Technische Universität München)

AAAI ConferencesJul-12-2011

TweetTrader.net is a stock microblogging forum that leverages the wisdom of crowds to aggregate the information contained in stock-related tweets. Based on insights from academic research on stock microblogs, the application integrates inputs from text classification, user voting and a proprietary Stock Game in order to extract the sentiment (i.e., the bullishness) of online investors with respect to all publicly traded companies of the S&P 500.

machine learning, natural language, text classification, (18 more...)

AAAI Conferences

Fifth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > Indiana (0.05)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)

Genre: Research Report (0.48)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.37)

Add feedback

Cause Identification from Aviation Safety Incident Reports via Weakly Supervised Semantic Lexicon Construction

Abedin, M. A., Ng, V., Khan, L.

Journal of Artificial Intelligence ResearchAug-26-2010

The Aviation Safety Reporting System collects voluntarily submitted reports on aviation safety incidents to facilitate research work aiming to reduce such incidents. To effectively reduce these incidents, it is vital to accurately identify why these incidents occurred. More precisely, given a set of possible causes, or shaping factors, this task of cause identification involves identifying all and only those shaping factors that are responsible for the incidents described in a report. We investigate two approaches to cause identification. Both approaches exploit information provided by a semantic lexicon, which is automatically constructed via Thelen and Riloff's Basilisk framework augmented with our linguistic and algorithmic modifications. The first approach labels a report using a simple heuristic, which looks for the words and phrases acquired during the semantic lexicon learning process in the report. The second approach recasts cause identification as a text classification problem, employing supervised and transductive text classification algorithms to learn models from incident reports labeled with shaping factors and using the models to label unseen reports. Our experiments show that both the heuristic-based approach and the learning-based approach (when given sufficient training data) outperform the baseline system significantly.

lexicon, semantic lexicon, word and phrase, (16 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.2986

AI Access Foundation

10662

Journal of Artificial Intelligence Research

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Illinois > Lake County > Waukegan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Transportation > Air (1.00)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(5 more...)

Add feedback

Multi-Task Active Learning with Output Constraints

Zhang, Yi (Carnegie Mellon University)

AAAI ConferencesJul-15-2010

Many problems in information extraction, text mining, natural language processing and other fields exhibit the same property: multiple prediction tasks are related in the sense that their outputs (labels) satisfy certain constraints. In this paper, we propose an active learning framework exploiting such relations among tasks. Intuitively, with task outputs coupled by constraints, active learning can utilize not only the uncertainty of the prediction in a single task but also the inconsistency of predictions across tasks. We formalize this idea as a cross-task value of information criteria, in which the reward of a labeling assignment is propagated and measured over all relevant tasks reachable through constraints. A specific example of our framework leads to the cross entropy measure on the predictions of coupled tasks, which generalizes the entropy in the classical single-task uncertain sampling. We conduct experiments on two real-world problems: web information extraction and document classification. Empirical results demonstrate the effectiveness of our framework in actively collecting labeled examples for multiple related tasks.

constraint, data mining, machine learning, (20 more...)

AAAI Conferences

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.34)

Technology:

Information Technology > Data Science > Data Mining > Text Mining (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.49)

Add feedback

Generating Domain-Specific Clues Using News Corpus for Sentiment Classification

Kim, Youngho (University of Massachusetts Amherst) | Choi, Yoonjung (KAIST) | Myaeng, Sung-Hyon (KAIST)

AAAI ConferencesMay-17-2010

This paper addresses the problem of automatically generating domain-specific sentiment clues. The main idea is to bootstrap from a small seed set and generate new clues by using dependencies and collocation information between sentiment clues and sentence-level topics that would be a primary subject of sentiment expression (e.g., event, company, and person). The experiments show that the aggregated clues are effective for sentiment classification.

artificial intelligence, natural language, text classification, (3 more...)

AAAI Conferences

Fourth International AAAI Conference on Weblogs and Social Media

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.89)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.89)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.60)

Add feedback

Document Classification for Focused Topics

Power, Russell (New York University) | Chen, Jay (New York University) | Karthik, Trishank (New York University) | Subramanian, Lakshminarayanan (New York University)

AAAI ConferencesMar-22-2010

Feature extraction is one of the fundamental challenges in improving the accuracy of document classification. While there has been a large body of research literature on document classification, most existing approaches either do not have a high classification accuracy or require massive training sets. In this paper, we propose a simple feature extraction algorithm that can achieve high document classification accuracy in the context of development-centric topics. Our feature extraction algorithm exploits two distinct aspects in development-centric topics: most of these topics tend to be very focused (unlike semantically hard classification topics such as chemistry or banks) due to local language and cultural underpinnings in these topics, the authentic pages tend to use several region specific features. Our algorithm uses a combination of popularity and rarity as two separate metrics to extract features that describe a topic. Given a topic, our output feature set comprises of: (i) a list of popular keywords closely related to the topic; (ii) a list of rare keywords closely related to the topic. We show that a simple joint classifier based on these two feature sets can achieve high classification accuracy while each feature sub-set in itself is insufficient. We have tested our algorithm across a wide range of development-centric topics.

classification, classifier, development-centric topic, (15 more...)

AAAI Conferences

2010 AAAI Spring Symposium Series

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.72)
Health & Medicine > Therapeutic Area > Immunology (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.72)

Add feedback

Look Ma, No Hands: Analyzing the Monotonic Feature Abstraction for Text Classification

Downey, Doug, Etzioni, Oren

Neural Information Processing SystemsDec-31-2009

Is accurate classification possible in the absence of hand-labeled data? This paper introduces the Monotonic Feature (MF) abstraction--where the probability of class membership increases monotonically with the MF's value. The paper proves that when an MF is given, PAC learning is possible with no hand-labeled data under certain assumptions. We argue that MFs arise naturally in a broad range of textual classification applications. On the classic "20 Newsgroups" data set, a learner given an MF and unlabeled data achieves classification accuracy equal to that of a state-of-the-art semi-supervised learner relying on 160 hand-labeled examples. Even when MFs are not given as input, their presence or absence can be determined from a small amount of hand-labeled data, which yields a new semi-supervised learning method that reduces error by 15% on the 20 Newsgroups data.

assumption, classifier, monotonic feature, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > New York (0.05)
North America > United States > Illinois > Cook County > Evanston (0.04)
Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.70)
(4 more...)

Add feedback

Look Ma, No Hands: Analyzing the Monotonic Feature Abstraction for Text Classification

Downey, Doug, Etzioni, Oren

Neural Information Processing SystemsDec-31-2009

assumption, classifier, monotonic feature, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > New York (0.05)
North America > United States > Illinois > Cook County > Evanston (0.04)
Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.70)
(4 more...)

Add feedback

Look Ma, No Hands: Analyzing the Monotonic Feature Abstraction for Text Classification

Downey, Doug, Etzioni, Oren

Neural Information Processing SystemsDec-31-2009

Is accurate classification possible in the absence of hand-labeled data?

machine learning, natural language, text classification, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.69)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
(3 more...)

Add feedback

Neuronal Spectral Analysis of EEG and Expert Knowledge Integration for Automatic Classification of Sleep Stages

Kerkeni, Nizar, Alexandre, Frederic, Bedoui, Mohamed Hedi, Bougrain, Laurent, Dogui, Mohamed

arXiv.org Artificial IntelligenceDec-1-2009

Being able to analyze and interpret signal coming from electroencephalogram (EEG) recording can be of high interest for many applications including medical diagnosis and Brain-Computer Interfaces. Indeed, human experts are today able to extract from this signal many hints related to physiological as well as cognitive states of the recorded subject and it would be very interesting to perform such task automatically but today no completely automatic system exists. In previous studies, we have compared human expertise and automatic processing tools, including artificial neural networks (ANN), to better understand the competences of each and determine which are the difficult aspects to integrate in a fully automatic system. In this paper, we bring more elements to that study in reporting the main results of a practical experiment which was carried out in an hospital for sleep pathology study. An EEG recording was studied and labeled by a human expert and an ANN. We describe here the characteristics of the experiment, both human and neuronal procedure of analysis, compare their performances and point out the main limitations which arise from this study.

machine learning, natural language, text classification, (6 more...)

arXiv.org Artificial Intelligence

cs/0510083

Genre: Research Report (0.40)

Industry: Health & Medicine > Diagnostic Medicine (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.40)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)

Add feedback