ClassiNet -- Predicting Missing Features for Short-Text Classification

Bollegala, Danushka, Atanasov, Vincent, Maehara, Takanori, Kawarabayashi, Ken-ichi

Apr-14-2018–arXiv.org Artificial Intelligence

The fundamental problem in short-text classification is feature sparseness: the lack of feature overlap between a trained model and a test instance to be classified. To overcome feature sparseness, we propose ClassiNet, a network of classifiers trained to predict missing features in a given instance. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors, each of which predicts whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex $v_i$ in the ClassiNet, with a one-to-one correspondence between feature predictors and vertices. The weight of the directed edge $e_{ij}$ connecting a vertex $v_i$ to a vertex $v_j$ represents the conditional probability that, given that $v_i$ exists in an instance, $v_j$ also exists in the same instance. We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance $\vec{x}$, we find similar features from ClassiNet that did not appear in $\vec{x}$ and append those features to the representation of $\vec{x}$. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that ClassiNet yields statistically significant improvements in accuracy on short-text classification tasks, without using any external resources such as thesauri for finding related features.
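As a rough illustration of the edge weights and the feature-expansion step described in the abstract, the sketch below builds the simpler special case that ClassiNets are said to generalize: an explicit word co-occurrence graph whose edge weight from feature i to feature j is the empirical conditional probability that j occurs given that i occurs. It is not the authors' method. A ClassiNet would obtain these weights from trained feature predictors rather than raw counts, and the graph-propagation step is not covered here. The function names, the toy corpus, and the sum-of-weights scoring rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence_graph(instances):
    """Return weights[i][j], an estimate of P(j in instance | i in instance).

    Assumption: weights come from explicit co-occurrence counts, which is
    the special case that ClassiNet generalizes, not ClassiNet itself.
    """
    feature_count = Counter()
    pair_count = Counter()
    for instance in instances:
        features = set(instance)
        feature_count.update(features)
        # Count every ordered pair (i, j) of distinct co-occurring features.
        pair_count.update(permutations(sorted(features), 2))
    weights = defaultdict(dict)
    for (i, j), n_ij in pair_count.items():
        weights[i][j] = n_ij / feature_count[i]
    return weights


def expand(instance, weights, k=2):
    """Rank features absent from the instance by summed incoming edge weight.

    Assumption: summing weights is a hypothetical scoring rule chosen for
    simplicity; ties are broken alphabetically.
    """
    present = set(instance)
    scores = defaultdict(float)
    for i in present:
        for j, w in weights.get(i, {}).items():
            if j not in present:
                scores[j] += w
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Toy corpus of short texts, each represented as a list of features.
corpus = [
    ["cheap", "flight", "paris"],
    ["cheap", "hotel", "paris"],
    ["flight", "delay"],
    ["hotel", "paris"],
]
graph = build_cooccurrence_graph(corpus)
print(graph["hotel"]["paris"])
print(expand(["cheap", "flight"], graph))
```

The features returned by `expand` would then be appended to the representation of the instance before classification. Replacing the count-based weights with classifier outputs is what would let such a graph capture implicit co-occurrences between features that never appear together in the training data.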

classinet, deep learning, neural network


Country:
- Asia > Middle East
  - Qatar (0.14)
- Europe (0.67)
- North America > United States
  - Colorado (0.14)
  - Massachusetts (0.14)

Genre:
- Research Report > New Finding (0.88)

Industry:
- Banking & Finance (0.67)
- Media (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Inductive Learning (1.00)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (1.00)
    - Neural Networks (1.00)
    - Performance Analysis > Accuracy (0.69)
    - Statistical Learning (1.00)
    - Supervised Learning (1.00)
  - Natural Language
    - Discourse & Dialogue (0.93)
    - Text Classification (1.00)
    - Text Processing (1.00)
