Proper Noun Semantic Clustering Using Bag-of-Vectors
Ebadat, Ali Reza (INRIA-INSA) | Claveau, Vincent (IRISA-CNRS) | Sébillot, Pascale (IRISA-INS)
In this paper, we propose a model for semantic clustering of entities extracted from a text, and we apply it to a Proper Noun classification task. This model is based on a new method to compute the similarity between the entities. The classical way of calculating similarity is to build a feature vector, or Bag-of-Features, for each entity and then use a classical similarity function such as Cosine. In practice, the features are contextual, such as the words around the different occurrences of each entity. Here, we propose an alternative representation for entities, called Bag-of-Vectors, or Bag-of-Bags-of-Features. In this new model, each entity is defined not as a single vector but as a set of vectors, in which each vector is built from the contextual features of one occurrence of the entity. In order to use Bag-of-Vectors for clustering, we introduce new versions of classical similarity functions such as Cosine and Scalar Product. Experimentally, we show that the Bag-of-Vectors representation always improves the clustering results compared to classical Bag-of-Features representations.
May-20-2012
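
To make the contrast between the two representations concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the new similarity functions, so `bov_cosine` below simply averages the pairwise cosines between occurrence vectors, which is only one plausible way to lift Cosine to sets of vectors. All function names and the toy contexts are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * v[feature] for feature, weight in u.items() if feature in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def bag_of_features(occurrences):
    """Classical representation: one vector merging every occurrence context."""
    merged = Counter()
    for context in occurrences:
        merged.update(context)
    return merged


def bag_of_vectors(occurrences):
    """Alternative representation: one vector per occurrence of the entity."""
    return [Counter(context) for context in occurrences]


def bov_cosine(bag_a, bag_b):
    """ASSUMPTION: mean of pairwise cosines between the two sets of vectors.

    The paper's actual Bag-of-Vectors similarity may aggregate differently.
    """
    if not bag_a or not bag_b:
        return 0.0
    total = sum(cosine(u, v) for u, v in product(bag_a, bag_b))
    return total / (len(bag_a) * len(bag_b))


# Toy data (hypothetical): each inner list holds the context words
# around one occurrence of the entity.
paris = [["capital", "france"], ["city", "river"]]
lyon = [["city", "river"], ["football", "club"]]

bof_similarity = cosine(bag_of_features(paris), bag_of_features(lyon))
bov_similarity = bov_cosine(bag_of_vectors(paris), bag_of_vectors(lyon))
print(bof_similarity, bov_similarity)
```

In the Bag-of-Features case the occurrence contexts are merged before comparison, whereas the Bag-of-Vectors case keeps them separate, so the similarity can depend on how individual occurrences match one another. Other aggregations (for example, a maximum or a sum of pairwise scores) would fit the same interface.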