Learning the Similarity of Documents: An Information-Geometric Approach to Document Retrieval and Categorization
Neural Information Processing Systems
This paper develops, from first information-geometric principles, a general method for learning the similarity between text documents. Each document is modeled as a memoryless information source. A low-dimensional (curved) multinomial subfamily is learned from a latent class decomposition of the term-document matrix. From this model a canonical similarity function, known as the Fisher kernel, is derived. Our approach applies to unsupervised and supervised learning problems alike.
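The pipeline in the abstract (latent class decomposition of the term-document matrix, then a similarity derived from the fitted model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `fit_plsa` and `latent_class_similarity` are hypothetical, the toy count matrix is made up, and the similarity shown is only a simplified stand-in that compares documents through their latent class posteriors, not the full Fisher kernel.

```python
import numpy as np


def fit_plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit a latent class model P(w|d) = sum_z P(w|z) P(z|d) with EM.

    counts: array of shape (n_docs, n_words) holding term counts.
    Returns P(w|z) with shape (n_topics, n_words) and
    P(z|d) with shape (n_docs, n_topics).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random, row-normalized initial parameters.
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z|d,w), shape (n_docs, n_topics, n_words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate from expected counts n(d,w) * P(z|d,w).
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_w_z, p_z_d


def latent_class_similarity(counts, p_z_d):
    """Simplified similarity (an assumption, not the paper's full kernel):
    K[i, n] = sum_z P(z|d_i) P(z|d_n) / P(z).
    """
    p_d = counts.sum(axis=1) / counts.sum()  # P(d) from document lengths
    p_z = p_z_d.T @ p_d                      # P(z) = sum_d P(z|d) P(d)
    return (p_z_d / (p_z + 1e-12)) @ p_z_d.T


# Toy term-document counts: documents 0-1 and 2-3 use disjoint vocabularies.
counts = np.array(
    [
        [3, 2, 1, 0, 0, 0],
        [2, 3, 2, 0, 0, 0],
        [0, 0, 0, 2, 3, 1],
        [0, 0, 0, 1, 2, 3],
    ],
    dtype=float,
)

p_w_z, p_z_d = fit_plsa(counts, n_topics=2)
K = latent_class_similarity(counts, p_z_d)
print(np.round(K, 3))
```

In this sketch, documents that share a latent class receive a larger similarity than documents drawn from disjoint vocabularies. Dividing by P(z) gives more weight to agreement on rarer classes.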
Dec-31-2000