Dynamic clustering of a document stream: a preliminary static evaluation of the GERMEN algorithm
Alain Lelu, Pascal Cuxac, Joel Johansson
arXiv.org Artificial Intelligence
Data-stream clustering is an ever-expanding subdomain of knowledge extraction. Most past and present research aims at scaling up efficiently to huge data repositories. Our approach focuses instead on qualitative improvement, mainly for detecting "weak signals" and precisely tracking topical evolutions in the framework of information watch, although scalability is intrinsically guaranteed in a possibly distributed implementation. Our GERMEN algorithm exhaustively identifies the whole set of density peaks of the data at time t by detecting the local perturbations induced by the current document vector, such as changing cluster borders or new and vanishing clusters. Optimality follows from the uniqueness of 1) the density landscape for any value of our zoom parameter and 2) the cluster allocation performed by our border propagation rule.
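The abstract describes GERMEN only at a high level. As a rough illustration of the ideas it names (density peaks over document vectors, a zoom parameter, local updates when a new document arrives, and a unique allocation of each document to a peak), here is a minimal Python sketch. It is not the authors' algorithm: the class `DensityPeakStream`, the helper `cosine`, the reading of `zoom` as a cosine-similarity threshold, the density (sum of neighbour similarities) and the pointer-following allocation (a generic mode-seeking rule standing in for GERMEN's border propagation) are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two document vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


class DensityPeakStream:
    """Hypothetical sketch of incremental density-peak clustering (NOT GERMEN itself).

    Assumption: `zoom` is a similarity threshold; two documents are
    neighbours when their cosine similarity is at least `zoom`.
    """

    def __init__(self, zoom: float) -> None:
        self.zoom = zoom
        self.docs: List[List[float]] = []
        self.neighbours: List[Dict[int, float]] = []  # neighbour index -> similarity
        self.density: List[float] = []

    def add_document(self, vec: List[float]) -> List[int]:
        """Insert one document and return the cluster label of every document."""
        new = len(self.docs)
        self.docs.append(vec)
        self.neighbours.append({})
        self.density.append(0.0)
        # Local perturbation: only documents similar enough to the new one
        # see their density change; all other densities stay as they were.
        for i in range(new):
            s = cosine(vec, self.docs[i])
            if s >= self.zoom:
                self.neighbours[i][new] = s
                self.neighbours[new][i] = s
                self.density[i] += s
                self.density[new] += s
        return self.labels()

    def _key(self, i: int):
        # Strict total order: higher density first, ties broken by lower index,
        # so every document has exactly one densest candidate.
        return (self.density[i], -i)

    def labels(self) -> List[int]:
        # Each document points to its densest neighbour if that neighbour ranks
        # above it; a document with no higher-ranked neighbour is a peak.
        parent = []
        for i in range(len(self.docs)):
            best = i
            for j in self.neighbours[i]:
                if self._key(j) > self._key(best):
                    best = j
            parent.append(best)
        # Follow pointers uphill (the key strictly increases, so this ends at a
        # peak); the peak's index serves as the cluster label.
        result = []
        for i in range(len(self.docs)):
            j = i
            while parent[j] != j:
                j = parent[j]
            result.append(j)
        return result


if __name__ == "__main__":
    stream = DensityPeakStream(zoom=0.9)
    for v in ([1, 0], [1, 0.1], [0, 1], [0.1, 1]):
        labels = stream.add_document(v)
    print(labels)  # [0, 0, 2, 2]
    # A new document close to the first group becomes its densest point,
    # so the peak (and hence the label) of that group moves to it.
    print(stream.add_document([1, 0.05]))
```

Because ties are broken by index, each document has a single parent, so for a fixed `zoom` the allocation is unique; this loosely mirrors the uniqueness argument in the abstract. For brevity, `labels()` recomputes every pointer after each insertion, and only the density update is local.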
Nov-4-2008
- Country:
  - Asia > Middle East
    - Republic of Türkiye (0.04)
  - Europe
    - France
      - Bourgogne-Franche-Comté > Doubs
        - Besançon (0.04)
      - Occitanie > Haute-Garonne
        - Toulouse (0.04)
    - Slovenia > Central Slovenia
      - Municipality of Ljubljana > Ljubljana (0.04)
    - United Kingdom > England
      - East Sussex > Brighton (0.04)
  - North America > United States
    - California > San Mateo County
      - Menlo Park (0.04)
    - Illinois > Cook County
      - Chicago (0.04)
- Genre:
  - Research Report (0.40)