Improving performance of random forests for a particular value of outcome by adding chosen features
Choosing features to improve the performance of a particular algorithm is a difficult problem. Currently there is PCA, which can be used out of the box but is hard to understand, is not easy to interpret, and requires centering and scaling of the features. In addition, it does not allow improving prediction performance for a particular outcome (for example, one whose accuracy is lower than that of the others, or one of particular importance). My method allows features to be used without preprocessing, so the resulting prediction is easy to explain.
Empirical risk minimization is consistent with the mean absolute percentage error
De Myttenaere, Arnaud, Grand, Bénédicte Le, Rossi, Fabrice
In this paper, we study the consequences of using the Mean Absolute Percentage Error (MAPE) as a measure of quality for regression models. We show that finding the best model under the MAPE is equivalent to doing weighted Mean Absolute Error (MAE) regression. We also show that, under some assumptions, universal consistency of Empirical Risk Minimization remains possible using the MAPE.
- North America > United States > New York (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
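
The equivalence between the MAPE and a weighted MAE stated in the abstract above can be checked numerically. The following is a minimal sketch, assuming the usual definition of the MAPE as the mean of |y - prediction| / |y| and weights of 1/|y| for the weighted MAE; the function names and the toy data are hypothetical and do not come from the paper.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (not multiplied by 100)."""
    return sum(abs(y - p) / abs(y) for y, p in zip(y_true, y_pred)) / len(y_true)


def weighted_mae(y_true, y_pred, weights):
    """Weighted absolute error, averaged over the number of samples."""
    total = sum(w * abs(y - p) for y, p, w in zip(y_true, y_pred, weights))
    return total / len(y_true)


# Hypothetical toy data; targets must be nonzero for the MAPE to be defined.
y_true = [2.0, 4.0, 10.0]
y_pred = [1.0, 5.0, 8.0]

# Weighting each absolute error by 1/|y_i| gives the percentage error.
weights = [1.0 / abs(y) for y in y_true]

mape_value = mape(y_true, y_pred)
wmae_value = weighted_mae(y_true, y_pred, weights)
print(mape_value, wmae_value)
```

Because the two quantities coincide for any predictions, a model that minimizes one also minimizes the other. In practice this means a MAPE-optimal model can be fitted with any MAE solver that accepts per-sample weights.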
Classification Recouvrante Basée sur les Méthodes à Noyau
N'Cir, Chiheb-Eddine Ben, Essoussi, Nadia
The overlapping clustering problem is an important learning issue in which clusters are not mutually exclusive and each object may belong to several clusters simultaneously. This paper presents a kernel-based method that produces overlapping clusters in a high-dimensional feature space, using Mercer kernel techniques to improve the separability of input patterns. The proposed method, called OKM-K (Overlapping $k$-means based kernel method), extends the OKM (Overlapping $k$-means) method to produce overlapping schemes. Experiments are performed on overlapping datasets, and the empirical results obtained with OKM-K outperform those obtained with OKM.
- Africa > Middle East > Tunisia > Tunis Governorate > Tunis (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Puerto Rico > San Juan > San Juan (0.04)
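
The abstract above does not give the OKM-K formulas, so the following is not the authors' algorithm. It is a minimal sketch of the standard kernel trick that kernel-based variants of $k$-means generally rely on: the squared distance between a point and a cluster mean in feature space, computed from kernel evaluations alone. The function names, the RBF kernel, and the `gamma` value are assumptions made for illustration.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel, a common Mercer kernel; gamma is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def linear_kernel(x, z):
    """Plain dot product; with it the result is the ordinary squared distance."""
    return sum(a * b for a, b in zip(x, z))


def kernel_sq_distance_to_mean(x, cluster, kernel):
    """Squared feature-space distance from x to the mean of `cluster`.

    Expands ||phi(x) - mean||^2 into kernel terms, so the feature map phi
    is never computed explicitly.
    """
    n = len(cluster)
    k_xx = kernel(x, x)
    k_xc = sum(kernel(x, c) for c in cluster) / n
    k_cc = sum(kernel(a, b) for a in cluster for b in cluster) / (n * n)
    return k_xx - 2.0 * k_xc + k_cc


# Hypothetical toy cluster and query point.
cluster = [(0.0, 0.0), (2.0, 0.0)]
x = (1.0, 1.0)

d_linear = kernel_sq_distance_to_mean(x, cluster, linear_kernel)
d_rbf = kernel_sq_distance_to_mean(x, cluster, rbf_kernel)
print(d_linear, d_rbf)
```

With the linear kernel the result reduces to the ordinary squared Euclidean distance to the cluster mean, which gives a quick sanity check. An overlapping method would additionally need a rule for assigning each object to several clusters, which this sketch leaves out.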
Classification dynamique d'un flux documentaire : une évaluation statique préalable de l'algorithme GERMEN
Lelu, Alain, Cuxac, Pascal, Johansson, Joel
Data-stream clustering is an ever-expanding subdomain of knowledge extraction. Most past and present research effort aims at scaling up efficiently to huge data repositories. Our approach focuses on qualitative improvement, mainly for the detection of "weak signals" and the precise tracking of topical evolutions in the framework of information watch, though scalability is intrinsically guaranteed in a possibly distributed implementation. Our GERMEN algorithm exhaustively picks up the whole set of density peaks of the data at time t by identifying the local perturbations induced by the current document vector, such as changing cluster borders or new or vanishing clusters. Optimality follows from the uniqueness (1) of the density landscape for any value of our zoom parameter and (2) of the cluster allocation produced by our border propagation rule.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
- Europe > United Kingdom > England > East Sussex > Brighton (0.04)