Vincent, Pascal
Estimating Car Insurance Premia: a Case Study in High-Dimensional Data Inference
Chapados, Nicolas, Bengio, Yoshua, Vincent, Pascal, Ghosn, Joumana, Dugas, Charles, Takeuchi, Ichiro, Meng, Linyan
This conditional expected claim amount is called the pure premium and it is the basis of the gross premium charged to the insured. This expected value is conditioned on information available about the insured and about the contract, which we call the input profile here. This regression problem is difficult for several reasons: a large number of examples, a large number of variables (most of which are discrete and multi-valued), non-stationarity of the distribution, and a conditional distribution of the dependent variable which is very different from those usually encountered in typical applications of regression.
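For concreteness, the quantity the abstract calls the pure premium can be written as a conditional expectation. The notation below is chosen here for illustration and does not come from the paper itself.

```latex
% Pure premium: expected claim amount conditioned on the input profile.
% A = total claim amount, X = input profile (insured and contract variables).
p(x) = \mathbb{E}\left[\, A \mid X = x \,\right]
```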
K-Local Hyperplane and Convex Distance Nearest Neighbor Algorithms
Vincent, Pascal, Bengio, Yoshua
Guided by an initial idea of building a complex (non-linear) decision surface with maximal local margin in input space, we give a possible geometrical intuition as to why K-Nearest Neighbor (KNN) algorithms often perform more poorly than SVMs on classification tasks. We then propose modified K-Nearest Neighbor algorithms to overcome the perceived problem. The approach is similar in spirit to Tangent Distance, but with invariances inferred from the local neighborhood rather than from prior knowledge. Experimental results on real-world classification tasks suggest that the modified KNN algorithms often give a dramatic improvement over standard KNN and perform as well as or better than SVMs.
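The modified algorithm the abstract alludes to classifies a test point by its distance to the local affine hyperplane spanned by its K nearest neighbors within each class. Below is a minimal NumPy sketch of that idea, assuming a regularized least-squares projection; the function name, default parameters, and regularization constant are illustrative and not taken from the paper.

```python
import numpy as np

def hknn_predict(X_train, y_train, x, K=5, lam=1e-2):
    """Sketch of K-local hyperplane distance nearest neighbor (HKNN):
    for each class, span the local affine hyperplane through the K
    nearest same-class neighbors of x and measure the distance from x
    to it; predict the class whose hyperplane is closest."""
    best_class, best_dist = None, np.inf
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        # K nearest neighbors of x within class c
        idx = np.argsort(np.linalg.norm(Xc - x, axis=1))[:K]
        N = Xc[idx]                          # (k, d) local neighborhood
        nbar = N.mean(axis=0)                # anchor point on the hyperplane
        V = (N - nbar).T                     # (d, k) spanning directions
        # Regularized least squares: project x - nbar onto span(V);
        # lam penalizes large combination weights (illustrative value).
        k = V.shape[1]
        beta = np.linalg.solve(V.T @ V + lam * np.eye(k), V.T @ (x - nbar))
        dist = np.linalg.norm(x - nbar - V @ beta)
        if dist < best_dist:
            best_class, best_dist = c, dist
    return best_class
```

Preferring the class whose local hyperplane is nearest implicitly grants invariance to the directions of variation observed in the neighborhood, which is the intuition the abstract draws from Tangent Distance.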
A Neural Probabilistic Language Model
Bengio, Yoshua, Ducharme, Réjean, Vincent, Pascal
A goal of statistical language modeling is to learn the joint probability function of sequences of words. This is intrinsically difficult because of the curse of dimensionality: we propose to fight it with its own weapons. In the proposed approach one learns simultaneously (1) a distributed representation for each word (i.e. a similarity between words) along with (2) the probability function for word sequences, expressed with these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar to words forming an already seen sentence. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach very significantly improves on a state-of-the-art trigram model.
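The two components the abstract enumerates, a learned representation for each word and a probability function over word sequences expressed through those representations, map directly onto a small neural network. The PyTorch sketch below is an illustrative rendering under assumed sizes; the paper's model also includes optional direct connections from the embeddings to the output layer, omitted here for brevity.

```python
import torch
import torch.nn as nn

class NPLM(nn.Module):
    """Sketch of the architecture described above: each of the context
    words is mapped to a learned embedding; the concatenated embeddings
    feed a tanh hidden layer and a softmax over the vocabulary.
    Sizes are illustrative, not those used in the paper."""
    def __init__(self, vocab_size, context=3, embed_dim=30, hidden=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # word feature matrix
        self.hidden = nn.Linear(context * embed_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, context_ids):                  # (batch, context) word indices
        x = self.embed(context_ids).flatten(1)       # concatenate context embeddings
        return self.out(torch.tanh(self.hidden(x)))  # logits over the next word
```

Training this with nn.CrossEntropyLoss on (context, next-word) pairs learns the embedding matrix and the probability function jointly, which is the simultaneity the abstract emphasizes.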