Bengio, Yoshua, Roux, Nicolas L., Vincent, Pascal, Delalleau, Olivier, Marcotte, Patrice

Convexity has recently received a lot of attention in the machine learning community, and the lack of convexity has been seen as a major disadvantage of many learning algorithms, such as multi-layer artificial neural networks. We show that training multi-layer neural networks in which the number of hidden units is learned can be viewed as a convex optimization problem. This problem involves an infinite number of variables, but can be solved by incrementally inserting one hidden unit at a time, each time finding a linear classifier that minimizes a weighted sum of errors.
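The incremental procedure the abstract describes can be sketched in code. The following is an illustrative simplification, not the paper's exact algorithm: each new hidden unit is found by a random search for the threshold unit most correlated with the current residual (a stand-in for exactly minimizing the weighted sum of errors), and the output weights are then refit by least squares, which is the convex subproblem. All function names and parameters here are hypothetical.

```python
import numpy as np

def pick_unit(X, residual, tries=50, rng=None):
    """Choose, among random candidate linear threshold units, the one whose
    output best correlates with the current residual. (Illustrative stand-in
    for the 'linear classifier minimizing a weighted sum of errors'.)"""
    rng = np.random.default_rng(0) if rng is None else rng
    best, best_score = None, -1.0
    for _ in range(tries):
        w, b = rng.normal(size=X.shape[1]), rng.normal()
        h = np.sign(X @ w + b)
        score = abs(h @ residual)          # correlation with current errors
        if score > best_score:
            best, best_score = (w, b), score
    return best

def incremental_nn(X, y, n_units=8, seed=0):
    """Grow a one-hidden-layer network one unit at a time; after each
    insertion the output weights are refit by least squares (convex)."""
    rng = np.random.default_rng(seed)
    units = []
    H = np.ones((len(X), 1))               # start with a bias column only
    alpha = np.linalg.lstsq(H, y, rcond=None)[0]
    for _ in range(n_units):
        residual = y - H @ alpha           # where the current model errs
        w, b = pick_unit(X, residual, rng=rng)
        units.append((w, b))
        H = np.column_stack([H, np.sign(X @ w + b)])
        alpha = np.linalg.lstsq(H, y, rcond=None)[0]
    return units, alpha

def predict(X, units, alpha):
    H = np.ones((len(X), 1))
    for w, b in units:
        H = np.column_stack([H, np.sign(X @ w + b)])
    return H @ alpha
```

Because each least-squares refit optimizes over a feature set that strictly contains the previous one (including the constant column), the training error can never increase as units are added.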

Schulte, Oliver (Simon Fraser University)

Particle physics experiments, like the Large Hadron Collider in Geneva, can generate thousands of data points listing detected particle reactions. An important learning task is to analyze the reaction data for evidence of conserved quantities and hidden particles. This task involves latent structure in two ways: first, hypothesizing hidden quantities whose conservation determines which reactions occur, and second, hypothesizing the presence of hidden particles. We model this problem in the classic linear algebra framework of automated scientific discovery due to Valdes-Perez, Zytkow and Simon, where both reaction data and conservation laws are represented as matrices. We introduce a new criterion for selecting a matrix model for reaction data: find hidden particles and conserved quantities that rule out as many interactions among the nonhidden particles as possible. A polynomial-time algorithm for optimizing this criterion is based on the new theorem that hidden particles are required if and only if the Smith Normal Form of the reaction matrix R contains entries other than 0 or 1. To our knowledge this is the first application of Smith matrix decomposition to a problem in AI. Using data from particle accelerators, we compare our algorithm to the main model of particles in physics, known as the Standard Model: our algorithm discovers conservation laws that are equivalent to those in the Standard Model, and indicates the presence of a hidden particle (the electron antineutrino) in accordance with the Standard Model.
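The Smith-Normal-Form criterion is concrete enough to sketch. Below is a minimal pure-Python diagonalization of an integer matrix by elementary row and column operations, together with the stated test: hidden particles are required exactly when the Smith Normal Form contains an entry other than 0 or 1. The matrices in the examples are illustrative toys, not accelerator data, and the function names are hypothetical.

```python
def smith_normal_form(M):
    """Diagonalize an integer matrix with elementary row/column operations so
    that the diagonal entries are the invariant factors (each dividing the
    next). Returns a new matrix; the input is not modified."""
    A = [row[:] for row in M]
    m = len(A)
    n = len(A[0]) if m else 0
    for t in range(min(m, n)):
        # bring some nonzero entry of the trailing submatrix to position (t, t)
        piv = next(((i, j) for i in range(t, m) for j in range(t, n) if A[i][j]), None)
        if piv is None:
            break
        A[t], A[piv[0]] = A[piv[0]], A[t]
        for row in A:
            row[t], row[piv[1]] = row[piv[1]], row[t]
        while True:
            if A[t][t] < 0:
                A[t] = [-x for x in A[t]]
            dirty = False
            # clear the column below the pivot with Euclidean steps
            for i in range(t + 1, m):
                if A[i][t]:
                    q = A[i][t] // A[t][t]
                    A[i] = [a - q * b for a, b in zip(A[i], A[t])]
                    if A[i][t]:                      # remainder: smaller pivot
                        A[t], A[i] = A[i], A[t]
                        dirty = True
                        break
            if dirty:
                continue
            # clear the row to the right of the pivot
            for j in range(t + 1, n):
                if A[t][j]:
                    q = A[t][j] // A[t][t]
                    for row in A:
                        row[j] -= q * row[t]
                    if A[t][j]:                      # remainder: smaller pivot
                        for row in A:
                            row[t], row[j] = row[j], row[t]
                        dirty = True
                        break
            if dirty:
                continue
            # the pivot must divide every remaining entry; if not, mix rows in
            bad = next((i for i in range(t + 1, m)
                        for j in range(t + 1, n) if A[i][j] % A[t][t]), None)
            if bad is None:
                break
            A[t] = [a + b for a, b in zip(A[t], A[bad])]
    return A

def needs_hidden_particles(R):
    """The abstract's criterion: hidden particles are required iff the Smith
    Normal Form of the reaction matrix has an entry other than 0 or 1."""
    D = smith_normal_form(R)
    return any(D[k][k] not in (0, 1) for k in range(min(len(D), len(D[0]))))
```

For instance, the toy reaction matrix diag(2, 3) has Smith Normal Form diag(1, 6), so the criterion would flag a hidden particle, whereas any matrix already consisting of 0/1 invariant factors would not.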

In the last article, we briefly covered the basics of the ANN technique. But before using the technique, an analyst should understand how it really works. Even though the detailed derivation may not be required, one should know the framework of the algorithm. This article will give you a basic understanding of the Artificial Neural Network (ANN) framework. We won't go into the actual derivation, but the information provided here will be sufficient for you to appreciate and implement the algorithm.