Watanabe, Sumio
The Effect of Singularities in a Learning Machine when the True Parameters Do Not Lie on such Singularities
Watanabe, Sumio, Amari, Shun-ichi
Many learning machines with hidden variables used in information science have singularities in their parameter spaces. At singularities, the Fisher information matrix becomes degenerate, so that the learning theory of regular statistical models does not hold. Recently, it was proven that, if the true parameter is contained in the singularities, then the coefficient of the Bayes generalization error is equal to the pole of the zeta function of the Kullback information.
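The zeta-function construction mentioned in this abstract can be stated compactly. The following is a minimal sketch of the standard setting in singular learning theory, in my own notation rather than the paper's:

```latex
% Kullback information between the true distribution q(x) and the model p(x|w):
K(w) = \int q(x)\,\log\frac{q(x)}{p(x\mid w)}\,dx .
% Zeta function of the Kullback information with respect to the prior \varphi(w):
\zeta(z) = \int K(w)^{z}\,\varphi(w)\,dw .
% \zeta(z) is holomorphic for \operatorname{Re}(z) > 0 and extends meromorphically,
% with poles at negative rational numbers.  If the largest pole is z = -\lambda,
% the Bayes generalization error satisfies
\mathbb{E}[G(n)] \;\approx\; \frac{\lambda}{n} \qquad (n \to \infty),
% so \lambda is the coefficient referred to in the abstract; for a regular model
% with d parameters, \lambda = d/2.
```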
Algebraic Information Geometry for Learning Machines with Singularities
Watanabe, Sumio
Algebraic geometry is essential to learning theory. In hierarchical learning machines such as layered neural networks and Gaussian mixtures, asymptotic normality does not hold, since the Fisher information matrices are singular. In this paper, the rigorous asymptotic form of the stochastic complexity is clarified based on resolution of singularities, and two different problems are studied.
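A compact way to see why resolution of singularities enters is that Hironaka's theorem supplies local coordinates in which the Kullback information becomes a normal crossing, which makes the asymptotic expansion of the stochastic complexity tractable. A hedged sketch in standard notation (not quoted from the paper):

```latex
% Stochastic complexity (free energy) of n i.i.d. samples X_1,...,X_n:
F(n) = -\log \int \prod_{i=1}^{n} p(X_i \mid w)\,\varphi(w)\,dw .
% Resolution of singularities gives a proper analytic map w = g(u) such that,
% in each local chart, the Kullback information is a normal crossing,
K(g(u)) = u_1^{2k_1} u_2^{2k_2} \cdots u_d^{2k_d},
% from which the averaged stochastic complexity has the asymptotic form
\mathbb{E}[F(n)] = nS + \lambda \log n - (m-1)\log\log n + O(1),
% where nS is the entropy term and (\lambda, m) are the largest pole of the
% zeta function (in absolute value) and its multiplicity.
```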
Algebraic Analysis for Non-regular Learning Machines
Watanabe, Sumio
Hierarchical learning machines are non-regular and non-identifiable statistical models, whose true parameter sets are analytic sets with singularities. Using algebraic analysis, we rigorously prove that the stochastic complexity of a non-identifiable learning machine is asymptotically equal to λ_1 log n - (m_1 - 1) log log n, where n is the number of training samples.
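The constants λ_1 and m_1 here are the same pair that appears in the zeta-function sketch after the first abstract above: the (absolute value of the) largest pole and its multiplicity. A brief illustrative note on how the formula specializes:

```latex
% General non-regular case:
F(n) \;\simeq\; \lambda_1 \log n \;-\; (m_1 - 1)\log\log n \;+\; O(1).
% For a regular, identifiable model with d parameters and positive-definite
% Fisher information, \lambda_1 = d/2 and m_1 = 1, so this reduces to the
% familiar BIC-type penalty:
F(n) \;\simeq\; \frac{d}{2}\,\log n \;+\; O(1).
```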
Solvable Models of Artificial Neural Networks
Watanabe, Sumio
Solvable models of nonlinear learning machines are proposed, and learning in artificial neural networks is studied based on the theory of ordinary differential equations. A learning algorithm is constructed by which the optimal parameter can be found without any recursive procedure. The solvable models enable us to analyze why experimental results obtained with error backpropagation often contradict statistical learning theory.
An Optimization Method of Layered Neural Networks based on the Modified Information Criterion
Watanabe, Sumio
This paper proposes a practical optimization method for layered neural networks, by which the optimal model and parameters can be found simultaneously. We modify the conventional information criterion into a differentiable function of the parameters, and then minimize it while controlling it back to the ordinary form. The effectiveness of this method is discussed theoretically and experimentally.
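The abstract does not give the exact functional form of the modified criterion. As a purely hypothetical illustration of the general idea (replacing a discrete model-selection penalty with a differentiable function of the weights, then annealing it back toward the hard count), one could write something like the following sketch; the surrogate penalty, the BIC-like scaling, and the annealing schedule are my assumptions, not the paper's method:

```python
import numpy as np

def smoothed_penalty(w, beta):
    """Differentiable surrogate for the number of nonzero parameters.

    Each term w_i**2 / (w_i**2 + beta) tends to 1 for w_i != 0 and to 0 for
    w_i == 0 as beta -> 0, so the sum approaches the hard parameter count used
    by an ordinary information criterion.  (Hypothetical form, for illustration.)
    """
    return np.sum(w ** 2 / (w ** 2 + beta))

def modified_criterion(w, train_error, n, beta):
    """Training error plus a smoothed, BIC-like complexity term (illustrative)."""
    return train_error(w) + smoothed_penalty(w, beta) * np.log(n) / n

def minimize_with_annealing(w0, train_error, grad_error, n,
                            lr=1e-2, betas=(1.0, 0.1, 0.01, 0.001), steps=200):
    """Gradient descent on the modified criterion while annealing beta -> 0,
    i.e. 'controlling it back to the ordinary form' of the criterion."""
    w = np.asarray(w0, dtype=float)
    for beta in betas:                      # gradually recover the hard penalty
        for _ in range(steps):
            # gradient of smoothed_penalty: d/dw [w^2/(w^2+b)] = 2*b*w/(w^2+b)^2
            g_pen = 2 * beta * w / (w ** 2 + beta) ** 2 * np.log(n) / n
            w -= lr * (grad_error(w) + g_pen)
    return w
```

A caller would supply the network's training-error function and its gradient; the point of the sketch is only that the complexity term stays differentiable at every beta and converges to the discrete parameter count as beta shrinks.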