Murata, Noboru
Geometry of EM and related iterative algorithms
Hino, Hideitsu, Akaho, Shotaro, Murata, Noboru
The Expectation--Maximization (EM) algorithm is a simple meta-algorithm that has been used for many years as a methodology for statistical inference when there are missing measurements in the observed data or when the data is composed of observables and unobservables. Its general properties are well studied, and it has been applied to countless individual problems. In this paper, we introduce the $em$ algorithm, an information-geometric formulation of the EM algorithm, together with its extensions and applications to various problems. Specifically, we show that, from this geometric perspective, it is possible to formulate an outlier-robust inference algorithm, an algorithm for computing channel capacity, parameter estimation methods on the probability simplex, certain multivariate analysis methods such as principal component analysis in a space of probability models and modal regression, matrix factorization, and the learning of generative models, which has recently attracted attention in deep learning.
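As a concrete reference point for the classical algorithm that the $em$ formulation reinterprets geometrically, the following is a minimal sketch of EM for a two-component Gaussian mixture; the toy data and constants are illustrative and are not taken from the paper.

```python
import numpy as np

# Minimal sketch: classical EM for a 1-D two-component Gaussian mixture.
# The e-step and m-step below are the two alternating steps that the $em$
# algorithm reinterprets as alternating projections (illustrative example,
# not code from the paper).

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 0.5, 300)])

# initial parameters: mixing weight, means, standard deviations
pi, mu, sigma = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (np.sqrt(2 * np.pi) * s)

for _ in range(100):
    # e-step: posterior responsibility of component 0 for each sample
    p0 = pi * normal_pdf(x, mu[0], sigma[0])
    p1 = (1 - pi) * normal_pdf(x, mu[1], sigma[1])
    r = p0 / (p0 + p1)

    # m-step: maximize the expected complete-data log-likelihood
    pi = r.mean()
    mu = np.array([np.average(x, weights=r), np.average(x, weights=1 - r)])
    sigma = np.sqrt(np.array([np.average((x - mu[0]) ** 2, weights=r),
                              np.average((x - mu[1]) ** 2, weights=1 - r)]))

print(pi, mu, sigma)
```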
On a convergence property of a geometrical algorithm for statistical manifolds
Akaho, Shotaro, Hino, Hideitsu, Murata, Noboru
Information geometry is a framework for analyzing statistical inference and machine learning [2]. Geometrically, statistical inference and many machine learning algorithms can be regarded as procedures for finding a projection from a given data point onto a model subspace. In this paper, we focus on an algorithm to find this projection. Since the projection is given by minimizing a divergence, a common approach to finding it is a gradient-based method [6]. However, such an approach is not applicable in some cases. For instance, several attempts have been made to extend the information-geometrical framework to nonparametric cases [3, 9, 13, 15], where we need to consider a function space or where each datum is represented as a point process. In such cases, it is difficult to compute the derivative of the divergence required by gradient-based methods, and sometimes difficult to handle the coordinates explicitly. Takano et al. [15] proposed a geometrical algorithm to find the projection for nonparametric e-mixture distributions, where the model subspace is spanned by several empirical distributions. The algorithm, which is derived from the generalized Pythagorean theorem, depends only on the values of divergences.
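As background for the divergence-only approach, the projection of a point $p$ onto a model subspace $M$ and the generalized Pythagorean relation it satisfies can be written, in standard information-geometric notation (which may differ in detail from the paper's), as
$$ \hat p = \operatorname*{arg\,min}_{r \in M} D(p \,\|\, r), \qquad D(p \,\|\, q) = D(p \,\|\, \hat p) + D(\hat p \,\|\, q) \quad \text{for all } q \in M, $$
where the second relation holds, for example, when $M$ is e-flat and $\hat p$ is the m-projection of $p$; an algorithm of the kind described above can exploit this relation using only evaluated divergence values.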
Integral representation of the global minimizer
Sonoda, Sho, Ishikawa, Isao, Ikeda, Masahiro, Hagihara, Kei, Sawano, Yoshihiro, Matsubara, Takuo, Murata, Noboru
We have obtained an integral representation of the shallow neural network that attains the global minimum of its backpropagation (BP) training problem. In unpublished numerical simulations conducted several years prior to this study, we had noticed that such an integral representation might exist, but it was not proven until now. First, we introduced a Hilbert space of coefficient functions and a reproducing kernel Hilbert space (RKHS) of hypotheses associated with the integral representation. The RKHS reflects the approximation ability of neural networks. Second, we established ridgelet analysis on the RKHS. The analytic properties of the integral representation are remarkably clear. Third, we reformulated BP training as an optimization problem in the space of coefficient functions, and obtained a formal expression for the unique global minimizer, following Tikhonov regularization theory. Finally, we demonstrated that the global minimizer is the shrink ridgelet transform. Since the relation between an integral representation and an ordinary finite network is not clear, we cannot immediately answer questions such as "Is a local minimum a global minimum?", even though BP is convex in the integral representation. However, the obtained integral representation provides an explicit expression for the global minimizer, without linearity-like assumptions such as partial linearity and monotonicity. Furthermore, it indicates that the ordinary ridgelet transform provides the minimum-norm solution to the original training equation.
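For orientation, the integral representation of a shallow network with activation $\eta$ and coefficient function $T$, and the ridgelet transform with respect to an admissible function $\psi$, are conventionally written as follows (standard notation, which may differ in detail from the paper's):
$$ S[T](x) = \int_{\mathbb{R}^{m}\times\mathbb{R}} T(a,b)\,\eta(a\cdot x - b)\,\mathrm{d}a\,\mathrm{d}b, \qquad (\mathcal{R}_{\psi}f)(a,b) = \int_{\mathbb{R}^{m}} f(x)\,\overline{\psi(a\cdot x - b)}\,\mathrm{d}x. $$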
Transportation analysis of denoising autoencoders: a novel method for analyzing deep neural networks
Sonoda, Sho, Murata, Noboru
The feature map obtained from the denoising autoencoder (DAE), a cornerstone of deep learning, is investigated by determining the transportation dynamics of the DAE. Despite the rapid development of their applications, deep neural networks remain analytically unexplained, because their feature maps are nested and their parameters are not faithful. In this paper, we address the problem of formulating this nested complex of parameters by regarding the feature map as a transport map. Even when a feature map has different input and output dimensions, we can regard it as a transport map by considering both the input and output spaces to be embedded in a common high-dimensional space. In addition, the trajectory is a geometric object and is thus independent of parameterization. In this manner, transportation can be regarded as a universal character of deep neural networks. By determining and analyzing the transportation dynamics, we can understand the behavior of a deep neural network. In this paper, we investigate a fundamental case of deep neural networks: the DAE. We derive the transport map of the DAE and reveal that the infinitely deep DAE transports mass so as to decrease a certain quantity, such as the entropy, of the data distribution. These results, though analytically simple, shed light on the correspondence between deep neural networks and Wasserstein gradient flows.
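For context, when the corruption is Gaussian with variance $t$, the optimal reconstruction map of a DAE is known to take the following Tweedie-type form (the scaling conventions may differ from the paper's), and it is this map, iterated, whose continuum limit gives the transport-dynamics picture:
$$ g_t(\tilde x) = \mathbb{E}[x \mid \tilde x] = \tilde x + t\,\nabla \log p_t(\tilde x), $$
where $p_t$ denotes the data distribution smoothed by the noise.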
Decoding Stacked Denoising Autoencoders
Sonoda, Sho, Murata, Noboru
Data representation in a stacked denoising autoencoder is investigated. Decoding is a simple technique for translating a stacked denoising autoencoder into a composition of denoising autoencoders in the ground space. In the infinitesimal limit, a composition of denoising autoencoders reduces to a continuous denoising autoencoder, which is rich in analytic properties and geometric interpretations. For example, the continuous denoising autoencoder solves the backward heat equation and transports each data point so as to decrease the entropy of the data distribution. Together with ridgelet analysis, an integral representation of a stacked denoising autoencoder is derived.
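For context, if each data point is transported with the score velocity $\nabla \log p_t$, the continuity equation yields the backward heat equation referred to above (up to the paper's time-scaling convention):
$$ \partial_t p_t = -\nabla\!\cdot\!\left(p_t \,\nabla \log p_t\right) = -\Delta p_t. $$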
Population Decoding Based on an Unfaithful Model
Wu, Si, Nakahara, Hiroyuki, Murata, Noboru, Amari, Shun-ichi
We study a population decoding paradigm in which the maximum likelihood inference is based on an unfaithful decoding model (UMLI). This is usually the case for neural population decoding because the encoding process of the brain is not exactly known, or because a simplified decoding model is preferred to save computational cost. We consider an unfaithful decoding model that neglects the pairwise correlation between neuronal activities, and prove that UMLI is asymptotically efficient when the neuronal correlation is uniform or of limited range. The performance of UMLI is compared with that of maximum likelihood inference based on a faithful model and with that of the center-of-mass decoding method. It turns out that UMLI has the advantages of remarkably decreasing the computational complexity while maintaining a high level of decoding accuracy. The effect of correlation on the decoding accuracy is also discussed.
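To make the setting concrete, the following is a minimal sketch (with made-up tuning curves and noise parameters, not those of the paper) of decoding with a correlation-neglecting model: responses are generated with uniformly correlated noise, but the decoder maximizes an independent-Gaussian likelihood, and a center-of-mass (population-vector) estimate is computed for comparison.

```python
import numpy as np

# Minimal sketch of decoding with an "unfaithful" model that ignores pairwise
# correlations: responses are generated with correlated noise, but the decoder
# maximizes an independent-Gaussian likelihood. Toy tuning curves and noise
# parameters are illustrative only, not taken from the paper.

rng = np.random.default_rng(1)
n = 50                                   # number of neurons
centers = np.linspace(-np.pi, np.pi, n)  # preferred stimuli

def tuning(theta):
    return 5.0 * np.exp(np.cos(theta - centers) - 1.0)  # mean responses

# correlated noise with uniform correlation c
c, var = 0.2, 1.0
cov = var * ((1 - c) * np.eye(n) + c)
true_theta = 0.7
r = rng.multivariate_normal(tuning(true_theta), cov)

# UMLI: maximize the likelihood of an independent-Gaussian decoding model
grid = np.linspace(-np.pi, np.pi, 2001)
neg_log_lik = [np.sum((r - tuning(t)) ** 2) for t in grid]
theta_umli = grid[np.argmin(neg_log_lik)]

# center-of-mass (population-vector) decoding for comparison
theta_com = np.angle(np.sum(np.clip(r, 0, None) * np.exp(1j * centers)))

print(true_theta, theta_umli, theta_com)
```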
Adaptive On-line Learning in Changing Environments
Murata, Noboru, Müller, Klaus-Robert, Ziehe, Andreas, Amari, Shun-ichi
An adaptive online algorithm extending the learning of learning idea is proposed and theoretically motivated. Relying only on gradient flow information it can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available. Its efficiency is demonstrated for a non-stationary blind separation task of acoustic signals.
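As a rough illustration of the idea, and not the exact update rule derived in the paper, the following sketch adapts the step size from a leaky average of recent gradients (the "flow") on a drifting one-dimensional estimation task; all constants are illustrative.

```python
import numpy as np

# Rough sketch of the "learning of the learning rate" idea on a drifting
# 1-D mean-estimation task: a leaky average of recent gradients (the "flow")
# grows the step size when the environment changes and shrinks it near a
# (temporary) optimum. The constants and the exact rule are illustrative
# choices, not the ones derived in the paper.

rng = np.random.default_rng(2)
w, eta, flow = 0.0, 0.1, 0.0
alpha, beta, delta = 0.02, 2.0, 0.1

for t in range(4000):
    target = 0.0 if t < 2000 else 3.0           # environment changes at t = 2000
    x = target + rng.normal(0.0, 1.0)           # noisy observation
    grad = w - x                                # gradient of 0.5 * (w - x)^2
    w -= eta * grad                             # online gradient step
    flow = (1 - delta) * flow + delta * grad    # leaky average of gradients
    eta += alpha * eta * (beta * abs(flow) - eta)   # adapt the learning rate
    if t % 1000 == 999:
        print(t + 1, round(w, 3), round(eta, 4))
```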
Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?
Amari, Shun-ichi, Murata, Noboru, Müller, Klaus-Robert, Finke, Michael, Yang, Howard Hua
A statistical theory for overtraining is proposed. The analysis treats realizable stochastic neural networks trained with Kullback-Leibler loss in the asymptotic case. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, even if we have access to the optimal stopping time. Considering cross-validation stopping, we answer the question of in what ratio the examples should be divided into training and testing sets in order to obtain optimum performance. In the non-asymptotic region, cross-validated early stopping always decreases the generalization error. Our large-scale simulations done on a CM5 are in nice agreement with our analytical findings.
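To illustrate the procedure whose asymptotics the paper analyzes, the following is a minimal sketch of cross-validated early stopping on a toy linear regression problem; the split ratio and the model are illustrative and are not the paper's optimal choices.

```python
import numpy as np

# Minimal sketch of cross-validated early stopping for noisy linear
# regression: the data are split into training and validation sets in a
# ratio r, training proceeds by gradient descent, and the stopping time is
# chosen where the validation error is smallest. The split ratio and the
# toy model are illustrative; the paper analyzes the optimal ratio
# asymptotically.

rng = np.random.default_rng(3)
n, d = 200, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=1.0, size=n)

r = 0.3                                  # fraction held out for validation
n_val = int(r * n)
X_tr, y_tr = X[n_val:], y[n_val:]
X_val, y_val = X[:n_val], y[:n_val]

w = np.zeros(d)
lr = 0.005
best_err, best_step = np.inf, 0
for step in range(1, 2001):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_err:
        best_err, best_step = val_err, step

print("best validation error", round(best_err, 4), "at step", best_step)
```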