
Collaborating Authors

 Akaho, Shotaro


Geometry of EM and related iterative algorithms

arXiv.org Artificial Intelligence

The Expectation-Maximization (EM) algorithm is a simple meta-algorithm that has been used for many years as a methodology for statistical inference when there are missing measurements in the observed data or when the data is composed of observables and unobservables. Its general properties are well studied, and there are countless ways to apply it to individual problems. In this paper, we introduce the em algorithm, an information-geometric formulation of the EM algorithm, together with its extensions and applications to various problems. Specifically, we show that the geometric perspective makes it possible to formulate an outlier-robust inference algorithm, an algorithm for calculating channel capacity, parameter estimation methods on the probability simplex, particular multivariate analysis methods such as principal component analysis in a space of probability models and modal regression, matrix factorization, and the learning of generative models, which have recently attracted attention in deep learning.
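As a concrete point of reference for the alternating structure described above, the following is a minimal sketch of the classical EM algorithm for a two-component Gaussian mixture in one dimension; the data, initialization, and iteration count are illustrative assumptions, and the E- and M-steps only loosely correspond to the e- and m-projections of the information-geometric em algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data from two Gaussians (illustrative only).
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 0.7, 300)])

# Initial parameters: mixing weights, means, variances.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

for _ in range(100):
    # E-step: posterior responsibilities of each component for each point.
    r = pi * gauss(x[:, None], mu, var)          # shape (n, 2)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: re-estimate the mixture parameters from the responsibilities.
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print("weights:", pi, "means:", mu, "variances:", var)
```

The em formulation reinterprets these two alternating updates as projections between a data manifold and a model manifold, which is the geometric picture the paper builds on.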


Learning Curves for Sequential Training of Neural Networks: Self-Knowledge Transfer and Forgetting

arXiv.org Machine Learning

Sequential training from task to task is becoming one of the major topics in deep learning applications such as continual learning and transfer learning. Nevertheless, it remains unclear under what conditions the trained model's performance improves or deteriorates. To deepen our understanding of sequential training, this study provides a theoretical analysis of generalization performance in a solvable case of continual learning. We consider neural networks in the neural tangent kernel (NTK) regime that continually learn target functions from task to task, and investigate the generalization by using an established statistical mechanical analysis of kernel ridgeless regression. We first show characteristic transitions from positive to negative transfer. Targets whose similarity exceeds a specific critical value achieve positive knowledge transfer to the subsequent task, while catastrophic forgetting occurs even with very similar targets. Next, we investigate a variant of continual learning in which the model learns the same target function in multiple tasks. Even for the same target, the trained model shows some transfer and forgetting depending on the sample size of each task. We can guarantee that the generalization error decreases monotonically from task to task for equal sample sizes, while unbalanced sample sizes deteriorate generalization. We refer to this improvement and deterioration as self-knowledge transfer and forgetting, respectively, and empirically confirm them in realistic training of deep neural networks as well.
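To make the sequential-training setting concrete, here is a small hedged sketch of kernel ridgeless regression trained on one task and then continued on a second task by interpolating the residuals; the RBF kernel stands in for the NTK, and the target functions, similarity parameter, and sample sizes are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(a, b, w=0.3):
    """RBF kernel as a simple stand-in for the NTK (an assumption for illustration)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / w) ** 2)

def target_a(x):
    return np.sin(3 * x)

def target_b(x, similarity=0.8):
    # A target correlated with target_a; `similarity` controls task overlap.
    return similarity * target_a(x) + (1 - similarity) * np.cos(5 * x)

n1, n2 = 30, 30
x1, x2 = rng.uniform(-1, 1, n1), rng.uniform(-1, 1, n2)
y1, y2 = target_a(x1), target_b(x2)

# Task 1: (near-)ridgeless, interpolating kernel regression; tiny jitter for stability.
alpha1 = np.linalg.solve(rbf(x1, x1) + 1e-8 * np.eye(n1), y1)
f1 = lambda x: rbf(x, x1) @ alpha1

# Task 2: continue from the task-1 solution by interpolating the residuals on task 2.
resid = y2 - f1(x2)
alpha2 = np.linalg.solve(rbf(x2, x2) + 1e-8 * np.eye(n2), resid)
f2 = lambda x: f1(x) + rbf(x, x2) @ alpha2

xt = np.linspace(-1, 1, 200)
err_before = np.mean((f1(xt) - target_a(xt)) ** 2)
err_after = np.mean((f2(xt) - target_a(xt)) ** 2)
print(f"task-A test error before task B: {err_before:.4f}, after: {err_after:.4f}")
```

Varying the `similarity` parameter in this toy setup is one way to see the qualitative transition between transfer and forgetting that the abstract describes.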


Principal component analysis for Gaussian process posteriors

arXiv.org Machine Learning

This paper proposes an extension of principal component analysis to Gaussian process posteriors, denoted GP-PCA. Since GP-PCA estimates a low-dimensional space of GP posteriors, it can be used for meta-learning, a framework for improving the precision of a new task by estimating a structure shared by a set of tasks. The issue is how to define the structure of a set of GPs, whose parameters are infinite-dimensional, including a coordinate system and a divergence. In this study, we reduce the infinite-dimensional GPs to the finite-dimensional case within the information-geometric framework by considering the space of GP posteriors that share the same prior. In addition, we propose an approximation method for GP-PCA based on variational inference and demonstrate the effectiveness of GP-PCA for meta-learning through experiments.
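As a rough, hedged illustration of the finite-dimensional reduction (not the paper's algorithm), the sketch below computes GP posterior means for several related tasks under the same prior and applies ordinary PCA to those finite-dimensional summaries; the kernel, evaluation grid, tasks, and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf(a, b, w=0.3):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / w) ** 2)

grid = np.linspace(-1, 1, 50)          # common evaluation grid (an assumption)
noise = 0.1
posterior_means = []

# A set of related regression tasks: shifted sine functions.
for shift in np.linspace(0.0, 1.0, 8):
    x = rng.uniform(-1, 1, 20)
    y = np.sin(3 * x + shift) + noise * rng.normal(size=20)
    K = rbf(x, x) + noise ** 2 * np.eye(len(x))
    # GP posterior mean on the grid, with the same prior (kernel) for every task.
    m = rbf(grid, x) @ np.linalg.solve(K, y)
    posterior_means.append(m)

M = np.array(posterior_means)          # tasks x grid points
M_centered = M - M.mean(axis=0)
# Ordinary PCA on the finite-dimensional summaries of the GP posteriors.
_, s, vt = np.linalg.svd(M_centered, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)
print("explained variance ratios:", np.round(explained[:3], 3))
```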


On a convergence property of a geometrical algorithm for statistical manifolds

arXiv.org Machine Learning

Information geometry is a framework for analyzing statistical inference and machine learning [2]. Geometrically, statistical inference and many machine learning algorithms can be regarded as procedures for finding a projection from a given data point onto a model subspace. In this paper, we focus on an algorithm to find the projection. Since the projection is given by minimizing a divergence, a common approach to finding it is a gradient-based method [6]. However, such an approach is not applicable in some cases. For instance, there have been several attempts to extend the information-geometric framework to nonparametric cases [3, 9, 13, 15], where we need to consider a function space or where each datum is represented as a point process. In such cases, it is difficult to compute the derivative of the divergence required by gradient-based methods, and sometimes it is even difficult to handle the coordinates explicitly. Takano et al. [15] proposed a geometrical algorithm to find the projection for the nonparametric e-mixture distribution, in which the model subspace is spanned by several empirical distributions. The algorithm, which is derived from the generalized Pythagorean theorem, depends only on the values of divergences.
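A toy, hedged illustration of the projection-as-divergence-minimization picture (not the algorithm of Takano et al.): the projection of a discrete distribution onto a one-parameter e-mixture is found by a search that uses only divergence values, with no derivatives or explicit coordinates. The distributions and the grid search are assumptions for illustration.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete distributions with full support."""
    return np.sum(p * np.log(p / q))

def e_mixture(q1, q2, t):
    """Normalized log-linear (e-)mixture of two discrete distributions."""
    log_m = (1 - t) * np.log(q1) + t * np.log(q2)
    m = np.exp(log_m)
    return m / m.sum()

# Two fixed distributions spanning a one-dimensional e-flat model (illustrative).
q1 = np.array([0.7, 0.2, 0.1])
q2 = np.array([0.1, 0.3, 0.6])
# The data distribution to be projected onto the model.
p = np.array([0.3, 0.4, 0.3])

# Projection = divergence minimization; here a simple grid search that
# evaluates only divergence values along the model.
ts = np.linspace(0.0, 1.0, 1001)
divs = [kl(p, e_mixture(q1, q2, t)) for t in ts]
t_star = ts[int(np.argmin(divs))]
print(f"projection at t = {t_star:.3f}, KL = {min(divs):.4f}")
```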


The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks

arXiv.org Machine Learning

Normalization methods play an important role in enhancing the performance of deep learning, while their theoretical understanding has been limited. To theoretically elucidate the effectiveness of normalization, we quantify the geometry of the parameter space determined by the Fisher information matrix (FIM), which also corresponds to the local shape of the loss landscape under certain conditions. We analyze deep neural networks with random initialization, which are known to suffer from a pathologically sharp landscape when the network becomes sufficiently wide. We reveal that batch normalization in the last layer drastically decreases such pathological sharpness if the width and sample number satisfy a specific condition. In contrast, it is hard for batch normalization in the middle hidden layers to alleviate pathological sharpness in many settings. We also find that layer normalization cannot alleviate pathological sharpness. Thus, we conclude that batch normalization in the last layer significantly contributes to decreasing the sharpness induced by the FIM.
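The following is a small numerical sketch, not the paper's derivation: it compares the largest eigenvalue of the empirical FIM of a random two-layer network with and without mean-centering of the output over the batch, which serves here as a crude stand-in for mean-only batch normalization in the last layer. The architecture, width, activation, and sample size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# A random two-layer network f(x) = v^T tanh(W x), evaluated on a batch
# (widths, batch size, and the tanh activation are illustrative assumptions).
d, width, n = 50, 500, 100
W = rng.normal(0, 1 / np.sqrt(d), (width, d))
v = rng.normal(0, 1 / np.sqrt(width), width)
X = rng.normal(size=(n, d))

H = np.tanh(X @ W.T)                    # hidden activations, shape (n, width)
Hprime = 1 - H ** 2                     # derivative of tanh at the pre-activations

# Per-sample gradient of the scalar output w.r.t. all parameters (v and W).
grad_v = H                                              # shape (n, width)
grad_W = (Hprime * v)[:, :, None] * X[:, None, :]       # shape (n, width, d)
G = np.concatenate([grad_v, grad_W.reshape(n, -1)], axis=1)

def max_fim_eig(G):
    # Empirical FIM = (1/n) G^T G; its nonzero eigenvalues equal those of (1/n) G G^T.
    gram = G @ G.T / G.shape[0]
    return np.linalg.eigvalsh(gram).max()

print("max FIM eigenvalue, plain network        :", max_fim_eig(G))
# Subtracting the batch mean of the output is mean-only normalization of the
# last layer; its per-sample gradients are the centered gradients below.
print("max FIM eigenvalue, mean-centered output :", max_fim_eig(G - G.mean(axis=0)))
```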


Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach

arXiv.org Machine Learning

This study analyzes the Fisher information matrix (FIM) by applying mean-field theory to deep neural networks with random weights. We theoretically derive novel statistics of the FIM that are universal among a wide class of deep networks with any number of layers and various activation functions. Although most of the FIM's eigenvalues are close to zero, the maximum eigenvalue takes a huge value and the eigenvalue distribution has an extremely long tail. These statistics suggest that the shape of the loss landscape is locally flat in most dimensions but strongly distorted in the others. Moreover, our theory of the FIM leads to a quantitative evaluation of learning in deep networks. First, the maximum eigenvalue enables us to estimate an appropriate learning rate for steepest-descent gradient methods to converge. Second, the flatness induced by the small eigenvalues is connected to generalization ability through a norm-based capacity measure.
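As a hedged numerical companion to these claims, the sketch below computes the nonzero eigenvalues of the empirical FIM of a random one-hidden-layer network and reports the gap between the mean and maximum eigenvalue together with the resulting stability bound on the learning rate; the architecture and sizes are assumptions, and the empirical FIM here has at most as many nonzero eigenvalues as there are samples.

```python
import numpy as np

rng = np.random.default_rng(4)

# Random one-hidden-layer ReLU network (an illustrative choice of architecture).
d, width, n = 50, 500, 200
W = rng.normal(0, 1 / np.sqrt(d), (width, d))
v = rng.normal(0, 1 / np.sqrt(width), width)
X = rng.normal(size=(n, d))

pre = X @ W.T
H = np.maximum(pre, 0.0)                # ReLU activations
Hprime = (pre > 0).astype(float)        # ReLU derivative

# Per-sample gradients of the scalar output with respect to all parameters.
grad_v = H
grad_W = (Hprime * v)[:, :, None] * X[:, None, :]
G = np.concatenate([grad_v, grad_W.reshape(n, -1)], axis=1)

# The empirical FIM (1/n) G^T G has at most n nonzero eigenvalues out of
# G.shape[1] parameters, so most of its spectrum is (near) zero; the nonzero
# part is obtained from the n x n Gram matrix.
eigs = np.linalg.eigvalsh(G @ G.T / n)
lam_max, lam_mean = eigs.max(), eigs.mean()
print(f"parameters: {G.shape[1]}, samples: {n}")
print(f"mean nonzero eigenvalue: {lam_mean:.4f}, max eigenvalue: {lam_max:.4f}")
# For gradient descent on a quadratic (mean-squared-error) loss, stability
# roughly requires a learning rate below 2 / lam_max.
print(f"learning-rate bound ~ 2 / lam_max = {2 / lam_max:.4g}")
```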


Constraint-free Graphical Model with Fast Learning Algorithm

arXiv.org Machine Learning

In this paper, we propose a simple, versatile model for learning the structure and parameters of multivariate distributions from a data set. Learning a Markov network from a given data set is not a simple problem, because Markov networks rigorously represent Markov properties, and this rigor imposes complex constraints on the design of the networks. Our proposed model removes these constraints while drawing on important ideas from information geometry. The proposed parameter- and structure-learning algorithms are simple to execute, as they are based solely on local computation at each node. Experiments demonstrate that the algorithms work as intended.
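The paper's algorithms are not reproduced here, but the hypothetical sketch below illustrates what "local computation at each node" can look like: each node independently keeps the neighbours whose estimated pairwise mutual information exceeds a threshold. The data-generating process, the plug-in estimator, and the threshold are all assumptions for illustration, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic binary data with a chain dependency 0 -> 1 -> 2 and an independent
# node 3 (the generating process and the threshold below are illustrative).
n = 5000
x0 = rng.integers(0, 2, n)
x1 = np.where(rng.random(n) < 0.8, x0, 1 - x0)
x2 = np.where(rng.random(n) < 0.8, x1, 1 - x1)
x3 = rng.integers(0, 2, n)
X = np.stack([x0, x1, x2, x3], axis=1)

def mutual_information(a, b):
    """Plug-in estimate of mutual information between two binary variables."""
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            p_ab = np.mean((a == va) & (b == vb))
            p_a, p_b = np.mean(a == va), np.mean(b == vb)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

# Structure learning by purely local computation: each node keeps the neighbours
# whose estimated pairwise dependence exceeds a threshold (a hypothetical rule).
threshold = 0.05
d = X.shape[1]
neighbours = {i: [j for j in range(d) if j != i
                  and mutual_information(X[:, i], X[:, j]) > threshold]
              for i in range(d)}
print(neighbours)
```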


Critical Lines in Symmetry of Mixture Models and its Application to Component Splitting

Neural Information Processing Systems

We show the existence of critical points as lines for the likelihood function of mixture-type models. They are obtained by embedding a critical point of a model with fewer components. A sufficient condition for the critical line to give local maxima or saddle points is also derived. Based on this fact, a component-split method is proposed for a mixture of Gaussian components, and its effectiveness is verified through experiments.
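A hedged sketch of a component-split initialization in the spirit of the abstract (the split rule below is a simple heuristic, not the paper's criterion): a single fitted Gaussian is split into two components displaced along its standard deviation, and EM is then run from that initialization.

```python
import numpy as np

rng = np.random.default_rng(6)

# 1-D data from two overlapping Gaussians (illustrative).
x = np.concatenate([rng.normal(-1.5, 1.0, 300), rng.normal(1.5, 1.0, 300)])

def em(x, pi, mu, var, iters=200):
    """Standard EM updates for a 1-D Gaussian mixture."""
    for _ in range(iters):
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        nk = r.sum(axis=0)
        pi, mu = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

# Start from a single fitted component...
m, s2 = x.mean(), x.var()
# ...and split it into two components displaced along the standard deviation
# (a hypothetical split rule used only to illustrate component splitting).
eps = 0.5
pi0 = np.array([0.5, 0.5])
mu0 = np.array([m - eps * np.sqrt(s2), m + eps * np.sqrt(s2)])
var0 = np.array([s2, s2])

pi, mu, var = em(x, pi0, mu0, var0)
print("weights:", np.round(pi, 3), "means:", np.round(mu, 3),
      "variances:", np.round(var, 3))
```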