Normalized Maximum Likelihood Code-Length on Riemannian Manifold Data Spaces
Fukuzawa, Kota, Suzuki, Atsushi, Yamanishi, Kenji
--In recent years, with the large-scale expansion of graph data, there has been increased focus on Riemannian manifold data spaces other than Euclidean space. In particular, the development of hyperbolic spaces has been remarkable: they have high expressive power for graph data with hierarchical structures. Normalized Maximum Likelihood (NML) is employed in regret minimization and model selection. However, existing formulations of NML have been developed primarily in Euclidean spaces and are inherently dependent on the choice of coordinate system, making it non-trivial to extend NML to Riemannian manifolds. In this study, we define a new NML that reflects the geometric structure of Riemannian manifolds, called the Riemannian manifold NML (Rm-NML). This Rm-NML is invariant under coordinate transformations and coincides with the conventional NML under the natural parameterization in Euclidean space. We extend existing computational techniques for NML to the setting of Riemannian manifolds. Furthermore, we derive a method to simplify the computation of Rm-NML on Riemannian symmetric spaces, which encompass data spaces of growing interest such as hyperbolic spaces. To illustrate the practical application of our proposed method, we explicitly compute the Rm-NML for normal distributions on hyperbolic spaces.

With the recent increase in the scale of graph data, Riemannian manifold data spaces other than Euclidean space are attracting attention as latent spaces suitable for graph embedding [1, 2]. For example, hyperbolic spaces have been demonstrated to possess high expressive power for graph data with hierarchical structures [3], while spherical spaces are particularly effective in representing graph data with cyclic structures [4]. Research on hyperbolic spaces has been especially active [3]: in the field of representation learning, methods that embed hierarchical structures into hyperbolic space represent such relationships in significantly lower-dimensional spaces than conventional Euclidean methods, while preserving the essential relational information [2].
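As a concrete reference point for the conventional, coordinate-dependent NML that Rm-NML generalizes, the sketch below computes the NML code-length for the one-dimensional Bernoulli model, where the Shtarkov normalizer is tractable because it reduces to a sum over counts. This is an illustrative sketch of the standard Euclidean NML, not of the paper's Rm-NML; the function name is ours.

```python
import math

def bernoulli_nml_code_length(x):
    """NML code-length (nats) of a binary sequence under the Bernoulli model:
    -log p(x^n | theta_hat(x^n)) + log C_n, with C_n the Shtarkov sum."""
    n = len(x)

    def max_loglik(k):
        # Maximized log-likelihood of any sequence with k ones, with 0 log 0 = 0.
        ll = k * math.log(k / n) if k else 0.0
        ll += (n - k) * math.log((n - k) / n) if n - k else 0.0
        return ll

    # Shtarkov sum grouped by count: C_n = sum_k binom(n,k) (k/n)^k ((n-k)/n)^(n-k),
    # evaluated in log space via log-sum-exp for numerical stability.
    terms = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + max_loglik(k) for k in range(n + 1)]
    m = max(terms)
    log_C = m + math.log(sum(math.exp(t - m) for t in terms))
    return -max_loglik(sum(x)) + log_C

print(bernoulli_nml_code_length([1, 1, 0, 1, 0, 1, 1, 1]))
```

The Rm-NML of the paper replaces the coordinate-dependent maximized likelihood and normalizer with geometrically invariant counterparts; this Euclidean example only fixes the baseline object being generalized.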
Detecting Signs of Model Change with Continuous Model Selection Based on Descriptive Dimensionality
We address the issue of detecting changes of models that lie behind a data stream. The model refers to integer-valued structural information, such as the number of free parameters in a parametric model. Specifically, we are concerned with the problem of how we can detect signs of model changes earlier than they are actualized. To this end, we employ continuous model selection on the basis of the notion of descriptive dimensionality (Ddim). Ddim is a real-valued model dimensionality, designed to quantify the model dimensionality in the model transition period. Continuous model selection determines the real-valued model dimensionality, in terms of Ddim, from given data. We propose a novel methodology for detecting signs of model changes by tracking the rise-up of Ddim in a data stream. We apply this methodology to detecting signs of changes in the number of clusters in a Gaussian mixture model and in the order of an autoregression model. With synthetic and real data sets, we empirically demonstrate its effectiveness by showing that it visualizes well how rapidly the model dimensionality moves in the transition period and raises early warning signals of model changes earlier than existing methods detect them.
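The paper's Ddim has its own information-theoretic definition; as a loose illustration of the idea of a real-valued dimensionality tracked over a stream, the sketch below weights candidate autoregression orders by a BIC-style code-length proxy (a stand-in for NML) and reports the weighted mean order over sliding windows. The weighting scheme and all names here are our assumptions, not the paper's construction.

```python
import numpy as np

def ar_code_length(x, k):
    """Two-part code-length proxy (BIC-style, in nats) for an AR(k) least-squares fit."""
    n = len(x) - k
    X = np.column_stack([x[i:i + n] for i in range(k)])   # lagged regressors
    y = x[k:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coef) ** 2)
    return 0.5 * n * np.log(rss / n + 1e-12) + 0.5 * k * np.log(n)

def effective_dimension(x, max_order=6):
    """Illustrative real-valued dimensionality: code-length-weighted mean AR order."""
    L = np.array([ar_code_length(x, k) for k in range(1, max_order + 1)])
    w = np.exp(-(L - L.min()))
    w /= w.sum()
    return float(np.dot(np.arange(1, max_order + 1), w))

# Stream whose AR order rises from 1 to 2 halfway through.
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(2, 2000):
    a2 = 0.0 if t < 1000 else -0.6
    x[t] = 0.7 * x[t - 1] + a2 * x[t - 2] + rng.normal()
for start in range(0, 1800, 300):
    print(start, round(effective_dimension(x[start:start + 300]), 2))
```

Windows that straddle the transition yield intermediate values, which is the kind of gradual rise-up the methodology monitors for early warnings.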
Detecting Hierarchical Changes in Latent Variable Models
Fukushima, Shintaro, Yamanishi, Kenji
This paper addresses the issue of detecting hierarchical changes in latent variable models (HCDL) from data streams. There are three levels of change for latent variable models: 1) a change in the data distribution for fixed latent variables, 2) a change in the distribution over latent variables, and 3) a change in the number of latent variables. It is important to detect these changes because we can analyze the causes of a change by identifying the level it comes from (change interpretability). This paper proposes an information-theoretic framework for detecting changes at the three levels in a hierarchical way. The key idea is to employ the MDL (minimum description length) change statistics for measuring the degree of change, in combination with DNML (decomposed normalized maximum likelihood) code-length calculation. We give a theoretical basis for raising reliable alarms for changes. Focusing on stochastic block models, we employ synthetic and benchmark datasets to empirically demonstrate the effectiveness of our framework in terms of change interpretability as well as change detection.
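The MDL change statistic at the heart of such frameworks compares the code-length of encoding a stream as one block against encoding it as two blocks split at a candidate change point. The sketch below illustrates this with plain Bernoulli NML code-lengths rather than the paper's DNML decomposition for stochastic block models; the function names are ours.

```python
import math, random

def L_nml(x):
    """Bernoulli NML code-length (nats): -max log-likelihood + log Shtarkov sum."""
    n, k = len(x), sum(x)
    ml = lambda j: (j * math.log(j / n) if j else 0.0) + \
                   ((n - j) * math.log((n - j) / n) if n - j else 0.0)
    terms = [math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1) + ml(j)
             for j in range(n + 1)]
    m = max(terms)
    return -ml(k) + m + math.log(sum(math.exp(t - m) for t in terms))

def mdl_change_statistic(x, t):
    """Code-length saving from encoding x with a change point at position t."""
    return L_nml(x) - (L_nml(x[:t]) + L_nml(x[t:]))

# Distribution shifts from Bern(0.2) to Bern(0.7) at t = 100.
random.seed(1)
x = [int(random.random() < 0.2) for _ in range(100)] + \
    [int(random.random() < 0.7) for _ in range(100)]
t_best = max(range(10, 190), key=lambda t: mdl_change_statistic(x, t))
print(t_best, round(mdl_change_statistic(x, t_best), 2))
```

A large positive statistic means the two-block encoding compresses the data better, which is the evidence used to raise a change alarm.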
Descriptive Dimensionality and Its Characterization of MDL-based Learning and Change Detection
This paper introduces a new notion of dimensionality of probabilistic models from an information-theoretic viewpoint. We call it the "descriptive dimension" (Ddim). We show that Ddim coincides with the number of independent parameters for a parametric class, and can further be extended to real-valued dimensionality when a number of models are mixed. The paper then derives the rate of convergence of the MDL (Minimum Description Length) learning algorithm, which outputs the normalized maximum likelihood (NML) distribution with the model of the shortest NML code-length. The paper proves that this rate is governed by Ddim. The paper also derives error probabilities of the MDL-based test for multiple model change detection, and proves that they, too, are governed by Ddim. Through the analysis, we demonstrate that Ddim is an intrinsic quantity characterizing the performance of MDL-based learning and change detection.
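For reference, the NML distribution and code-length referred to here are the standard ones:

\[
p_{\mathrm{NML}}(x^n) \;=\; \frac{p\bigl(x^n;\hat\theta(x^n)\bigr)}{\sum_{y^n} p\bigl(y^n;\hat\theta(y^n)\bigr)},
\qquad
L_{\mathrm{NML}}(x^n) \;=\; -\log p\bigl(x^n;\hat\theta(x^n)\bigr) \;+\; \log \sum_{y^n} p\bigl(y^n;\hat\theta(y^n)\bigr),
\]

where \(\hat\theta\) is the maximum likelihood estimator and the sum (an integral for continuous data) is the parametric complexity. The MDL learner described above selects the model class minimizing \(L_{\mathrm{NML}}\).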
The Stochastic Complexity of Principal Component Analysis
PCA (principal component analysis) and its variants are ubiquitous techniques for matrix dimension reduction and reduced-dimension latent-factor extraction. For an arbitrary matrix, they cannot, on their own, determine the size of the reduced dimension, but must be given it as an input. NML (normalized maximum likelihood) is a universal implementation of the Minimum Description Length principle, which gives an objective compression-based criterion for model selection. This work applies NML to PCA. A direct attempt to do so would involve the distributions of singular values of random matrices, which is difficult. A reduction to linear regression with a noisy unitary covariate matrix, however, allows finding closed-form bounds on the NML of PCA.
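The paper's closed-form bounds come from the regression reduction; as a rough illustration of MDL-style rank selection for PCA, the sketch below scores each candidate rank with a BIC-style penalized probabilistic-PCA likelihood on the sample eigenvalues. The parameter count and criterion are textbook approximations chosen by us, not the paper's bounds.

```python
import numpy as np

def ppca_rank_by_mdl(X, max_rank=None):
    """Pick a PCA rank by a BIC-style penalized Gaussian likelihood on eigenvalues,
    a crude stand-in for an NML-based stochastic complexity criterion."""
    n, d = X.shape
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending eigenvalues
    max_rank = max_rank or d - 1
    best_k, best_score = None, np.inf
    for k in range(1, max_rank + 1):
        sigma2 = lam[k:].mean()                       # ML isotropic noise variance
        loglik = -0.5 * n * (np.sum(np.log(lam[:k])) + (d - k) * np.log(sigma2))
        m = d * k - k * (k + 1) // 2 + k + 1          # free parameters, up to rotation
        score = -loglik + 0.5 * m * np.log(n)         # two-part code-length proxy
        if score < best_score:
            best_k, best_score = k, score
    return best_k

# Rank-3 signal in 10 dimensions plus isotropic noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10)) + 0.3 * rng.normal(size=(500, 10))
print(ppca_rank_by_mdl(X))   # typically recovers rank 3
```

The point of the NML treatment in the paper is to replace the generic BIC penalty used here with the model's exact (bounded) parametric complexity.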
Model selection by minimum description length: Lower-bound sample sizes for the Fisher information approximation
Heck, Daniel W., Moshagen, Morten, Erdfelder, Edgar
For the published version of the article, see Heck, D. W., Moshagen, M., & Erdfelder, E. (2014). The Fisher information approximation (FIA) is an implementation of the minimum description length principle for model selection. Unlike information criteria such as AIC or BIC, it has the advantage of taking the functional form of a model into account. Unfortunately, FIA can be misleading in finite samples, resulting in an inversion of the correct rank order of complexity terms for competing models in the worst case.
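For a model with k parameters, sample size n, and Fisher information matrix I(θ), FIA is the code-length −log p(x | θ̂) + (k/2) log(n/2π) + log ∫ √det I(θ) dθ. The sketch below evaluates it for the Bernoulli model, where the Fisher integral is exactly π; this is a standard worked example of the formula, not material taken from the article.

```python
import math

def fia_bernoulli(x):
    """Fisher information approximation (nats) for the Bernoulli model:
    -log p(x | mle) + (k/2) log(n / 2 pi) + log integral sqrt(det I(theta)).
    Here k = 1, I(theta) = 1/(theta(1-theta)), and the integral equals pi."""
    n, s = len(x), sum(x)
    ml = (s * math.log(s / n) if s else 0.0) + \
         ((n - s) * math.log((n - s) / n) if n - s else 0.0)
    return -ml + 0.5 * math.log(n / (2 * math.pi)) + math.log(math.pi)

print(fia_bernoulli([1, 0, 1, 1, 0, 1, 1, 1, 0, 1]))
```

Because only the (k/2) log(n/2π) term grows with n while the Fisher-integral term is constant, the ordering of the complexity terms across competing models can flip at small n, which is the finite-sample failure mode the article quantifies.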
Non-monotonic Logic (Stanford Encyclopedia of Philosophy)
Clearly, the second approach is more cautious. Intuitively, it demands that there is a specific argument for τ that is contained in each rational stance a reasoner can take given Γ, DRules, and SRules. The first option doesn't bind the acceptability of τ to a specific argument: it is sufficient if according to each rational stance there is some argument for τ. In Default Logic, the main representational tool is that of a default rule, or simply a default.
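As a toy illustration of the two acceptability notions (some argument in each rational stance versus a specific argument in every stance), the sketch below builds the extensions of the classic Nixon-diamond default theory and checks credulous against skeptical consequence. The single-pass closure here is a simplification adequate for this example, not a full Reiter-extension computation.

```python
from itertools import permutations

# A normal default rule: if `pre` is derived and `concl` is consistent with what
# is derived so far, add `concl`. Negation is encoded by a '-' prefix.
defaults = [("quaker", "pacifist"),
            ("republican", "-pacifist")]
facts = {"quaker", "republican"}

def neg(p):
    return p[1:] if p.startswith("-") else "-" + p

def extension(order):
    """Closure of the facts under one application order of the defaults."""
    s = set(facts)
    for pre, concl in order:
        if pre in s and neg(concl) not in s:
            s.add(concl)
    return frozenset(s)

extensions = {extension(order) for order in permutations(defaults)}
credulous = set().union(*extensions)                  # holds in some stance
skeptical = set.intersection(*map(set, extensions))   # holds in every stance
print("pacifist credulous:", "pacifist" in credulous)   # True
print("pacifist skeptical:", "pacifist" in skeptical)   # False
```

The two extensions disagree on "pacifist", so the cautious (skeptical) reasoner withholds it while the credulous one may accept it, mirroring the distinction drawn above.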
Bayesian Properties of Normalized Maximum Likelihood and its Fast Computation
Barron, Andrew, Roos, Teemu, Watanabe, Kazuho
The normalized maximum likelihood (NML) provides the minimax regret solution in universal data compression, gambling, and prediction, and it plays an essential role in the minimum description length (MDL) method of statistical modeling and estimation. Here we show that the normalized maximum likelihood has a Bayes-like representation as a mixture of the component models, even in finite samples, though the weights of the linear combination may be both positive and negative. This representation addresses in part the relationship between MDL and Bayes modeling, and it has the advantage of speeding the calculation of marginals and conditionals required for coding and prediction applications.
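To see why a Bayes-like representation of NML is plausible, one can compare the exact Bernoulli NML with the Jeffreys-prior (Krichevsky-Trofimov) Bayes mixture: the two are close but not identical, and the residual gap is what the signed-mixture representation closes exactly. The comparison below is our illustration, not the paper's construction.

```python
import math

def log_nml(n, k):
    """log p_NML of a binary sequence with k ones among n (Bernoulli model)."""
    ml = lambda j: (j * math.log(j / n) if j else 0.0) + \
                   ((n - j) * math.log((n - j) / n) if n - j else 0.0)
    terms = [math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1) + ml(j)
             for j in range(n + 1)]
    m = max(terms)
    return ml(k) - (m + math.log(sum(math.exp(t - m) for t in terms)))

def log_kt(n, k):
    """log of the Jeffreys-prior (Krichevsky-Trofimov) Bayes mixture probability."""
    lbeta = lambda a, b: math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return lbeta(k + 0.5, n - k + 0.5) - lbeta(0.5, 0.5)

n = 20
for k in (0, 5, 10):
    print(k, round(log_nml(n, k), 4), round(log_kt(n, k), 4))
```

The near-agreement of the two columns reflects the known asymptotic equivalence of NML and the Jeffreys mixture; achieving equality in finite samples requires the positive-and-negative weights described in the abstract.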