Evidence Slopes and Effective Dimension in Singular Linear Models
Bayesian model selection commonly relies on the Laplace approximation or the Bayesian Information Criterion (BIC), both of which assume that the effective model dimension equals the number of parameters. Singular learning theory replaces this assumption with the real log canonical threshold (RLCT), an effective dimension that can be strictly smaller in overparameterized or rank-deficient models. We study linear-Gaussian rank models and linear subspace (dictionary) models in which the exact marginal likelihood is available in closed form and the RLCT is analytically tractable. In this setting, we show theoretically and empirically that the error of Laplace/BIC grows linearly in $\log n$ with slope $d/2 - \lambda$, where $d$ is the ambient parameter dimension and $\lambda$ is the RLCT. An RLCT-aware correction recovers the correct evidence slope and is invariant to overcomplete reparameterizations that represent the same data subspace. Our results provide a concrete finite-sample characterization of Laplace failure in singular models and demonstrate that evidence slopes can serve as a practical estimator of effective dimension in simple linear settings.
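The slope claim lends itself to a small numerical check. The sketch below is illustrative only (the prior, noise level, rank-deficient design, and function names are assumptions, not the paper's experimental setup): it computes the exact log evidence of a Bayesian linear-Gaussian model whose d columns span only an r-dimensional subspace, then regresses the gap between the maximized log likelihood and the exact log evidence against log n. The fitted slope tracks the effective dimension of the data subspace (here r/2) rather than the d/2 that a Laplace/BIC penalty would charge.

```python
# Minimal sketch (assumed setup, not the paper's code): estimate an evidence slope
# for a linear-Gaussian model with an overcomplete design whose d columns span
# only an r-dimensional subspace.
import numpy as np

rng = np.random.default_rng(0)
sigma2, alpha = 0.25, 1.0        # noise variance and prior precision (assumed known)
d, r = 6, 2                      # ambient parameter dimension vs. rank of the design

def exact_log_evidence(X, y):
    """Closed-form log p(y | X) for y = X w + eps, w ~ N(0, alpha^{-1} I),
    eps ~ N(0, sigma2 I): the standard Bayesian linear-regression evidence."""
    n, d = X.shape
    A = alpha * np.eye(d) + X.T @ X / sigma2            # posterior precision
    m = np.linalg.solve(A, X.T @ y) / sigma2            # posterior mean
    resid = y - X @ m
    E = resid @ resid / (2 * sigma2) + alpha * (m @ m) / 2
    _, logdetA = np.linalg.slogdet(A)
    return (d / 2) * np.log(alpha) - (n / 2) * np.log(2 * np.pi * sigma2) - E - logdetA / 2

M = rng.standard_normal((r, d))  # fixed map making the d columns span an r-dim subspace
ns = np.array([500, 1000, 2000, 4000, 8000])
gaps = []
for n in ns:
    X = rng.standard_normal((n, r)) @ M                 # rank-deficient (overcomplete) design
    y = np.sqrt(sigma2) * rng.standard_normal(n)        # data generated at w = 0
    w_hat = np.linalg.lstsq(X, y, rcond=None)[0]        # minimum-norm MLE
    resid = y - X @ w_hat
    max_ll = -(n / 2) * np.log(2 * np.pi * sigma2) - resid @ resid / (2 * sigma2)
    gaps.append(max_ll - exact_log_evidence(X, y))      # grows roughly like (r/2) * log n

slope = np.polyfit(np.log(ns), gaps, 1)[0]
print(f"fitted evidence slope ~ {slope:.2f}; subspace r/2 = {r/2}, BIC penalty d/2 = {d/2}")
```

In this toy setting the exact evidence depends only on the column space of the design, so the fitted slope stays near r/2 no matter how many redundant columns are added, while the BIC penalty keeps counting d/2.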
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
referees correctly identified the main contributions of this work, namely: 1) a physically-inspired RNN generalising
We are grateful to the anonymous referees for their valuable comments and suggestions. A smoother introduction of equation (2) would be nice. To retain the PGD interpretation of the network, one can use truncated ReLUs as activation functions. I believe that the experimental results section could have been larger, with more results to support the claims. The ninth page will comment further on the experimental results.
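For context on the remark about truncated ReLUs and the PGD interpretation: a truncated ReLU is exactly the Euclidean projection onto a box, which is the projection step a projected-gradient iteration needs. A minimal sketch, with an assumed box bound t and illustrative names (not taken from the paper under review):

```python
import numpy as np

def truncated_relu(x, t=1.0):
    """clip(x, 0, t): the Euclidean projection of x onto the box [0, t]^n."""
    return np.clip(x, 0.0, t)

def pgd_layer(x, A, b, eta=0.1, t=1.0):
    """One unrolled projected-gradient step for min_x 0.5*||A x - b||^2
    subject to 0 <= x <= t; using the truncated ReLU as the activation
    keeps the layer interpretable as exactly this PGD update."""
    return truncated_relu(x - eta * A.T @ (A @ x - b), t)
```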
Limitations of Quantum Advantage in Unsupervised Machine Learning
Machine learning models are used for pattern-recognition analysis of big data without direct human intervention. The task of unsupervised learning is to find the probability distribution that best describes the available data and then use it to make predictions for observables of interest. Classical models generally fit the data to the Boltzmann distribution of a Hamiltonian with a large number of tunable parameters. Quantum extensions of these models replace classical probability distributions with quantum density matrices. An advantage can be obtained only when features of density matrices that are absent from classical probability distributions are exploited. Such situations depend on the input data as well as the targeted observables. Explicit examples are discussed that bring out the constraints limiting possible quantum advantage. The problem-dependent extent of quantum advantage has implications for both data analysis and sensing applications.
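To make the classical-versus-quantum contrast concrete, the two model families can be written as follows; this is a generic textbook formulation added for illustration, not necessarily the parametrisation used in the paper.

```latex
% Classical Boltzmann model over configurations x with tunable energy E_\theta:
p_\theta(x) \;=\; \frac{e^{-E_\theta(x)}}{Z_\theta},
\qquad Z_\theta \;=\; \sum_x e^{-E_\theta(x)} .

% Quantum extension: the distribution is replaced by a density matrix built from a
% Hamiltonian H_\theta, and observables O are predicted via the trace rule:
\rho_\theta \;=\; \frac{e^{-H_\theta}}{\operatorname{Tr} e^{-H_\theta}},
\qquad \langle O \rangle \;=\; \operatorname{Tr}\!\left(\rho_\theta\, O\right).

% If H_\theta and every targeted observable commute (are diagonal in a common basis),
% \rho_\theta is equivalent to a classical distribution over that basis, so any quantum
% advantage must come from off-diagonal, non-commuting structure in \rho_\theta or O.
```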
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- Asia > India > Karnataka > Bengaluru (0.05)
- Europe > Estonia > Tartu County > Tartu (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > Canada (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Research Report > New Finding (0.94)
- Research Report > Experimental Study (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Hierarchical classification at multiple operating points
Figure 4: Impact of loss hyper-parameters on the correct-vs-recall trade-off on iNat21-Mini; Table 3 gives the parametrisation corresponding to each loss function.
Table 3: Definition and properties of the parametrisations used by each loss function (columns: Loss, θ, Parametrisation, Properties; rows include the flat softmax and HXE [2]).
Algorithm 1: Procedure for finding the ordered Pareto set; square brackets denote array elements (subscripts were used in the main text).
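For readers reconstructing the algorithm from its caption: the following is a generic sketch of extracting an ordered Pareto set over (correct, recall) operating points, not the paper's Algorithm 1; variable names and example values are invented for illustration.

```python
import numpy as np

def ordered_pareto_set(correct, recall):
    """Return indices of operating points that are Pareto-optimal when maximizing
    both `correct` and `recall`, ordered by increasing recall."""
    pts = np.stack([correct, recall], axis=1)
    order = np.argsort(pts[:, 1])            # sort by recall, ascending
    pareto, best_correct = [], -np.inf
    for i in order[::-1]:                    # sweep from highest recall downwards
        if pts[i, 0] > best_correct:         # keep points that improve correctness
            pareto.append(i)
            best_correct = pts[i, 0]
    return pareto[::-1]                      # re-order by increasing recall

# Toy operating points, e.g. from sweeping a confidence threshold:
correct = np.array([0.95, 0.90, 0.85, 0.84, 0.80])
recall  = np.array([0.20, 0.50, 0.70, 0.60, 0.90])
print(ordered_pareto_set(correct, recall))   # -> [0, 1, 2, 4]; point 3 is dominated by point 2
```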
- Asia > Middle East > Jordan (0.04)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)