BIC
Evidence Slopes and Effective Dimension in Singular Linear Models
Bayesian model selection commonly relies on the Laplace approximation or the Bayesian Information Criterion (BIC), which assume that the effective model dimension equals the number of parameters. Singular learning theory replaces this assumption with the real log canonical threshold (RLCT), an effective dimension that can be strictly smaller in overparameterized or rank-deficient models. We study linear-Gaussian rank models and linear subspace (dictionary) models in which the exact marginal likelihood is available in closed form and the RLCT is analytically tractable. In this setting, we show theoretically and empirically that the error of Laplace/BIC grows as $(d/2 - \lambda)\log n$, where $d$ is the ambient parameter dimension and $\lambda$ is the RLCT. An RLCT-aware correction recovers the correct evidence slope and is invariant to overcomplete reparameterizations that represent the same data subspace. Our results provide a concrete finite-sample characterization of Laplace failure in singular models and demonstrate that evidence slopes can be used as a practical estimator of effective dimension in simple linear settings.
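To make the evidence-slope claim concrete, here is a minimal sketch (not the paper's code; sample sizes, variances, and variable names are assumptions) for a Bayesian linear model with duplicated columns, where the exact evidence is available in closed form: the fitted slope of $\ell_{\max} - \log Z_n$ against $\log n$ lands near the RLCT $\lambda = r/2$ rather than the BIC prediction $d/2$.

```python
import numpy as np

rng = np.random.default_rng(0)
r, sigma2, alpha2 = 3, 0.25, 1.0   # true rank, noise and prior variances (assumed)

def log_evidence(X, y, sigma2, alpha2):
    """Exact log marginal likelihood of y ~ N(0, sigma2*I_n + alpha2*X X^T),
    evaluated in d x d form via the Woodbury / matrix-determinant identities."""
    n, d = X.shape
    A = np.eye(d) / alpha2 + X.T @ X / sigma2          # posterior precision
    _, logdet_A = np.linalg.slogdet(A)
    logdet_cov = n * np.log(sigma2) + d * np.log(alpha2) + logdet_A
    b = X.T @ y / sigma2
    quad = y @ y / sigma2 - b @ np.linalg.solve(A, b)
    return -0.5 * (n * np.log(2 * np.pi) + logdet_cov + quad)

def max_loglik(X, y, sigma2):
    """Maximized Gaussian log-likelihood at the least-squares solution."""
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ w_hat) ** 2)
    return -0.5 * (len(y) * np.log(2 * np.pi * sigma2) + rss / sigma2)

# Overcomplete design: each true feature appears twice, so d = 2r parameters
# describe an r-dimensional data subspace, giving RLCT lambda = r/2.
ns, gaps = [10**2, 10**3, 10**4, 10**5, 10**6], []
for n in ns:
    Z = rng.standard_normal((n, r))
    X = np.hstack([Z, Z])
    y = Z @ rng.standard_normal(r) + np.sqrt(sigma2) * rng.standard_normal(n)
    gaps.append(max_loglik(X, y, sigma2) - log_evidence(X, y, sigma2, alpha2))

slope = np.polyfit(np.log(ns), gaps, 1)[0]
print(f"fitted evidence slope {slope:.2f}; RLCT r/2 = {r/2}, BIC predicts d/2 = {r}")
```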
Order Selection in Vector Autoregression by Mean Square Information Criterion
Hellstern, Michael, Shojaie, Ali
Vector autoregressive (VAR) processes are ubiquitously used in economics, finance, and biology. Order selection is an essential step in fitting VAR models. While many order selection methods exist, all come with weaknesses. Minimizing AIC is a popular approach but is known to consistently overestimate the true order for processes of small dimension, while methods based on the BIC or Hannan-Quinn (HQ) criteria have been shown to require large sample sizes to accurately estimate the order of larger-dimensional processes. We propose the mean square information criterion (MIC), based on the observation that the expected squared error loss is flat once the fitted order reaches or exceeds the true order. MIC is shown to consistently estimate the order of the process under relatively mild conditions. Our simulation results show that MIC outperforms AIC, BIC, and HQ under misspecification. This advantage is corroborated when forecasting COVID-19 outcomes in New York City. Order selection by MIC is implemented in the micvar R package available on CRAN.
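The MIC itself is implemented in the authors' micvar R package and is not reproduced here; the sketch below only sets up the classical baselines it is compared against, selecting a VAR order by AIC/BIC/HQ with statsmodels on a simulated VAR(2) (coefficients and sample size are illustrative assumptions).

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(1)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])    # illustrative stable VAR(2) coefficients
A2 = np.array([[0.2, 0.0], [0.1, 0.2]])
n, burn = 500, 100
y = np.zeros((n + burn, 2))
for t in range(2, n + burn):
    y[t] = A1 @ y[t - 1] + A2 @ y[t - 2] + rng.standard_normal(2)
y = y[burn:]                               # drop burn-in so the series is stationary

sel = VAR(y).select_order(maxlags=10)
print(sel.summary())
# AIC often selects a higher order than BIC/HQ on small-dimensional processes
print({"aic": sel.aic, "bic": sel.bic, "hqic": sel.hqic})
```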
What is in the model? A comparison of variable selection criteria and model search approaches
Xu, Shuangshuang, Ferreira, Marco A. R., Tegge, Allison N.
For many scientific questions, understanding the underlying mechanism is the goal. To help investigators better understand the underlying mechanism, variable selection is a crucial step that permits the identification of the regression variables most associated with the outcome of interest. A variable selection method consists of model evaluation using an information criterion and a search of the model space. Here, we provide a comprehensive comparison of variable selection methods using the performance measures of correct identification rate (CIR), recall, and false discovery rate (FDR). We consider BIC and AIC for evaluating models, and exhaustive, greedy, LASSO path, and stochastic search approaches for searching the model space; we also consider LASSO with cross-validation. We perform simulation studies for linear and generalized linear models that parametrically explore a wide range of realistic sample sizes, effect sizes, and correlations among regression variables. We consider model spaces with both small and large numbers of potential regressors. The results show that exhaustive search with BIC and stochastic search with BIC outperform the other methods on small and large model spaces, respectively. These approaches yield the highest CIR and lowest FDR, which collectively may support long-term efforts towards increasing replicability in research.
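As a concrete illustration of the best performer on small model spaces, here is a minimal sketch of exhaustive subset search scored by BIC (toy data; variable names and effect sizes are assumptions, not the authors' setup).

```python
from itertools import combinations
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, p = 200, 6
X = rng.standard_normal((n, p))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.standard_normal(n)   # true model {0, 2}

best_bic, best_subset = np.inf, ()
for k in range(p + 1):                       # enumerate all 2^p candidate models
    for subset in combinations(range(p), k):
        design = sm.add_constant(X[:, list(subset)]) if subset else np.ones((n, 1))
        fit = sm.OLS(y, design).fit()
        if fit.bic < best_bic:
            best_bic, best_subset = fit.bic, subset
print("BIC-selected variables:", best_subset)   # expect (0, 2)
```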
Learning Bayesian Networks with Thousands of Variables
Mauro Scanagatta, Cassio P. de Campos, Giorgio Corani, Marco Zaffalon
We present a method for learning Bayesian networks from data sets containing thousands of variables without the need for structure constraints. Our approach consists of two parts. The first is a novel algorithm that effectively explores the space of possible parent sets of a node. It guides the exploration towards the most promising parent sets on the basis of an approximated score function that is computed in constant time. The second part is an improvement of an existing ordering-based algorithm for structure optimization. The new algorithm provably achieves a higher score compared to its original formulation. Our novel approach consistently outperforms the state of the art on very large data sets.
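The paper's novelty is an approximate score that ranks parent sets in constant time; that score is not reproduced here. The sketch below shows only the standard BIC local score such searches optimize, on toy binary data (the function and data are illustrative assumptions).

```python
import numpy as np

def bic_local_score(data, child, parents, arities):
    """Standard BIC local score of one discrete node given a candidate parent
    set, for integer-coded data: max log-likelihood minus (log n / 2) * #params."""
    n = data.shape[0]
    r = arities[child]
    q = int(np.prod([arities[j] for j in parents])) if parents else 1
    if parents:
        keys = np.ravel_multi_index(data[:, parents].T, [arities[j] for j in parents])
    else:
        keys = np.zeros(n, dtype=int)
    ll = 0.0
    for key in np.unique(keys):              # one multinomial per parent configuration
        counts = np.bincount(data[keys == key, child], minlength=r)
        nz = counts > 0
        ll += np.sum(counts[nz] * np.log(counts[nz] / counts.sum()))
    return ll - 0.5 * np.log(n) * q * (r - 1)

# toy check: X0 -> X1 with 10% flip noise, X2 independent; parent set {0} for
# node 1 should score best
rng = np.random.default_rng(3)
x0 = rng.integers(0, 2, 5000)
x1 = (x0 ^ (rng.random(5000) < 0.1)).astype(int)
x2 = rng.integers(0, 2, 5000)
data, arities = np.column_stack([x0, x1, x2]), [2, 2, 2]
for ps in ([], [0], [2], [0, 2]):
    print(ps, round(bic_local_score(data, 1, ps, arities), 1))
```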
DRBM-ClustNet: A Deep Restricted Boltzmann-Kohonen Architecture for Data Clustering
J. Senthilnath, Nagaraj G, Sumanth Simha C, Sushant Kulkarni, Meenakumari Thapa, Indiramma M, Jón Atli Benediktsson
A Bayesian Deep Restricted Boltzmann-Kohonen architecture for data clustering, termed DRBM-ClustNet, is proposed. The core clustering engine consists of a Deep Restricted Boltzmann Machine (DRBM) that processes unlabeled data by creating new features that are mutually uncorrelated and have large variance. Next, the number of clusters is predicted using the Bayesian Information Criterion (BIC), followed by a Kohonen network-based clustering layer. The unlabeled data is processed in three stages for efficient clustering of non-linearly separable datasets. In the first stage, the DRBM performs non-linear feature extraction, capturing a highly complex data representation by projecting feature vectors of $d$ dimensions into $n$ dimensions. Since most clustering algorithms require the number of clusters to be decided a priori, the second stage uses BIC to automate this choice. In the third stage, the number of clusters derived from BIC forms the input to the Kohonen network, which clusters the feature-extracted data obtained from the DRBM. This method overcomes general disadvantages of clustering algorithms such as the prior specification of the number of clusters, convergence to local optima, and poor clustering accuracy on non-linear datasets. We use two synthetic datasets, fifteen benchmark datasets from the UCI Machine Learning repository, and four image datasets to analyze DRBM-ClustNet. The proposed framework is evaluated on clustering accuracy and ranked against other state-of-the-art clustering methods. The results demonstrate that DRBM-ClustNet outperforms state-of-the-art clustering algorithms.
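The second stage, choosing the number of clusters by BIC, can be illustrated with scikit-learn's Gaussian mixture BIC on toy blob data; this is a generic stand-in, not BIC applied in the paper's DRBM feature space.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# toy features with 4 well-separated clusters (illustrative stand-in for the
# DRBM-extracted features)
X, _ = make_blobs(n_samples=600, centers=4, cluster_std=1.0, random_state=0)

bics = []
for k in range(1, 9):
    gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bics.append(gmm.bic(X))                 # lower BIC = better fit/complexity trade-off
k_best = int(np.argmin(bics)) + 1
print("BIC-selected number of clusters:", k_best)   # expect 4
```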
A system identification approach to clustering vector autoregressive time series
Yue, Zuogong, Wang, Xinyi, Solo, Victor
Clustering time series by their underlying dynamics continues to attract researchers because of its impact on complex system modelling. Most current time series clustering methods handle only scalar time series, treat them as white noise, or rely on domain knowledge for high-quality feature construction, largely ignoring the autocorrelation structure. Instead of relying on heuristic feature or metric construction, the system identification approach treats vector time series clustering by explicitly modelling the underlying autoregressive dynamics. We first derive a clustering algorithm based on a mixture autoregressive model. Unfortunately, it turns out to have significant computational problems. We then derive a 'small-noise' limiting version of the algorithm, which we call k-LMVAR (Limiting Mixture Vector AutoRegression), that is computationally manageable. We develop an associated BIC criterion for choosing the number of clusters and the model order. The algorithm performs very well in comparative simulations and also scales well computationally.
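k-LMVAR itself is more than a short snippet; as a much simplified stand-in for the same joint selection problem, the sketch below summarizes each vector series by its least-squares VAR(p) coefficients and uses a Gaussian mixture BIC to choose the number of clusters K together with the order p (all names and values are assumptions, not the authors' criterion).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)

def simulate_var1(A, n=300):
    """Simulate a stable 1st-order VAR with coefficient matrix A."""
    y = np.zeros((n, A.shape[0]))
    for t in range(1, n):
        y[t] = A @ y[t - 1] + 0.5 * rng.standard_normal(A.shape[0])
    return y

def var_coeffs(y, p):
    """Stack lagged values and solve least squares for the VAR(p) coefficients."""
    n, m = y.shape
    X = np.hstack([y[p - k - 1:n - k - 1] for k in range(p)])
    B, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return B.ravel()

# two ground-truth dynamic regimes, 20 series each
A_slow, A_fast = 0.2 * np.eye(2), np.array([[0.8, -0.3], [0.3, 0.8]])
series = [simulate_var1(A_slow) for _ in range(20)] + \
         [simulate_var1(A_fast) for _ in range(20)]

best = None
for p in (1, 2, 3):
    feats = np.array([var_coeffs(y, p) for y in series])
    for K in (1, 2, 3, 4):
        bic = GaussianMixture(n_components=K, n_init=5, random_state=0).fit(feats).bic(feats)
        if best is None or bic < best[0]:
            best = (bic, K, p)
print("selected (K, p):", best[1:])   # expect K = 2
```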
A flexible Bayesian non-parametric mixture model reveals multiple dependencies of swap errors in visual working memory
Radmard, Puria, Bays, Paul M., Lengyel, Máté
Human behavioural data in psychophysics have been used to elucidate the underlying mechanisms of many cognitive processes, such as attention, sensorimotor integration, and perceptual decision making. Visual working memory (VWM) has particularly benefited from this approach: analyses of VWM errors have proven crucial for understanding VWM capacity and coding schemes, in turn constraining neural models of both. One poorly understood class of VWM errors is swap errors, whereby participants recall an uncued item from memory. Swap errors could arise from erroneous memory encoding, noisy storage, or errors at retrieval time; previous research has mostly implicated the latter two. However, these studies made strong a priori assumptions about the detailed mechanisms and/or parametric form of the errors contributed by these sources. Here, we pursue a data-driven approach instead, introducing a Bayesian non-parametric mixture model of swap errors (BNS) which provides a flexible descriptive model of swapping behaviour, such that swaps are allowed to depend on both the probed and reported features of every stimulus item. We fit BNS to the trial-by-trial behaviour of human participants and show that it recapitulates the strong dependence of swaps on cue similarity in multiple datasets. Critically, BNS reveals that this dependence coexists with a non-monotonic modulation in the report feature dimension for a dataset cued by random dot motion direction and reported by location. The form of the modulation inferred by BNS opens new questions about the importance of memory encoding in causing swap errors in VWM, a source distinct from the previously suggested binding and cueing errors. Our analyses, combining qualitative comparisons of the highly interpretable BNS parameter structure with rigorous quantitative model comparison and recovery methods, show that previous interpretations of swap errors may have been incomplete.
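BNS is non-parametric and not reproduced here; as background, the sketch below evaluates the classical parametric swap mixture it generalizes (target, non-target, and uniform-guess components over circular errors) on toy data, with all parameter values assumed.

```python
import numpy as np
from scipy.stats import vonmises

def swap_mixture_loglik(err_target, err_nontarget, p_target, p_swap, kappa):
    """Log-likelihood of circular response errors under a target/swap/guess
    mixture. err_target: (n,) response minus target angle; err_nontarget:
    (n, m) response minus each non-target angle; all in radians."""
    p_guess = 1.0 - p_target - p_swap
    lik = (p_target * vonmises.pdf(err_target, kappa)
           + p_swap * vonmises.pdf(err_nontarget, kappa).mean(axis=1)
           + p_guess / (2 * np.pi))
    return float(np.log(lik).sum())

# toy data: 80% target responses, 15% swaps to a random non-target, 5% guesses
rng = np.random.default_rng(5)
n, kappa_true = 2000, 8.0
comp = rng.choice(3, size=n, p=[0.80, 0.15, 0.05])
targets = rng.uniform(-np.pi, np.pi, n)
nontargets = rng.uniform(-np.pi, np.pi, (n, 2))
swap_to = nontargets[np.arange(n), rng.integers(0, 2, n)]
noise = vonmises.rvs(kappa_true, size=n, random_state=0)
resp = np.where(comp == 0, targets + noise,
       np.where(comp == 1, swap_to + noise, rng.uniform(-np.pi, np.pi, n)))
wrap = lambda a: (a + np.pi) % (2 * np.pi) - np.pi      # wrap angles to (-pi, pi]
err_t = wrap(resp - targets)
err_nt = wrap(resp[:, None] - nontargets)
print(swap_mixture_loglik(err_t, err_nt, 0.80, 0.15, kappa_true))   # true parameters
print(swap_mixture_loglik(err_t, err_nt, 0.95, 0.00, kappa_true))   # no-swap fit: worse
```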
Reviews, Discussions, Author Feedback and Meta-Reviews
The paper describes tricks to scale Bayesian network structure learning to thousands of variables. This is achieved by developing new heuristics for candidate parent set identification and for the subsequent order-based structure optimization. In general, the paper is clearly written and easy to read. There are issues in editing and style, but the problems do not affect readability (much). The suggested heuristics feel a bit ad hoc, so the value of the work is ultimately judged by the empirical evaluation.