stan
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data
Mishra, Aayush, Habermann, Daniel, Schmitt, Marvin, Radev, Stefan T., Bürkner, Paul-Christian
Neural amortized Bayesian inference (ABI) can solve probabilistic inverse problems orders of magnitude faster than classical methods. However, neural ABI is not yet sufficiently robust for widespread and safe applicability. In particular, when performing inference on observations outside of the scope of the simulated data seen during training, for example, because of model misspecification, the posterior approximations are likely to become highly biased. Due to the bad pre-asymptotic behavior of current neural posterior estimators in the out-of-simulation regime, the resulting estimation biases cannot be fixed in acceptable time by just simulating more training data. In this proof-of-concept paper, we propose a semi-supervised approach that enables training not only on (labeled) simulated data generated from the model, but also on unlabeled data originating from any source, including real-world data. To achieve the latter, we exploit Bayesian self-consistency properties that can be transformed into strictly proper losses without requiring knowledge of true parameter values, that is, without requiring data labels. The results of our initial experiments show remarkable improvements in the robustness of ABI on out-of-simulation data. Even if the observed data is far away from both labeled and unlabeled training data, inference remains highly accurate. If our findings also generalize to other scenarios and model classes, we believe that our new method represents a major breakthrough in neural ABI.
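The Bayesian self-consistency property the abstract alludes to can be turned into a label-free loss: for the exact posterior, log p(y) = log p(theta) + log p(y|theta) - log p(theta|y) is the same constant for every theta, so its variance across parameter draws penalizes inconsistent posterior approximations. A minimal sketch (the function names and toy setup are illustrative, not the paper's implementation):

```python
import numpy as np

def self_consistency_loss(log_prior, log_lik, log_q_posterior, thetas, y):
    """Variance of the implied log marginal likelihood across parameter draws.

    For the exact posterior, log p(theta) + log p(y|theta) - log p(theta|y)
    equals log p(y) for every theta, so the variance across draws is zero;
    deviations from zero penalize an inconsistent posterior approximation,
    without ever needing the true theta that generated y.
    """
    log_marginals = np.array([
        log_prior(th) + log_lik(y, th) - log_q_posterior(th, y)
        for th in thetas
    ])
    return np.var(log_marginals)
```

In a conjugate Gaussian model (theta ~ N(0,1), y|theta ~ N(theta,1)), plugging in the exact posterior N(y/2, 1/2) drives this loss to zero, while a mismatched approximation yields a strictly positive value.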
- North America > United States (0.14)
- Europe > Germany (0.04)
STAN: Smooth Transition Autoregressive Networks
Traditional Smooth Transition Autoregressive (STAR) models offer an effective way to model regime-switching dynamics in time series through smooth regime changes based on specific transition variables. In this paper, we propose a novel approach by drawing an analogy between STAR models and a multilayer neural network architecture. Our proposed neural network architecture mimics the STAR framework, employing multiple layers to simulate the smooth transition between regimes and capturing complex, nonlinear relationships. The network's hidden layers and activation functions are structured to replicate the gradual switching behavior typical of STAR models, allowing for a more flexible and scalable approach to regime-dependent modeling. This research suggests that neural networks can provide a powerful alternative to STAR models, with the potential to enhance predictive accuracy in economic and financial forecasting.
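The STAR mechanism the abstract builds on is compact: a logistic transition function mixes two autoregressive regimes, and its sigmoid shape is exactly what the neural analogy exploits. A minimal sketch of the classical model (parameter names are ours):

```python
import numpy as np

def logistic_transition(s, gamma, c):
    """Smooth transition function G(s; gamma, c) in (0, 1).

    gamma controls the transition speed (gamma -> infinity approaches a
    hard threshold switch, as in TAR models); c is the transition midpoint.
    """
    return 1.0 / (1.0 + np.exp(-gamma * (s - c)))

def star_step(x, phi1, phi2, s, gamma, c):
    """One logistic STAR prediction: a convex mix of two AR regimes.

    x: vector of lagged values; phi1, phi2: AR coefficients of the two
    regimes; s: transition variable (often a lagged value of the series).
    """
    g = logistic_transition(s, gamma, c)
    return (1.0 - g) * np.dot(phi1, x) + g * np.dot(phi2, x)
```

The analogy to a neural network is direct: the logistic transition is a sigmoid activation, so a hidden layer of such units can represent several smooth regime switches at once.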
- North America > United States (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
Transformers Use Causal World Models in Maze-Solving Tasks
Spies, Alex F., Edwards, William, Ivanitskiy, Michael I., Skapars, Adrians, Räuker, Tilman, Inoue, Katsumi, Russo, Alessandra, Shanahan, Murray
Recent studies in interpretability have explored the inner workings of transformer models trained on tasks across various domains, often discovering that these networks naturally develop surprisingly structured representations. When such representations comprehensively reflect the task domain's structure, they are commonly referred to as "World Models" (WMs). In this work, we discover such WMs in transformers trained on maze tasks. In particular, by employing Sparse Autoencoders (SAEs) and analyzing attention patterns, we examine the construction of WMs and demonstrate consistency between the circuit analysis and the SAE feature-based analysis. We intervene upon the isolated features to confirm their causal role and, in doing so, find asymmetries between certain types of interventions. Surprisingly, we find that models are able to reason with respect to a greater number of active features than they see during training, even if attempting to specify these in the input token sequence would lead the model to fail. Furthermore, we observe that varying positional encodings can alter how WMs are encoded in a model's residual stream. By analyzing the causal role of these WMs in a toy domain we hope to make progress toward an understanding of emergent structure in the representations acquired by Transformers, leading to the development of more interpretable and controllable AI systems.
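The SAE technique used above decomposes residual-stream activations into sparse, interpretable feature directions. A generic sketch of the forward pass and training loss (a standard SAE formulation; the weight names and the L1 coefficient are illustrative, not taken from this paper):

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec, l1_coeff=1e-3):
    """One forward pass of a sparse autoencoder on a residual-stream vector.

    Features f = relu(W_enc @ x + b_enc) are kept sparse by an L1 penalty;
    the reconstruction x_hat = W_dec @ f + b_dec is trained to match x, so
    each decoder column can be read as a candidate world-model feature.
    """
    f = np.maximum(W_enc @ x + b_enc, 0.0)   # sparse feature activations
    x_hat = W_dec @ f + b_dec                # reconstruction of the input
    loss = np.sum((x - x_hat) ** 2) + l1_coeff * np.sum(np.abs(f))
    return f, x_hat, loss
```

Interventions of the kind described then amount to editing entries of f (activating or ablating features) and decoding back into the residual stream.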
Automatic Variational Inference in Stan
Variational inference is a scalable technique for approximate Bayesian inference. Deriving variational inference algorithms requires tedious model-specific calculations; this makes it difficult for non-experts to use. We propose an automatic variational inference algorithm, automatic differentiation variational inference (ADVI); we implement it in Stan (code available), a probabilistic programming system. In ADVI the user provides a Bayesian model and a dataset, nothing else. We make no conjugacy assumptions and support a broad class of models.
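The core of ADVI is a mean-field Gaussian approximation on the unconstrained parameter space, with the ELBO estimated by Monte Carlo via the reparameterization trick. A minimal sketch of that estimator (a simplified illustration, not Stan's implementation; the Jacobian of any constraining transform is assumed folded into log_joint):

```python
import numpy as np

def advi_elbo(log_joint, mu, log_sigma, n_draws=100, rng=None):
    """Monte Carlo ELBO for a mean-field Gaussian q over unconstrained params.

    log_joint(zeta) is log p(y, zeta) on the unconstrained space. Draws use
    the reparameterization zeta = mu + sigma * eps with eps ~ N(0, I), which
    is what makes the ELBO differentiable in (mu, log_sigma); the Gaussian
    entropy term is available in closed form.
    """
    rng = np.random.default_rng(rng)
    d = len(mu)
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal((n_draws, d))
    zetas = mu + sigma * eps
    expected_log_joint = np.mean([log_joint(z) for z in zetas])
    entropy = 0.5 * d * (1.0 + np.log(2 * np.pi)) + np.sum(log_sigma)
    return expected_log_joint + entropy
```

Maximizing this quantity over (mu, log_sigma) with stochastic gradients is the automatic part: the user supplies only log_joint, exactly as the abstract describes.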
Scalable Inference for Bayesian Multinomial Logistic-Normal Dynamic Linear Models
Saxena, Manan, Chen, Tinghua, Silverman, Justin D.
Many scientific fields collect longitudinal multivariate count data where the total number of counts is arbitrary (e.g., multinomial observations). These data are often called count compositional as the information in the data relates to the relative frequencies of the categories (Silverman et al., 2018). These data occur frequently in molecular biology (Espinoza et al., 2020), microbiome studies (Silverman et al., 2018; Joseph et al., 2020; Äijö et al., 2018), natural language processing (Linderman et al., 2015), biomedicine (Fokianos and Kedem, 2003), and social sciences (Cargnoni et al., 1997). Although the counting process used to collect these data is often modeled as multinomial, other sources of noise in the system being studied often lead to extra-multinomial variation. While some account for this extra-multinomial variability with multinomial-Dirichlet models (Mosimann, 1962), multinomial logistic-normal models are often superior, as they can account for both positive and negative covariation between multinomial categories (Aitchison and Shen, 1980; Cargnoni et al., 1997; Joseph et al., 2020; Silverman et al., 2018). Moreover, under suitable transformation (i.e., link function), the logistic-normal is multivariate Gaussian.
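The generative structure described above, a multivariate Gaussian pushed through a log-ratio link into a multinomial, can be sketched in a few lines. Here the inverse additive log-ratio (ALR) transform is used, with the last category as reference (a common convention; the function name is ours):

```python
import numpy as np

def sample_multinomial_logistic_normal(n_trials, mu, cov, rng=None):
    """Draw one multinomial logistic-normal observation.

    eta ~ N(mu, cov) lives on R^(D-1); the inverse additive log-ratio (ALR)
    transform maps it to a composition pi on the D-simplex, and the counts
    are multinomial given pi. The logistic-normal layer is what induces the
    extra-multinomial variation and the positive/negative covariation
    between categories noted above.
    """
    rng = np.random.default_rng(rng)
    eta = rng.multivariate_normal(mu, cov)
    expanded = np.concatenate([np.exp(eta), [1.0]])  # last category is reference
    pi = expanded / expanded.sum()
    return rng.multinomial(n_trials, pi)
```

Because eta is multivariate Gaussian under this link, temporal dependence can be added by letting eta evolve as a Gaussian dynamic linear model, which is precisely the model class of the paper.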
- North America > United States > Pennsylvania (0.04)
- North America > United States > New York (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Explaining the (Not So) Obvious: Simple and Fast Explanation of STAN, a Next Point of Interest Recommendation System
Yunus, Fajrian, Abdessalem, Talel
A lot of effort in recent years has been expended to explain machine learning systems. However, some machine learning methods are inherently explainable and thus are not complete black boxes. This enables developers to make sense of the output without developing a complex and expensive explainability technique. Beyond that, explainability should be tailored to the context of the problem. In a recommendation system that relies on collaborative filtering, the recommendation is based on the behaviors of similar users, so the explanation should tell which other users are similar to the current user. Similarly, if the recommendation system is based on sequence prediction, the explanation should also tell which input timesteps are the most influential. We demonstrate this paradigm in STAN (Spatio-Temporal Attention Network for Next Location Recommendation), a next Point of Interest recommendation system based on collaborative filtering and sequence prediction. We also show that the explanation helps to "debug" the output.
- Europe > France > Île-de-France (0.04)
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
Amortized Bayesian Multilevel Models
Habermann, Daniel, Schmitt, Marvin, Kühmichel, Lars, Bulling, Andreas, Radev, Stefan T., Bürkner, Paul-Christian
Obtaining accurate inference and faithful uncertainty quantification in reasonable time is a frontier of today's statistical research (Cranmer et al., 2020). One major difficulty arising in most experimental and almost all observational data is the presence of complex dependency structures, for example, due to natural groupings (e.g., data gathered in different countries) or repeated measurements of the same observational units over time (e.g., particles, bacteria, or people; Gelman and Hill, 2006). To leverage these dependency structures, multilevel models (MLMs), also referred to as latent variable, hierarchical, random, or mixed effects models, have become an integral part of modern Bayesian statistics (Goldstein, 2011; Gelman et al., 2013; McGlothlin and Viele, 2018; Finch et al., 2019; Yao et al., 2022). Despite the wide success of Bayesian MLMs across the quantitative sciences, a major challenge is their limited efficiency and scalability when dealing with large and complex data. This is because estimating the full posterior distribution of all parameters of interest can be very costly (Gelman et al., 2013).
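The grouping structure described above is easiest to see in the simplest MLM, a two-level random-intercept model; a small simulation sketch (variable names are ours):

```python
import numpy as np

def simulate_multilevel(n_groups, n_per_group, mu, tau, sigma, rng=None):
    """Simulate a two-level (random-intercept) multilevel model.

    Group-level means theta_j ~ N(mu, tau^2) encode the natural grouping
    (e.g., countries); observations y_ij ~ N(theta_j, sigma^2) are repeated
    measurements within each group. Full Bayesian inference targets
    p(mu, tau, theta_1..J | y), which is what becomes costly as the number
    of groups and per-group observations grows.
    """
    rng = np.random.default_rng(rng)
    theta = rng.normal(mu, tau, size=n_groups)
    y = rng.normal(theta[:, None], sigma, size=(n_groups, n_per_group))
    return theta, y
```

The dimension of the latent parameter vector grows with the number of groups J, which is one concrete reason the full posterior becomes expensive in exactly the large-data settings the paper targets.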
- North America > United States (0.28)
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (14 more...)
- Transportation (0.95)
- Health & Medicine (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Unveiling the Power of Self-supervision for Multi-view Multi-human Association and Tracking
Feng, Wei, Wang, Feifan, Han, Ruize, Qian, Zekun, Wang, Song
Multi-view multi-human association and tracking (MvMHAT) is a new but important problem for multi-person scene video surveillance: it aims to track a group of people over time in each view and to identify the same person across different views at the same time. This differs from previous MOT and multi-camera MOT tasks, which consider only over-time human tracking. As a result, videos for MvMHAT require more complex annotations while containing more information for self-supervised learning. In this work, we tackle this problem with a self-supervised-learning-aware end-to-end network. Specifically, we propose to take advantage of the spatial-temporal self-consistency rationale by considering three properties: reflexivity, symmetry, and transitivity. Besides the reflexivity property, which naturally holds, we design self-supervised learning losses based on the symmetry and transitivity properties, for both appearance feature learning and assignment matrix optimization, to associate multiple humans over time and across views. Furthermore, to promote research on MvMHAT, we build two new large-scale benchmarks for network training and testing of different algorithms. Extensive experiments on the proposed benchmarks verify the effectiveness of our method. We have released the benchmark and code to the public.
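The symmetry and transitivity properties mentioned above can be written directly as losses on soft assignment matrices: matching a->b must agree with matching b->a transposed, and chaining a->b->c must agree with the direct a->c assignment. A minimal sketch of such consistency penalties (a generic illustration with squared-error penalties, not the paper's exact losses):

```python
import numpy as np

def symmetry_loss(S_ab, S_ba):
    """Penalize disagreement between the a->b assignment and the
    transposed b->a assignment; zero for any consistent matching."""
    return np.mean((S_ab - S_ba.T) ** 2)

def transitivity_loss(S_ab, S_bc, S_ac):
    """Penalize disagreement between the chained a->b->c assignment and
    the direct a->c assignment; zero for consistent matchings.

    S_xy[i, j] is the soft probability that person i in view (or frame) x
    corresponds to person j in view y.
    """
    chained = S_ab @ S_bc
    return np.mean((chained - S_ac) ** 2)
```

Both losses vanish for the true (permutation-matrix) assignments, which is why they can supervise the network without any identity labels.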
- North America > United States > South Carolina > Richland County > Columbia (0.14)
- Asia > China > Tianjin Province > Tianjin (0.04)