AITopics

2303.13429

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Wisconsin (0.04)
(3 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Zhang, Yufeng, Zhang, Fengzhuo, Yang, Zhuoran, Wang, Zhaoran

What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization

In this paper, we conduct a comprehensive study of In-Context Learning (ICL) by addressing several open questions: (a) What type of ICL estimator is learned by large language models? (b) What is a proper performance metric for ICL and what is the error rate? (c) How does the transformer architecture enable ICL? To answer these questions, we adopt a Bayesian view and formulate ICL as a problem of predicting the response corresponding to the current covariate, given a number of examples drawn from a latent variable model. To answer (a), we show that, without updating the neural network parameters, ICL implicitly implements the Bayesian model averaging algorithm, which is proven to be approximately parameterized by the attention mechanism. For (b), we analyze the ICL performance from an online learning perspective and establish a $\mathcal{O}(1/T)$ regret bound for perfectly pretrained ICL, where $T$ is the number of examples in the prompt. To answer (c), we show that, in addition to encoding Bayesian model averaging via attention, the transformer architecture also enables a fine-grained statistical analysis of pretraining under realistic assumptions. In particular, we prove that the error of pretrained model is bounded by a sum of an approximation error and a generalization error, where the former decays to zero exponentially as the depth grows, and the latter decays to zero sublinearly with the number of tokens in the pretraining dataset. Our results provide a unified understanding of the transformer and its ICL ability with bounds on ICL regret, approximation, and generalization, which deepens our knowledge of these essential aspects of modern language models.

arxiv preprint arxiv, probability, transformer, (13 more...)

2305.1942

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.84)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Schmitt, Marvin, Habermann, Daniel, Bürkner, Paul-Christian, Köthe, Ullrich, Radev, Stefan T.

Leveraging Self-Consistency for Data-Efficient Amortized Bayesian Inference

arXiv.org Artificial IntelligenceOct-10-2023

We propose a method to improve the efficiency and accuracy of amortized Bayesian inference (ABI) by leveraging universal symmetries in the probabilistic joint model $p(\theta, y)$ of parameters $\theta$ and data $y$. In a nutshell, we invert Bayes' theorem and estimate the marginal likelihood based on approximate representations of the joint model. Upon perfect approximation, the marginal likelihood is constant across all parameter values by definition. However, approximation error leads to undesirable variance in the marginal likelihood estimates across different parameter values. We formulate violations of this symmetry as a loss function to accelerate the learning dynamics of conditional neural density estimators. We apply our method to a bimodal toy problem with an explicit likelihood (likelihood-based) and a realistic model with an implicit likelihood (simulation-based).

data-efficient amortized bayesian inference, leveraging self-consistency

2310.04395

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)

arXiv.org Artificial IntelligenceOct-10-2023

Interpretable Traffic Event Analysis with Bayesian Networks

Yuan, Tong, Yang, Jian, Wen, Zeyi

Although existing machine learning-based methods for traffic accident analysis can provide good quality results to downstream tasks, they lack interpretability which is crucial for this critical problem. This paper proposes an interpretable framework based on Bayesian Networks for traffic accident prediction. To enable the ease of interpretability, we design a dataset construction pipeline to feed the traffic data into the framework while retaining the essential traffic data information. With a concrete case study, our framework can derive a Bayesian Network from a dataset based on the causal relationships between weather and traffic events across the United States. Consequently, our framework enables the prediction of traffic accidents with competitive accuracy while examining how the probability of these events changes under different conditions, thus illustrating transparent relationships between traffic and weather events. Additionally, the visualization of the network simplifies the analysis of relationships between different variables, revealing the primary causes of traffic accidents and ultimately providing a valuable reference for reducing traffic accidents.

accident, bayesian network, probability, (13 more...)

2310.06713

Country:

North America > United States > California > Orange County > Irvine (0.14)
Asia > China > Hong Kong (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Transportation (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Xhemrishi, Marvin, Östman, Johan, Wachter-Zeh, Antonia, Amat, Alexandre Graell i

FedGT: Identification of Malicious Clients in Federated Learning with Secure Aggregation

arXiv.org Artificial IntelligenceOct-10-2023

We propose FedGT, a novel framework for identifying malicious clients in federated learning with secure aggregation. Inspired by group testing, the framework leverages overlapping groups of clients to identify the presence of malicious clients in the groups via a decoding operation. The clients identified as malicious are then removed from the training of the model, which is performed over the remaining clients. By choosing the size, number, and overlap between groups, FedGT strikes a balance between privacy and security. Specifically, the server learns the aggregated model of the clients in each group - vanilla federated learning and secure aggregation correspond to the extreme cases of FedGT with group size equal to one and the total number of clients, respectively. The effectiveness of FedGT is demonstrated through extensive experiments on the MNIST, CIFAR-10, and ISIC2019 datasets in a cross-silo setting under different data-poisoning attacks. These experiments showcase FedGT's ability to identify malicious clients, resulting in high model utility. We further show that FedGT significantly outperforms the private robust aggregation approach based on the geometric median recently proposed by Pillutla et al. on heterogeneous client data (ISIC2019) and in the presence of targeted attacks (CIFAR-10 and ISIC2019).

accuracy, fedgt, malicious client, (14 more...)

2305.05506

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > Maryland > Baltimore (0.04)
(7 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Dermatology (0.67)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Papageorgiou, Ioannis, Kontoyiannis, Ioannis

The Bayesian Context Trees State Space Model for time series modelling and forecasting

A hierarchical Bayesian framework is introduced for developing rich mixture models for real-valued time series, partly motivated by important applications in financial time series analysis. At the top level, meaningful discrete states are identified as appropriately quantised values of some of the most recent samples. These observable states are described as a discrete context-tree model. At the bottom level, a different, arbitrary model for real-valued time series -- a base model -- is associated with each state. This defines a very general framework that can be used in conjunction with any existing model class to build flexible and interpretable mixture models. We call this the Bayesian Context Trees State Space Model, or the BCT-X framework. Efficient algorithms are introduced that allow for effective, exact Bayesian inference and learning in this setting; in particular, the maximum a posteriori probability (MAP) context-tree model can be identified. These algorithms can be updated sequentially, facilitating efficient online forecasting. The utility of the general framework is illustrated in two particular instances: When autoregressive (AR) models are used as base models, resulting in a nonlinear AR mixture model, and when conditional heteroscedastic (ARCH) models are used, resulting in a mixture model that offers a powerful and systematic way of modelling the well-known volatility asymmetries in financial data. In forecasting, the BCT-X methods are found to outperform state-of-the-art techniques on simulated and real-world data, both in terms of accuracy and computational requirements. In modelling, the BCT-X structure finds natural structure present in the data. In particular, the BCT-ARCH model reveals a novel, important feature of stock market index data, in the form of an enhanced leverage effect.

artificial intelligence, context-tree model, machine learning, (16 more...)

2308.00913

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(22 more...)

Genre: Research Report > Promising Solution (0.87)

Industry:

Banking & Finance > Trading (1.00)
Banking & Finance > Economy (1.00)
Government > Regional Government > North America Government > United States Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Morin, Sacha, Legault, Robin, Laliberté, Félix, Bakk, Zsuzsa, Giguère, Charles-Édouard, de la Sablonnière, Roxane, Lacourse, Éric

StepMix: A Python Package for Pseudo-Likelihood Estimation of Generalized Mixture Models with External Variables

StepMix is an open-source Python package for the pseudo-likelihood estimation (one-, two- and three-step approaches) of generalized finite mixture models (latent profile and latent class analysis) with external variables (covariates and distal outcomes). In many applications in social sciences, the main objective is not only to cluster individuals into latent classes, but also to use these classes to develop more complex statistical models. These models generally divide into a measurement model that relates the latent classes to observed indicators, and a structural model that relates covariates and outcome variables to the latent classes. The measurement and structural models can be estimated jointly using the so-called one-step approach or sequentially using stepwise methods, which present significant advantages for practitioners regarding the interpretability of the estimated latent classes. In addition to the one-step approach, StepMix implements the most important stepwise estimation methods from the literature, including the bias-adjusted three-step methods with Bolk-Croon-Hagenaars and maximum likelihood corrections and the more recent two-step approach. These pseudo-likelihood estimators are presented in this paper under a unified framework as specific expectation-maximization subroutines. To facilitate and promote their adoption among the data science community, StepMix follows the object-oriented design of the scikit-learn library and provides an additional R wrapper.

artificial intelligence, machine learning, stepmix, (19 more...)

2304.03853

Country:

Europe > Austria > Vienna (0.14)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.45)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Karnesis, Nikolaos, Katz, Michael L., Korsakova, Natalia, Gair, Jonathan R., Stergioulas, Nikolaos

Eryn : A multi-purpose sampler for Bayesian inference

In recent years, methods for Bayesian inference have been widely used in many different problems in physics where detection and characterization are necessary. Data analysis in gravitational-wave astronomy is a prime example of such a case. Bayesian inference has been very successful because this technique provides a representation of the parameters as a posterior probability distribution, with uncertainties informed by the precision of the experimental measurements. During the last couple of decades, many specific advances have been proposed and employed in order to solve a large variety of different problems. In this work, we present a Markov Chain Monte Carlo (MCMC) algorithm that integrates many of those concepts into a single MCMC package. For this purpose, we have built {\tt Eryn}, a user-friendly and multipurpose toolbox for Bayesian inference, which can be utilized for solving parameter estimation and model selection problems, ranging from simple inference questions, to those with large-scale model variation requiring trans-dimensional MCMC methods, like the LISA global fit problem. In this paper, we describe this sampler package and illustrate its capabilities on a variety of use cases.

artificial intelligence, machine learning, proposal, (20 more...)

doi: 10.1093/mnras/stad2939

2303.02164

Country:

Europe > Greece > Central Macedonia > Thessaloniki (0.04)
Oceania > Australia (0.04)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre:

Research Report (0.50)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

arXiv.org Artificial IntelligenceOct-9-2023

Ensemble-based Hybrid Optimization of Bayesian Neural Networks and Traditional Machine Learning Algorithms

Tan, Peiwen

While hyperparameter tuning shows theoretical promise, its practical efficacy is not universally superior, as evidenced in Figure 4. The results thus offer a balanced perspective that marries theoretical rigor with empirical validation, fulfilling both academic and practical requirements. Implications for the Field of Machine Learning and Predictive Modeling Robustness and Generalization: The ensemble and stacking methods offer a mathematically substantiated pathway to improve the generalization capabilities of predictive models. Interpretability: The feature integration techniques not only improve model performance but also offer better interpretability by highlighting important features through mathematical formulations. Optimization: The proven convergence of Bayesian Optimization to the global optimum has far-reaching implications for hyperparameter tuning in models, as formalized by the EI equation. Unified Framework: This research provides a unified, mathematically rigorous framework for integrating Bayesian and non-Bayesian approaches, thereby setting a new benchmark for hybrid predictive systems. Future Research Directions Scalability: Investigating the scalability of the proposed methods, particularly in the context of the ensemble and Bayesian optimization equations, for larger datasets and more models. Real-world Applications: Extending this research to specific domains like healthcare, finance, and natural language processing to assess the practical utility of the proposed methods. Advanced Optimization Techniques: Exploring other optimization techniques that could further improve the efficiency and effectiveness of the proposed hybrid models, perhaps by introducing new mathematical formulations.

artificial intelligence, bayesian inference, machine learning, (17 more...)

2310.05456

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.49)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

arXiv.org Artificial IntelligenceOct-9-2023

Adaptive Multi-head Contrastive Learning

Wang, Lei, Koniusz, Piotr, Gedeon, Tom, Zheng, Liang

In contrastive learning, two views of an original image generated by different augmentations are considered as a positive pair whose similarity is required to be high. Moreover, two views of two different images are considered as a negative pair, and their similarity is encouraged to be low. Normally, a single similarity measure given by a single projection head is used to evaluate positive and negative sample pairs, respectively. However, due to the various augmentation strategies and varying intra-sample similarity, augmented views from the same image are often not similar. Moreover, due to inter-sample similarity, augmented views of two different images may be more similar than augmented views from the same image. As such, enforcing a high similarity for positive pairs and a low similarity for negative pairs may not always be achievable, and in the case of some pairs, forcing so may be detrimental to the performance. To address this issue, we propose to use multiple projection heads, each producing a separate set of features. Our loss function for pre-training emerges from a solution to the maximum likelihood estimation over head-wise posterior distributions of positive samples given observations. The loss contains the similarity measure over positive and negative pairs, each re-weighted by an individual adaptive temperature that is regularized to prevent ill solutions. Our adaptive multi-head contrastive learning (AMCL) can be applied to and experimentally improves several popular contrastive learning methods such as SimCLR, MoCo and Barlow Twins. Such improvement is consistent under various backbones and linear probing epoches and is more significant when multiple augmentation methods are used.

learning, projection head, similarity, (15 more...)

2310.05615

Country:

North America > United States > New York (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)