AITopics

2404.19501

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

arXiv.org Artificial IntelligenceApr-30-2024

The Role of $n$-gram Smoothing in the Age of Neural Networks

Malagutti, Luca, Buinovskij, Andrius, Svete, Anej, Meister, Clara, Amini, Afra, Cotterell, Ryan

For nearly three decades, language models derived from the $n$-gram assumption held the state of the art on the task. The key to their success lay in the application of various smoothing techniques that served to combat overfitting. However, when neural language models toppled $n$-gram models as the best performers, $n$-gram smoothing techniques became less relevant. Indeed, it would hardly be an understatement to suggest that the line of inquiry into $n$-gram smoothing techniques became dormant. This paper re-opens the role classical $n$-gram smoothing techniques may play in the age of neural language models. First, we draw a formal equivalence between label smoothing, a popular regularization technique for neural language models, and add-$\lambda$ smoothing. Second, we derive a generalized framework for converting any $n$-gram smoothing technique into a regularizer compatible with neural language models. Our empirical results find that our novel regularizers are comparable to and, indeed, sometimes outperform label smoothing on language modeling and machine translation.

computational linguistic, language model, probability, (14 more...)

2403.1724

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > France (0.04)
(13 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Gangapuram, Harshini, Manian, Vidya

Bayesian Functional Connectivity and Graph Convolutional Network for Working Memory Load Classification

arXiv.org Artificial IntelligenceApr-30-2024

Brain responses related to working memory originate from distinct brain areas and oscillate at different frequencies. EEG signals with high temporal correlation can effectively capture these responses. Therefore, estimating the functional connectivity of EEG for working memory protocols in different frequency bands plays a significant role in analyzing the brain dynamics with increasing memory and cognitive loads, which remains largely unexplored. The present study introduces a Bayesian structure learning algorithm to learn the functional connectivity of EEG in sensor space. Next, the functional connectivity graphs are taken as input to the graph convolutional network to classify the working memory loads. The intrasubject (subject-specific) classification performed on 154 subjects for six different verbal working memory loads produced the highest classification accuracy of 96% and average classification accuracy of 89%, outperforming state-of-the-art classification models proposed in the literature. Furthermore, the proposed Bayesian structure learning algorithm is compared with state-of-the-art functional connectivity estimation methods through intersubject and intrasubject statistical analysis of variance. The results also show that the alpha and theta bands have better classification accuracy than the beta band.

classification, connectivity, functional connectivity, (14 more...)

2404.19467

Country:

North America > United States (0.04)
North America > Puerto Rico > Mayagüez > Mayagüez (0.04)
Asia > Middle East > Oman > Muscat Governorate > Muscat (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Health Care Technology (0.68)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)
Health & Medicine > Therapeutic Area > Neurology > Attention Deficit/Hyperactivity Disorder (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

Haines, Nathaniel, Goold, Conor

BayesBlend: Easy Model Blending using Pseudo-Bayesian Model Averaging, Stacking and Hierarchical Stacking in Python

arXiv.org Machine LearningApr-30-2024

Averaging predictions from multiple competing inferential models frequently outperforms predictions from any single model, providing that models are optimally weighted to maximize predictive performance. This is particularly the case in so-called $\mathcal{M}$-open settings where the true model is not in the set of candidate models, and may be neither mathematically reifiable nor known precisely. This practice of model averaging has a rich history in statistics and machine learning, and there are currently a number of methods to estimate the weights for constructing model-averaged predictive distributions. Nonetheless, there are few existing software packages that can estimate model weights from the full variety of methods available, and none that blend model predictions into a coherent predictive distribution according to the estimated weights. In this paper, we introduce the BayesBlend Python package, which provides a user-friendly programming interface to estimate weights and blend multiple (Bayesian) models' predictive distributions. BayesBlend implements pseudo-Bayesian model averaging, stacking and, uniquely, hierarchical Bayesian stacking to estimate model weights. We demonstrate the usage of BayesBlend with examples of insurance loss modeling.

accident year, bayesblend, candidate model, (14 more...)

2405.00158

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Abeer, A N M Nafiz, Jantre, Sanket, Urban, Nathan M, Yoon, Byung-Jun

Leveraging Active Subspaces to Capture Epistemic Model Uncertainty in Deep Generative Models for Molecular Design

arXiv.org Machine LearningApr-30-2024

Deep generative models have been accelerating the inverse design process in material and drug design. Unlike their counterpart property predictors in typical molecular design frameworks, generative molecular design models have seen fewer efforts on uncertainty quantification (UQ) due to computational challenges in Bayesian inference posed by their large number of parameters. In this work, we focus on the junction-tree variational autoencoder (JT-VAE), a popular model for generative molecular design, and address this issue by leveraging the low dimensional active subspace to capture the uncertainty in the model parameters. Specifically, we approximate the posterior distribution over the active subspace parameters to estimate the epistemic model uncertainty in an extremely high dimensional parameter space. The proposed UQ scheme does not require alteration of the model architecture, making it readily applicable to any pre-trained model. Our experiments demonstrate the efficacy of the AS-based UQ and its potential impact on molecular optimization by exploring the model diversity under epistemic uncertainty.

jt-vae, molecule, subspace, (17 more...)

2405.00202

Country: North America > United States > Texas > Brazos County > College Station (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Artificial IntelligenceApr-29-2024

Diffusion Models as Constrained Samplers for Optimization with Unknown Constraints

Kong, Lingkai, Du, Yuanqi, Mu, Wenhao, Neklyudov, Kirill, De Bortoli, Valentin, Wang, Haorui, Wu, Dongxia, Ferber, Aaron, Ma, Yi-An, Gomes, Carla P., Zhang, Chao

Addressing real-world optimization problems becomes particularly challenging when analytic objective functions or constraints are unavailable. While numerous studies have addressed the issue of unknown objectives, limited research has focused on scenarios where feasibility constraints are not given explicitly. Overlooking these constraints can lead to spurious solutions that are unrealistic in practice. To deal with such unknown constraints, we propose to perform optimization within the data manifold using diffusion models. To constrain the optimization process to the data manifold, we reformulate the original optimization problem as a sampling problem from the product of the Boltzmann distribution defined by the objective function and the data distribution learned by the diffusion model. To enhance sampling efficiency, we propose a two-stage framework that begins with a guided diffusion process for warm-up, followed by a Langevin dynamics stage for further correction. Theoretical analysis shows that the initial stage results in a distribution focused on feasible solutions, thereby providing a better initialization for the later stage. Comprehensive experiments on a synthetic dataset, six real-world black-box optimization datasets, and a multi-objective optimization dataset show that our method achieves better or comparable performance with previous state-of-the-art baselines.

constrained sampler, constraint, optimization, (12 more...)

2402.18012

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Volya, Daniel, Nikitin, Andrey, Mishra, Prabhat

Fast Quantum Process Tomography via Riemannian Gradient Descent

arXiv.org Artificial IntelligenceApr-29-2024

Constrained optimization plays a crucial role in the fields of quantum physics and quantum information science and becomes especially challenging for high-dimensional complex structure problems. One specific issue is that of quantum process tomography, in which the goal is to retrieve the underlying quantum process based on a given set of measurement data. In this paper, we introduce a modified version of stochastic gradient descent on a Riemannian manifold that integrates recent advancements in numerical methods for Riemannian optimization. This approach inherently supports the physically driven constraints of a quantum process, takes advantage of state-of-the-art large-scale stochastic objective optimization, and has superior performance to traditional approaches such as maximum likelihood estimation and projected least squares. The data-driven approach enables accurate, order-of-magnitude faster results, and works with incomplete data. We demonstrate our approach on simulations of quantum processes and in hardware by characterizing an engineered process on quantum computers.

manifold, optimization, tomography, (12 more...)

2404.1884

Country: North America > United States > Florida > Alachua County > Gainesville (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Pillai, Nisha, Nanduri, Bindu, Rothrock, Michael J Jr., Chen, Zhiqian, Ramkumar, Mahalingam

Bayesian-Guided Generation of Synthetic Microbiomes with Minimized Pathogenicity

arXiv.org Artificial IntelligenceApr-29-2024

Synthetic microbiomes offer new possibilities for modulating microbiota, to address the barriers in multidtug resistance (MDR) research. We present a Bayesian optimization approach to enable efficient searching over the space of synthetic microbiome variants to identify candidates predictive of reduced MDR. Microbiome datasets were encoded into a low-dimensional latent space using autoencoders. Sampling from this space allowed generation of synthetic microbiome signatures. Bayesian optimization was then implemented to select variants for biological screening to maximize identification of designs with restricted MDR pathogens based on minimal samples. Four acquisition functions were evaluated: expected improvement, upper confidence bound, Thompson sampling, and probability of improvement. Based on each strategy, synthetic samples were prioritized according to their MDR detection. Expected improvement, upper confidence bound, and probability of improvement consistently produced synthetic microbiome candidates with significantly fewer searches than Thompson sampling. By combining deep latent space mapping and Bayesian learning for efficient guided screening, this study demonstrated the feasibility of creating bespoke synthetic microbiomes with customized MDR profiles.

autoencoder, microbiome sample, resistance, (15 more...)

2405.0007

Country:

North America > United States > Mississippi (0.05)
North America > Mexico (0.04)
Asia > Russia > Siberian Federal District > Novosibirsk Oblast > Novosibirsk (0.04)

Genre: Research Report > New Finding (0.89)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

arXiv.org Machine LearningApr-29-2024

Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks

Antoran, Javier

Large neural networks trained on large datasets have become the dominant paradigm in machine learning. These systems rely on maximum likelihood point estimates of their parameters, precluding them from expressing model uncertainty. This may result in overconfident predictions and it prevents the use of deep learning models for sequential decision making. This thesis develops scalable methods to equip neural networks with model uncertainty. In particular, we leverage the linearised Laplace approximation to equip pre-trained neural networks with the uncertainty estimates provided by their tangent linear models. This turns the problem of Bayesian inference in neural networks into one of Bayesian inference in conjugate Gaussian-linear models. Alas, the cost of this remains cubic in either the number of network parameters or in the number of observations times output dimensions. By assumption, neither are tractable. We address this intractability by using stochastic gradient descent (SGD) -- the workhorse algorithm of deep learning -- to perform posterior sampling in linear models and their convex duals: Gaussian processes. With this, we turn back to linearised neural networks, finding the linearised Laplace approximation to present a number of incompatibilities with modern deep learning practices -- namely, stochastic optimisation, early stopping and normalisation layers -- when used for hyperparameter learning. We resolve these and construct a sample-based EM algorithm for scalable hyperparameter learning with linearised neural networks. We apply the above methods to perform linearised neural network inference with ResNet-50 (25M parameters) trained on Imagenet (1.2M observations and 1000 output dimensions). Additionally, we apply our methods to estimate uncertainty for 3d tomographic reconstructions obtained with the deep image prior network.

estimation and experimental design, information processing system, linearised dip uncertainty estimation, (16 more...)

2404.19157

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Washington > King County > Seattle (0.13)
Asia > Middle East > Jordan (0.04)
(8 more...)

Genre:

Summary/Review (1.00)
Research Report > New Finding (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.92)
Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Chen, Sitan, Kontonis, Vasilis, Shah, Kulin

Learning general Gaussian mixtures with efficient score matching

arXiv.org Machine LearningApr-29-2024

We study the problem of learning mixtures of $k$ Gaussians in $d$ dimensions. We make no separation assumptions on the underlying mixture components: we only require that the covariance matrices have bounded condition number and that the means and covariances lie in a ball of bounded radius. We give an algorithm that draws $d^{\mathrm{poly}(k/\varepsilon)}$ samples from the target mixture, runs in sample-polynomial time, and constructs a sampler whose output distribution is $\varepsilon$-far from the unknown mixture in total variation. Prior works for this problem either (i) required exponential runtime in the dimension $d$, (ii) placed strong assumptions on the instance (e.g., spherical covariances or clusterability), or (iii) had doubly exponential dependence on the number of components $k$. Our approach departs from commonly used techniques for this problem like the method of moments. Instead, we leverage a recently developed reduction, based on diffusion models, from distribution learning to a supervised learning task called score matching. We give an algorithm for the latter by proving a structural result showing that the score function of a Gaussian mixture can be approximated by a piecewise-polynomial function, and there is an efficient algorithm for finding it. To our knowledge, this is the first example of diffusion models achieving a state-of-the-art theoretical guarantee for an unsupervised learning task.

algorithm, approximation, score function, (14 more...)

2404.18893

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)