AITopics

2411.15624

Country:

North America > United States > California (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.81)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Hematology (0.92)
Health & Medicine > Therapeutic Area > Oncology (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Ziomek, Juliusz, Adachi, Masaki, Osborne, Michael A.

Bayesian Optimisation with Unknown Hyperparameters: Regret Bounds Logarithmically Closer to Optimal

arXiv.org Machine LearningNov-22-2024

Bayesian Optimization (BO) is widely used for optimising black-box functions but requires us to specify the length scale hyperparameter, which defines the smoothness of the functions the optimizer will consider. Most current BO algorithms choose this hyperparameter by maximizing the marginal likelihood of the observed data, albeit risking misspecification if the objective function is less smooth in regions we have not yet explored. The only prior solution addressing this problem with theoretical guarantees was A-GP-UCB, proposed by Berkenkamp et al. (2019). This algorithm progressively decreases the length scale, expanding the class of functions considered by the optimizer. However, A-GP-UCB lacks a stopping mechanism, leading to over-exploration and slow convergence. To overcome this, we introduce Length scale Balancing (LB) - a novel approach, aggregating multiple base surrogate models with varying length scales. LB intermittently adds smaller length scale candidate values while retaining longer scales, balancing exploration and exploitation. We formally derive a cumulative regret bound of LB and compare it with the regret of an oracle BO algorithm using the optimal length scale. Denoting the factor by which the regret bound of A-GP-UCB was away from oracle as $g(T)$, we show that LB is only $\log g(T)$ away from oracle regret. We also empirically evaluate our algorithm on synthetic and real-world benchmarks and show it outperforms A-GP-UCB, maximum likelihood estimation and MCMC.

artificial intelligence, data mining, machine learning, (20 more...)

2410.10384

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine (0.99)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Li, Ziyao, Hsu, Shang-Ling, Shahabi, Cyrus

Forecasting Unseen Points of Interest Visits Using Context and Proximity Priors

arXiv.org Artificial IntelligenceNov-22-2024

Understanding human mobility behavior is crucial for numerous applications, including crowd management, location-based recommendations, and the estimation of pandemic spread. Machine learning models can predict the Points of Interest (POIs) that individuals are likely to visit in the future by analyzing their historical visit patterns. Previous studies address this problem by learning a POI classifier, where each class corresponds to a POI. However, this limits their applicability to predict a new POI that was not in the training data, such as the opening of new restaurants. To address this challenge, we propose a model designed to predict a new POI outside the training data as long as its context is aligned with the user's interests. Unlike existing approaches that directly predict specific POIs, our model first forecasts the semantic context of potential future POIs, then combines this with a proximity-based prior probability distribution to determine the exact POI. Experimental results on real-world visit data demonstrate that our model outperforms baseline methods that do not account for semantic contexts, achieving a 17% improvement in accuracy. Notably, as new POIs are introduced over time, our model remains robust, exhibiting a lower decline rate in prediction accuracy compared to existing methods.

poi, pois, sequence, (16 more...)

2411.15285

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report (1.00)

Industry: Consumer Products & Services > Restaurants (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Banerjee, Saikat, Carbonetto, Peter, Stephens, Matthew

Gradient-based optimization for variational empirical Bayes multiple regression

Multiple linear regression provides a simple, but widely used, method to find associations between outcomes (responses) and a set of predictors (explanatory variables). It has been actively studied over more than a century, and there is a rich and vast literature on the subject [1]. In practical situations the number of predictor variables is often large, and it becomes desirable to induce sparsity in the regression coefficients to avoid overfitting [2, 3]. Sparse linear regression also serves as the foundation for non-linear techniques, such as trendfiltering [4, 5], which can estimate an underlying non-linear trend from time series data. Applications of sparse multiple linear regression and trendfiltering arise in a wide range of applications in modern science and engineering, including astronomy [6], atmospheric sciences [7], biology [8], economics [9, 10], genetics [11-15], geophysics [16], medical sciences [17, 18], social sciences [19] and text analysis [20]. Approaches to sparse linear regression can be broadly classified into two groups: (a) penalized linear regressions (PLR), which add a penalty term to the likelihood to penalize the magnitude of its parameters [21-23], and (b) Bayesian approaches [11-14, 24-29], which use a prior probability distribution on the model parameters to induce sparsity.

artificial intelligence, bayesian inference, machine learning, (15 more...)

2411.1457

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Nautiyal, Mayank, Shternshis, Andrey, Hellander, Andreas, Singh, Prashant

Variational Autoencoders for Efficient Simulation-Based Inference

arXiv.org Artificial IntelligenceNov-21-2024

We present a generative modeling approach based on the variational inference framework for likelihood-free simulation-based inference. The method leverages latent variables within variational autoencoders to efficiently estimate complex posterior distributions arising from stochastic simulations. We explore two variations of this approach distinguished by their treatment of the prior distribution. The first model adapts the prior based on observed data using a multivariate prior network, enhancing generalization across various posterior queries. In contrast, the second model utilizes a standard Gaussian prior, offering simplicity while still effectively capturing complex posterior distributions. We demonstrate the efficacy of these models on well-established benchmark problems, achieving results comparable to flow-based approaches while maintaining computational efficiency and scalability.

inference, posterior distribution, test problem, (13 more...)

2411.14511

Country:

North America > United States (0.14)
Europe > Sweden > Uppsala County > Uppsala (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Palit, Sanchar, Banerjee, Biplab, Chaudhuri, Subhasis

Revised Regularization for Efficient Continual Learning through Correlation-Based Parameter Update in Bayesian Neural Networks

arXiv.org Artificial IntelligenceNov-21-2024

We propose a Bayesian neural network-based continual learning algorithm using Variational Inference, aiming to overcome several drawbacks of existing methods. Specifically, in continual learning scenarios, storing network parameters at each step to retain knowledge poses challenges. This is compounded by the crucial need to mitigate catastrophic forgetting, particularly given the limited access to past datasets, which complicates maintaining correspondence between network parameters and datasets across all sessions. Current methods using Variational Inference with KL divergence risk catastrophic forgetting during uncertain node updates and coupled disruptions in certain nodes. To address these challenges, we propose the following strategies. To reduce the storage of the dense layer parameters, we propose a parameter distribution learning method that significantly reduces the storage requirements. In the continual learning framework employing variational inference, our study introduces a regularization term that specifically targets the dynamics and population of the mean and variance of the parameters. This term aims to retain the benefits of KL divergence while addressing related challenges. To ensure proper correspondence between network parameters and the data, our method introduces an importance-weighted Evidence Lower Bound term to capture data and parameter correlations. This enables storage of common and distinctive parameter hyperspace bases. The proposed method partitions the parameter space into common and distinctive subspaces, with conditions for effective backward and forward knowledge transfer, elucidating the network-parameter dataset correspondence. The experimental results demonstrate the effectiveness of our method across diverse datasets and various combinations of sequential datasets, yielding superior performance compared to existing approaches.

continual learning, learning, neural network, (10 more...)

2411.14202

Country:

Asia > India > Karnataka > Bengaluru (0.05)
Asia > India > Maharashtra > Mumbai (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

arXiv.org Artificial IntelligenceNov-21-2024

Structure-Based Molecule Optimization via Gradient-Guided Bayesian Update

Qiu, Keyue, Song, Yuxuan, Yu, Jie, Ma, Hongbo, Cao, Ziyao, Zhang, Zhilong, Wu, Yushuai, Zheng, Mingyue, Zhou, Hao, Ma, Wei-Ying

Structure-based molecule optimization (SBMO) aims to optimize molecules with both continuous coordinates and discrete types against protein targets. A promising direction is to exert gradient guidance on generative models given its remarkable success in images, but it is challenging to guide discrete data and risks inconsistencies between modalities. To this end, we leverage a continuous and differentiable space derived through Bayesian inference, presenting Molecule Joint Optimization (MolJO), the first gradient-based SBMO framework that facilitates joint guidance signals across different modalities while preserving SE(3)-equivariance. We introduce a novel backward correction strategy that optimizes within a sliding window of the past histories, allowing for a seamless trade-off between explore-and-exploit during optimization. Our proposed MolJO achieves state-of-the-art performance on CrossDocked2020 benchmark (Success Rate 51.3% , Vina Dock -9.05 and SA 0.78), more than 4x improvement in Success Rate compared to the gradient-based counterpart, and 2x "Me-Better" Ratio as much as 3D baselines. Furthermore, we extend MolJO to a wide range of optimization settings, including multi-objective optimization and challenging tasks in drug design such as R-group optimization and scaffold hopping, further underscoring its versatility and potential.

guidance, molecule, optimization, (17 more...)

2411.1328

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)

Caballero, William N., LaRosa, Matthew, Fisher, Alexander, Tarokh, Vahid

Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians

The multivariate Gaussian distribution underpins myriad operations-research, decision-analytic, and machine-learning models (e.g., Bayesian optimization, Gaussian influence diagrams, and variational autoencoders). However, despite recent advances in adversarial machine learning (AML), inference for Gaussian models in the presence of an adversary is notably understudied. Therefore, we consider a self-interested attacker who wishes to disrupt a decisionmaker's conditional inference and subsequent actions by corrupting a set of evidentiary variables. To avoid detection, the attacker also desires the attack to appear plausible wherein plausibility is determined by the density of the corrupted evidence. We consider white- and grey-box settings such that the attacker has complete and incomplete knowledge about the decisionmaker's underlying multivariate Gaussian distribution, respectively. Select instances are shown to reduce to quadratic and stochastic quadratic programs, and structural properties are derived to inform solution methods. We assess the impact and efficacy of these attacks in three examples, including, real estate evaluation, interest rate estimation and signals processing. Each example leverages an alternative underlying model, thereby highlighting the attacks' broad applicability. Through these applications, we also juxtapose the behavior of the white- and grey-box attacks to understand how uncertainty and structure affect attacker behavior.

attacker, conditional distribution, decisionmaker, (14 more...)

2411.14351

Country:

North America > United States > Arizona (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Campania > Naples (0.04)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (0.93)
Banking & Finance > Real Estate (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Riccius, Leon, Rocha, Iuri B. C. M., Bierkens, Joris, Kekkonen, Hanne, van der Meer, Frans P.

Integration of Active Learning and MCMC Sampling for Efficient Bayesian Calibration of Mechanical Properties

Recent advancements in Markov chain Monte Carlo (MCMC) sampling and surrogate modelling have significantly enhanced the feasibility of Bayesian analysis across engineering fields. However, the selection and integration of surrogate models and cutting-edge MCMC algorithms, often depend on ad-hoc decisions. A systematic assessment of their combined influence on analytical accuracy and efficiency is notably lacking. The present work offers a comprehensive comparative study, employing a scalable case study in computational mechanics focused on the inference of spatially varying material parameters, that sheds light on the impact of methodological choices for surrogate modelling and sampling. We show that a priori training of the surrogate model introduces large errors in the posterior estimation even in low to moderate dimensions. We introduce a simple active learning strategy based on the path of the MCMC algorithm that is superior to all a priori trained models, and determine its training data requirements. We demonstrate that the choice of the MCMC algorithm has only a small influence on the amount of training data but no significant influence on the accuracy of the resulting surrogate model. Further, we show that the accuracy of the posterior estimation largely depends on the surrogate model, but not even a tailored surrogate guarantees convergence of the MCMC.Finally, we identify the forward model as the bottleneck in the inference process, not the MCMC algorithm. While related works focus on employing advanced MCMC algorithms, we demonstrate that the training data requirements render the surrogate modelling approach infeasible before the benefits of these gradient-based MCMC algorithms on cheap models can be reaped.

algorithm, surrogate model, training data, (14 more...)

2411.13361

Country:

Europe > Netherlands > South Holland > Delft (0.04)
North America > United States > New York (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Materials (0.50)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Lai, Jinlin, Domke, Justin, Sheldon, Daniel

Hamiltonian Monte Carlo Inference of Marginalized Linear Mixed-Effects Models

Bayesian reasoning in linear mixed-effects models (LMMs) is challenging and often requires advanced sampling techniques like Markov chain Monte Carlo (MCMC). A common approach is to write the model in a probabilistic programming language and then sample via Hamiltonian Monte Carlo (HMC). However, there are many ways a user can transform a model that make inference more or less efficient. In particular, marginalizing some variables can greatly improve inference but is difficult for users to do manually. We develop an algorithm to easily marginalize random effects in LMMs. A naive approach introduces cubic time operations within an inference algorithm like HMC, but we reduce the running time to linear using fast linear algebra techniques. We show that marginalization is always beneficial when applicable and highlight improvements in various models, especially ones from cognitive sciences.

marginalization, numpyro, random effect, (15 more...)

2410.24079

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > North Carolina (0.04)
(5 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine (1.00)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)