AITopics | variational learning

variational learning

Variational Learning Finds Flatter Solutions at the Edge of Stability

Neural Information Processing Systems | Jun-12-2026, 15:35:13 GMT

Variational Learning (VL) has recently gained popularity for training deep neural networks. Part of its empirical success can be explained by theories such as PAC-Bayes bounds, minimum description length and marginal likelihood, but little has been done to unravel the implicit regularization in play. Here, we analyze the implicit regularization of VL through the Edge of Stability (EoS) framework. EoS has previously been used to show that gradient descent can find flat solutions and we extend this result to show that VL can find even flatter solutions. This result is obtained by controlling the shape of the variational posterior as well as the number of posterior samples used during training. The derivation follows in a similar fashion as in the standard EoS literature for deep learning, by first deriving a result for a quadratic problem and then extending it to deep neural networks. We empirically validate these findings on a wide variety of large networks, such as ResNet and ViT, to find that the theoretical results closely match the empirical ones. Ours is the first work to analyze the EoS dynamics of VL.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.83)

Variational Learning on Aggregate Outputs with Gaussian Processes

Neural Information Processing Systems | Nov-20-2025, 21:56:16 GMT

While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global mapping of disease, only have access to outputs at a much coarser level than that of the inputs. Aggregation of outputs makes generalization to new inputs much more difficult. We consider an approach to this problem based on variational learning with a model of output aggregation and Gaussian processes, where aggregation leads to intractability of the standard evidence lower bounds. We propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account. We develop a framework which extends to several types of likelihoods, including the Poisson model for aggregated count data. We apply our framework to a challenging and important problem, the fine-scale spatial modelling of malaria incidence, with over 1 million observations.

aggregate output, gaussian process, variational learning, (2 more...)

Neural Information Processing Systems

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Reviews: Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

Neural Information Processing Systems | Jan-20-2025, 11:07:07 GMT

Technical quality: The technical contribution of the paper is defined at the level of a framework with modular parts and so is quite high-level as a result. The main components are the generative model (Eq 1), the recurrent inference network, and the use of variational learning. To the extent that technical details are provided for these components, they are correct. The bulk of the paper focuses on the construction of experiments and the analysis of the results. In general, the data sets and tasks are well designed in both the 2D and 3D cases.

generative model, inference network, recurrent inference network, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.64)
Information Technology > Artificial Intelligence > Vision (0.53)

Variational Learning for Recurrent Spiking Networks

Neural Information Processing Systems | Mar-14-2024, 21:47:42 GMT

We derive a plausible learning rule for feedforward, feedback and lateral connections in a recurrent network of spiking neurons. Operating in the context of a generative model for distributions of spike sequences, the learning mechanism is derived from variational inference principles. The synaptic plasticity rules found are interesting in that they are strongly reminiscent of experimental Spike Time Dependent Plasticity, and in that they differ for excitatory and inhibitory neurons. A simulation confirms the method's applicability to learning both stationary and temporal spike patterns.

artificial intelligence, bayesian inference, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe > Switzerland (0.15)
Europe > United Kingdom > England (0.14)

Industry: Energy > Oil & Gas (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Variational Learning is Effective for Large Deep Networks

Shen, Yuesong, Daheim, Nico, Cong, Bai, Nickl, Peter, Marconi, Gian Maria, Bazan, Clement, Yokota, Rio, Gurevych, Iryna, Cremers, Daniel, Khan, Mohammad Emtiyaz, Möllenhoff, Thomas

arXiv.org Machine Learning | Feb-27-2024

We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertainty is better. We show several new use cases of IVON where we improve fine-tuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence in support of effectiveness of variational learning.

Laplace (MacKay, 1992), which do not directly optimize the variational objective, even though they have variational interpretations. Ideally, we want to know whether a direct optimization of the objective can match the accuracy of Adam-like methods without any increase in the cost, while also yielding good weight-uncertainty to improve calibration, model averaging, knowledge transfer, etc. In this paper, we present the Improved Variational Online Newton (IVON) method, which adapts the method of Lin et al. (2020) to large scale and obtains state-of-the-art accuracy and uncertainty at nearly identical cost as Adam. Figure 1 shows some examples where, for training GPT-2 (773M parameters) from scratch, IVON gives 0.4 reduction in validation perplexity over AdamW and, for ResNet-50 (25.6M parameters) on ImageNet, it gives around 2% more accurate

ivon, learning, variational learning, (15 more...)

arXiv.org Machine Learning

2402.17641

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Toward Automated Quantum Variational Machine Learning

Subasi, Omer

arXiv.org Artificial Intelligence | Dec-3-2023

In this work, we address the problem of automating quantum variational machine learning. We develop a multi-locality parallelizable search algorithm, called MUSE, to find the initial points and the sets of parameters that achieve the best performance for quantum variational circuit learning. Simulations with five real-world classification datasets indicate that on average, MUSE improves the detection accuracy of quantum variational classifiers 2.3 times with respect to the observed lowest scores. Moreover, when applied to two real-world regression datasets, MUSE improves the quality of the predictions from negative coefficients of determination to positive ones. Furthermore, the classification and regression scores of the quantum variational models trained with MUSE are on par with the classical counterparts.

classifier, dataset, muse, (16 more...)

arXiv.org Artificial Intelligence

2312.01567

Country:

North America > United States > Wisconsin (0.04)
North America > United States > Washington > Benton County > Richland (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

VLUCI: Variational Learning of Unobserved Confounders for Counterfactual Inference

Zhao, Yonghe, Huang, Qiang, Wu, Siwei, Peng, Yun, Sun, Huiyan

arXiv.org Artificial Intelligence | Sep-7-2023

Causal inference plays a vital role in diverse domains like epidemiology, healthcare, and economics. De-confounding and counterfactual prediction in observational data have emerged as prominent concerns in causal inference research. While existing models tackle observed confounders, the presence of unobserved confounders remains a significant challenge, distorting causal inference and impacting counterfactual outcome accuracy. To address this, we propose a novel variational learning model of unobserved confounders for counterfactual inference (VLUCI), which generates the posterior distribution of unobserved confounders. VLUCI relaxes the unconfoundedness assumption often overlooked by most causal inference methods. By disentangling observed and unobserved confounders, VLUCI constructs a doubly variational inference model to approximate the distribution of unobserved confounders, which are used for inferring more accurate counterfactual outcomes. Extensive experiments on synthetic and semi-synthetic datasets demonstrate VLUCI's superior performance in inferring unobserved confounders. It is compatible with state-of-the-art counterfactual inference models, significantly improving inference accuracy at both group and individual levels. Additionally, VLUCI provides confidence intervals for counterfactual outcomes, aiding decision-making in risk-sensitive domains. Using the public IHDP dataset as an example, we further clarify the considerations when applying VLUCI to cases where unobserved confounders don't strictly conform to our model assumptions, highlighting the practical advantages of VLUCI.

causal effect, inference, unobserved confounder, (13 more...)

arXiv.org Artificial Intelligence

2308.00904

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Jilin Province (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Variational Learning for Recurrent Spiking Networks

Neural Information Processing Systems | Apr-6-2023, 12:46:04 GMT

We derive a plausible learning rule updating the synaptic efficacies for feedforward, feedback and lateral connections between observed and latent neurons. Operating in the context of a generative model for distributions of spike sequences, the learning mechanism is derived from variational inference principles. The synaptic plasticity rules found are interesting in that they are strongly reminiscent of experimentally found results on Spike Time Dependent Plasticity, and in that they differ for excitatory and inhibitory neurons. A simulation confirms the method's applicability to learning both stationary and temporal spike patterns.

neuron, recurrent spiking network, variational learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.52)

Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems

Liu, Hong, Cai, Yucheng, Lin, Zhenru, Ou, Zhijian, Huang, Yi, Feng, Junlan

arXiv.org Artificial Intelligence | Jan-26-2023

Recently, two approaches, fine-tuning large pre-trained language models and variational training, have separately attracted significant interest for semi-supervised end-to-end task-oriented dialog (TOD) systems. In this paper, we propose the Variational Latent-State GPT model (VLS-GPT), which is the first to combine the strengths of the two approaches. Among many options of models, we propose the generative model and the inference model for variational learning of the end-to-end TOD system, both as auto-regressive language models based on GPT-2, which can be further trained over a mix of labeled and unlabeled dialog data in a semi-supervised manner. Variational training of VLS-GPT is both statistically and computationally more challenging than previous variational learning works for sequential latent variable models, which use turn-level first-order Markovian. The inference model in VLS-GPT is non-Markovian due to the use of the Transformer architecture. In this work, we establish a Recursive Monte Carlo Approximation (RMCA) to the variational objective with a non-Markovian inference model and prove its unbiasedness. Further, we develop the computational strategy of sampling-then-forward-computation to realize RMCA, which successfully overcomes the memory explosion issue of using GPT in variational learning and speeds up training. Semi-supervised TOD experiments are conducted on two benchmark multi-domain datasets of different languages: MultiWOZ2.1 and CrossWOZ. VLS-GPT is shown to significantly outperform both supervised-only and semi-supervised self-training baselines.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2109.04314

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report (1.00)

Industry:

Education (0.46)
Consumer Products & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Variational Learning of Individual Survival Distributions

Xiu, Zidi, Tao, Chenyang, Henao, Ricardo

arXiv.org Machine Learning | Mar-9-2020

The abundance of modern health data provides many opportunities for the use of machine learning techniques to build better statistical models to improve clinical decision making. Predicting time-to-event distributions, also known as survival analysis, plays a key role in many clinical applications. We introduce a variational time-to-event prediction model, named Variational Survival Inference (VSI), which builds upon recent advances in distribution learning techniques and deep neural networks. VSI addresses the challenges of non-parametric distribution estimation by (i) relaxing the restrictive modeling assumptions made in classical models, and (ii) efficiently handling the censored observations, i.e., events that occur outside the observation window, all within the variational framework. To validate the effectiveness of our approach, an extensive set of experiments on both synthetic and real-world datasets is carried out, showing improved performance relative to competing solutions.

dataset, survival analysis, vsi, (14 more...)

arXiv.org Machine Learning

doi: 10.1145/3368555.3384454

2003.0443

Country:

North America > Canada > Ontario > Toronto (0.05)
Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
