AITopics

doi: 10.1613/jair.1.13643

AI Access Foundation

13643

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation (0.75)
Health & Medicine > Therapeutic Area (0.53)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceJul-6-2022

Variational Flow Graphical Model

Ren, Shaogang, Karimi, Belhal, Li, Dingcheng, Li, Ping

This paper introduces a novel approach to embed flow-based models with hierarchical structures. The proposed framework is named Variational Flow Graphical (VFG) Model. VFGs learn the representation of high dimensional data via a message-passing scheme by integrating flow-based functions through variational inference. By leveraging the expressive power of neural networks, VFGs produce a representation of the data using a lower dimension, thus overcoming the drawbacks of many flow-based models, usually requiring a high dimensional latent space involving many trivial variables. Aggregation nodes are introduced in the VFG models to integrate forward-backward hierarchical information via a message passing scheme. Maximizing the evidence lower bound (ELBO) of data likelihood aligns the forward and backward messages in each aggregation node achieving a consistency node state. Algorithms have been developed to learn model parameters through gradient updating regarding the ELBO objective. The consistency of aggregation nodes enable VFGs to be applicable in tractable inference on graphical structures. Besides representation learning and numerical inference, VFGs provide a new approach for distribution modeling on datasets with graphical latent structures. Additionally, theoretical study shows that VFGs are universal approximators by leveraging the implicitly invertible flow-based structures. With flexible graphical structures and superior excessive power, VFGs could potentially be used to improve probabilistic inference. In the experiments, VFGs achieves improved evidence lower bound (ELBO) and likelihood values on multiple datasets.

artificial intelligence, machine learning, node, (19 more...)

2207.02722

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Austria > Vienna (0.14)
Asia > Middle East > Jordan (0.04)
(21 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.88)

Vaitl, Lorenz, Nicoli, Kim A., Nakajima, Shinichi, Kessel, Pan

Path-Gradient Estimators for Continuous Normalizing Flows

arXiv.org Machine LearningJun-17-2022

Recent work has established a path-gradient estimator for simple variational Gaussian distributions and has argued that the path-gradient is particularly beneficial in the regime in which the variational distribution approaches the exact target distribution. In many applications, this regime can however not be reached by a simple Gaussian variational distribution. In this work, we overcome this crucial limitation by proposing a path-gradient estimator for the considerably more expressive variational family of continuous normalizing flows. We outline an efficient algorithm to calculate this estimator and establish its superior performance empirically.

artificial intelligence, estimator, machine learning, (14 more...)

2206.09016

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > Germany > Berlin (0.04)
North America > United States > Maryland > Baltimore (0.04)
(9 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

arXiv.org Artificial IntelligenceJun-16-2022

Classification of datasets with imputed missing values: does imputation quality matter?

Shadbahr, Tolou, Roberts, Michael, Stanczuk, Jan, Gilbey, Julian, Teare, Philip, Dittmer, Sören, Thorpe, Matthew, Torne, Ramon Vinas, Sala, Evis, Lio, Pietro, Patel, Mishal, Collaboration, AIX-COVNET, Rudd, James H. F., Mirtti, Tuomas, Rannikko, Antti, Aston, John A. D., Tang, Jing, Schönlieb, Carola-Bibiane

Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods, followed by classification of the now complete, imputed, samples. The focus of the machine learning researcher is then to optimise the downstream classification performance. In this study, we highlight that it is imperative to consider the quality of the imputation. We demonstrate how the commonly used measures for assessing quality are flawed and propose a new class of discrepancy scores which focus on how well the method recreates the overall distribution of the data. To conclude, we highlight the compromised interpretability of classifier models trained using poorly imputed data. All code and data used in this paper are also released publicly at [inserted upon publication].

artificial intelligence, dataset, machine learning, (16 more...)

doi: 10.1038/s43856-023-00356-z

2206.08478

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > Austria > Vienna (0.14)
Europe > Germany > Bremen > Bremen (0.14)
(8 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Health Care Providers & Services (0.93)
Health & Medicine > Diagnostic Medicine > Imaging (0.92)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)

Hacohen, Guy, Weinshall, Daphna

Principal Components Bias in Over-parameterized Linear Models, and its Manifestation in Deep Neural Networks

arXiv.org Artificial IntelligenceJun-8-2022

Recent work suggests that convolutional neural networks of different architectures learn to classify images in the same order. To understand this phenomenon, we revisit the over-parametrized deep linear network model. Our analysis reveals that, when the hidden layers are wide enough, the convergence rate of this model's parameters is exponentially faster along the directions of the larger principal components of the data, at a rate governed by the corresponding singular values. We term this convergence pattern the Principal Components bias (PC-bias). Empirically, we show how the PC-bias streamlines the order of learning of both linear and non-linear networks, more prominently at earlier stages of learning. We then compare our results to the simplicity bias, showing that both biases can be seen independently, and affect the order of learning in different ways. Finally, we discuss how the PC-bias may explain some benefits of early stopping and its connection to PCA, and why deep networks converge more slowly with random labels.

artificial intelligence, linear network, machine learning, (18 more...)

2105.05553

Country:

Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Lee, Sebastian, Mannelli, Stefano Sarao, Clopath, Claudia, Goldt, Sebastian, Saxe, Andrew

Maslow's Hammer for Catastrophic Forgetting: Node Re-Use vs Node Activation

arXiv.org Artificial IntelligenceMay-18-2022

Continual learning - learning new tasks in sequence while maintaining performance on old tasks - remains particularly challenging for artificial neural networks. Surprisingly, the amount of forgetting does not increase with the dissimilarity between the learned tasks, but appears to be worst in an intermediate similarity regime. In this paper we theoretically analyse both a synthetic teacher-student framework and a real data setup to provide an explanation of this phenomenon that we name Maslow's hammer hypothesis. Our analysis reveals the presence of a trade-off between node activation and node re-use that results in worst forgetting in the intermediate regime. Using this understanding we reinterpret popular algorithmic interventions for catastrophic interference in terms of this trade-off, and identify the regimes in which they are most effective.

hypothesis, maslow, similarity, (15 more...)

2205.09029

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada > Ontario > Toronto (0.04)
(6 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.93)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)

arXiv.org Machine LearningMay-13-2022

A Tale of Two Flows: Cooperative Learning of Langevin Flow and Normalizing Flow Toward Energy-Based Model

Xie, Jianwen, Zhu, Yaxuan, Li, Jun, Li, Ping

This paper studies the cooperative learning of two generative flow models, in which the two models are iteratively updated based on the jointly synthesized examples. The first flow model is a normalizing flow that transforms an initial simple density into a target density by applying a sequence of invertible transformations. The second flow model is a Langevin flow that runs finite steps of gradient-based MCMC toward an energy-based model. We start from proposing a generative framework that trains an energy-based model with a normalizing flow as an amortized sampler to initialize the MCMC chains of the energy-based model. In each learning iteration, we generate synthesized examples by using a normalizing flow initialization followed by a short-run Langevin flow revision toward the current energy-based model. Then we treat the synthesized examples as fair samples from the energy-based model and update the model parameters with the maximum likelihood learning gradient, while the normalizing flow directly learns from the synthesized examples by maximizing the tractable likelihood. Under the short-run non-mixing MCMC scenario, the estimation of the energy-based model is shown to follow the perturbation of maximum likelihood, and the short-run Langevin flow and the normalizing flow form a two-flow generator that we call CoopFlow. We provide an understating of the CoopFlow algorithm by information geometry and show that it is a valid generator as it converges to a moment matching estimator. We demonstrate that the trained CoopFlow is capable of synthesizing realistic images, reconstructing images, and interpolating between images. Normalizing flows (Dinh et al., 2015; 2017; Kingma & Dhariwal, 2018) are a family of generative models that construct a complex distribution by transforming a simple probability density, such as Gaussian distribution, through a sequence of invertible and differentiable mappings. Due to the tractability of the exact log-likelihood and the efficiency of the inference and synthesis, normalizing flows have gained popularity in density estimation (Kingma & Dhariwal, 2018; Ho et al., 2019; Yang et al., 2019; Prenger et al., 2019; Kumar et al., 2020) and variational inference (Rezende & Mohamed, 2015; Kingma et al., 2016).

artificial intelligence, langevin flow, machine learning, (20 more...)

2205.06924

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(24 more...)

Genre: Research Report (1.00)

Journal of Artificial Intelligence ResearchMay-5-2022

Multi-Agent Advisor Q-Learning

Ganapathi Subramanian, Sriram (U Waterloo) | Taylor, Matthew E. (University of Alberta) | Larson, Kate (University of Waterloo) | Crowley, Mark (University of Waterloo)

In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL) but there are still numerous challenges, such as high sample complexity and slow convergence to stable policies, that need to be overcome before wide-spread deployment is possible. However, many real-world environments already, in practice, deploy sub-optimal or heuristic approaches for generating policies. An interesting question that arises is how to best use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this paper, we provide a principled framework for incorporating action recommendations from online suboptimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning based algorithms: ADMIRAL - Decision Making (ADMIRAL-DM) and ADMIRAL - Advisor Evaluation (ADMIRAL-AE), which allow us to improve learning by appropriately incorporating advice from an advisor (ADMIRAL-DM), and evaluate the effectiveness of an advisor (ADMIRAL-AE). We analyze the algorithms theoretically and provide fixed point guarantees regarding their learning in general-sum stochastic games. Furthermore, extensive experiments illustrate that these algorithms: can be used in a variety of environments, have performances that compare favourably to other related baselines, can scale to large state-action spaces, and are robust to poor advice from advisors.

admiral-dm, agent, algorithm, (17 more...)

doi: 10.1613/jair.1.13445

AI Access Foundation

13445

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > New York > New York County > New York City (0.14)
North America > United States > Colorado > Denver County > Denver (0.14)
(51 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology (1.00)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)

Gordon-Rodriguez, Elliott, Loaiza-Ganem, Gabriel, Potapczynski, Andres, Cunningham, John P.

On the Normalizing Constant of the Continuous Categorical Distribution

arXiv.org Machine LearningApr-28-2022

Probability distributions supported on the simplex enjoy a wide range of applications across statistics and machine learning. Recently, a novel family of such distributions has been discovered: the continuous categorical. This family enjoys remarkable mathematical simplicity; its density function resembles that of the Dirichlet distribution, but with a normalizing constant that can be written in closed form using elementary functions only. In spite of this mathematical simplicity, our understanding of the normalizing constant remains far from complete. In this work, we characterize the numerical behavior of the normalizing constant and we present theoretical and methodological advances that can, in turn, help to enable broader applications of the continuous categorical distribution.

algorithm 1, artificial intelligence, machine learning, (16 more...)

2204.1329

Country:

North America > United States > New York (0.04)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.66)

arXiv.org Machine LearningApr-22-2022

A piece-wise constant approximation for non-conjugate Gaussian Process models

Seitz, Sarem

Gaussian Processes (GPs) are a versatile and popular method in Bayesian Machine Learning. A common modification are Sparse Variational Gaussian Processes (SVGPs) which are well suited to deal with large datasets. While GPs allow to elegantly deal with Gaussian-distributed target variables in closed form, their applicability can be extended to non-Gaussian data as well. These extensions are usually impossible to treat in closed form and hence require approximate solutions. This paper proposes to approximate the inverse-link function, which is necessary when working with non-Gaussian likelihoods, by a piece-wise constant function. It will be shown that this yields a closed form solution for the corresponding SVGP lower bound. In addition, it is demonstrated how the piece-wise constant function itself can be optimized, resulting in an inverse-link function that can be learnt from the data at hand.

artificial intelligence, machine learning, modeling & simulation, (18 more...)

2204.10575

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Modeling & Simulation (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)