
Collaborating Authors

 Forré, Patrick


Equivariance-aware Architectural Optimization of Neural Networks

arXiv.org Artificial Intelligence

Incorporating equivariance to symmetry groups as a constraint during neural network training can improve performance and generalization for tasks exhibiting those symmetries, but such symmetries are often neither perfectly nor explicitly present. This motivates algorithmically optimizing the architectural constraints imposed by equivariance. We propose the equivariance relaxation morphism, which preserves functionality while reparameterizing a group equivariant layer to operate with equivariance constraints on a subgroup, as well as the [G]-mixed equivariant layer, which mixes layers constrained to different groups to enable within-layer equivariance optimization. We further present evolutionary and differentiable neural architecture search (NAS) algorithms that use these mechanisms, respectively, for equivariance-aware architectural optimization. Experiments across a variety of datasets show the benefit of dynamically constrained equivariance for finding effective architectures with approximate equivariance.
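The following is a minimal, hedged sketch of the mixing idea behind the [G]-mixed equivariant layer, written as a DARTS-style softmax-weighted combination of candidate layers; the class name, the plain Conv2d stand-ins, and the weighting scheme are illustrative assumptions rather than the paper's implementation, in which each candidate would carry a different group-equivariance constraint.

```python
import torch
import torch.nn as nn

class MixedEquivariantLayer(nn.Module):
    """Toy sketch: softly mix candidate layers (each intended to carry a
    different group-equivariance constraint) via learned architecture weights."""

    def __init__(self, candidate_layers):
        super().__init__()
        self.candidates = nn.ModuleList(candidate_layers)
        # One architecture parameter per candidate group constraint.
        self.alpha = nn.Parameter(torch.zeros(len(candidate_layers)))

    def forward(self, x):
        weights = torch.softmax(self.alpha, dim=0)
        # Weighted sum of shape-compatible candidate outputs.
        return sum(w * layer(x) for w, layer in zip(weights, self.candidates))

# Stand-ins for layers constrained to different groups; real group-equivariant
# convolutions would replace these plain Conv2d modules.
layer = MixedEquivariantLayer([nn.Conv2d(3, 8, 3, padding=1),
                               nn.Conv2d(3, 8, 3, padding=1)])
out = layer(torch.randn(2, 3, 32, 32))
```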


Contrastive Neural Ratio Estimation

arXiv.org Artificial Intelligence

Likelihood-to-evidence ratio estimation is usually cast as either a binary (NRE-A) or a multiclass (NRE-B) classification task. In contrast to the binary classification framework, the current formulation of the multiclass version has an intrinsic and unknown bias term, making otherwise informative diagnostics unreliable. We propose a multiclass framework free from the bias inherent to NRE-B at optimum, allowing us to run the diagnostics that practitioners depend on. It also recovers NRE-A in one corner case and NRE-B in the limiting case. For fair comparison, we benchmark the behavior of all algorithms in both familiar and novel training regimes: when jointly drawn data is unlimited, when data is fixed but prior draws are unlimited, and in the commonplace setting of fixed data and parameters. Our investigations reveal that the highest performing models are distant from the competitors (NRE-A, NRE-B) in hyperparameter space. We recommend hyperparameters distinct from those of the previous models. We suggest a bound on the mutual information as a performance metric for simulation-based inference methods, without the need for posterior samples, and provide experimental results.
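For orientation, a brief sketch of the quantity being estimated: in the binary (NRE-A) formulation, a classifier $d(x, \theta)$ trained to distinguish jointly drawn pairs $(x, \theta) \sim p(x, \theta)$ from marginally drawn pairs $(x, \theta) \sim p(x)\,p(\theta)$ recovers the likelihood-to-evidence ratio at its optimum,

$$
r(x, \theta) \;=\; \frac{p(x \mid \theta)}{p(x)} \;=\; \frac{p(x, \theta)}{p(x)\,p(\theta)}
\;=\; \frac{d^{*}(x, \theta)}{1 - d^{*}(x, \theta)},
$$

where $d^{*}$ denotes the Bayes-optimal classifier; the multiclass variants generalize this construction.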


Normalizing Flows for Hierarchical Bayesian Analysis: A Gravitational Wave Population Study

arXiv.org Artificial Intelligence

We propose parameterizing the population distribution of the gravitational wave population modeling framework (Hierarchical Bayesian Analysis) with a normalizing flow. We first demonstrate the merit of this method on illustrative experiments and then analyze four parameters of the latest LIGO/Virgo data release: primary mass, secondary mass, redshift, and effective spin. Our results show that despite the small and notoriously noisy dataset, the posterior predictive distributions of the observed gravitational wave population (assuming a prior over the parameters of the flow) recover structure that agrees with robust previous phenomenological modeling results while being less susceptible to biases introduced by less flexible models. The method therefore offers a promising, flexible, and reliable alternative for modeling population distributions, even when the data are highly noisy.
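As a rough reminder of the framework being parameterized (a sketch that omits selection effects and other practical details), the hierarchical likelihood of a catalogue of events $\{x_i\}_{i=1}^{N}$ given population parameters $\lambda$ marginalizes each event's parameters $\theta_i$ over the population distribution, which is here modeled by a normalizing flow $p_{\mathrm{pop}}(\theta \mid \lambda)$:

$$
p\left(\{x_i\}_{i=1}^{N} \mid \lambda\right)
\;=\;
\prod_{i=1}^{N} \int p(x_i \mid \theta_i)\, p_{\mathrm{pop}}(\theta_i \mid \lambda)\, \mathrm{d}\theta_i .
$$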


Self-Supervised Hybrid Inference in State-Space Models

arXiv.org Artificial Intelligence

We perform approximate inference in state-space models that allow for nonlinear higher-order Markov chains in latent space. The conditional independencies of the generative model enable us to parameterize only an inference model, which learns to estimate clean states in a self-supervised manner using maximum likelihood. First, we propose a recurrent method that is trained directly on noisy observations. Afterward, we cast the model such that the optimization problem leads to an update scheme that backpropagates through a recursion similar to the classical Kalman filter and smoother. In scientific applications, domain knowledge can give a linear approximation of the latent transition maps. We can easily incorporate this knowledge into our model, leading to a hybrid inference approach. Experiments show that, in contrast to other methods, the hybrid method makes the inferred latent states physically more interpretable and more accurate, especially in low-data regimes. Furthermore, we do not rely on an additional parameterization of the generative model or on supervision via uncorrupted observations or ground-truth latent states. Despite our model's simplicity, we obtain competitive results on the chaotic Lorenz system compared to a fully supervised approach and outperform a method based on variational inference.
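Because the update scheme backpropagates through a recursion similar to the classical Kalman filter, a minimal NumPy sketch of that classical recursion is given below for reference (predict and update steps of a linear-Gaussian state-space model); the matrices A, C, Q, R and the function name are illustrative placeholders, not the paper's hybrid inference model.

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, m0, P0):
    """Classical Kalman filter for x_t = A x_{t-1} + q_t,  y_t = C x_t + r_t."""
    m, P = m0, P0
    means = []
    for y in ys:
        # Predict step: propagate mean and covariance through the transition.
        m_pred = A @ m
        P_pred = A @ P @ A.T + Q
        # Update step: correct the prediction with the observation y.
        S = C @ P_pred @ C.T + R
        K = P_pred @ C.T @ np.linalg.inv(S)
        m = m_pred + K @ (y - C @ m_pred)
        P = (np.eye(len(m)) - K @ C) @ P_pred
        means.append(m)
    return np.array(means)
```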


Coordinate Independent Convolutional Networks -- Isometry and Gauge Equivariant Convolutions on Riemannian Manifolds

arXiv.org Machine Learning

Motivated by the vast success of deep convolutional networks, there is a great interest in generalizing convolutions to non-Euclidean manifolds. A major complication in comparison to flat spaces is that it is unclear in which alignment a convolution kernel should be applied on a manifold. The underlying reason for this ambiguity is that general manifolds do not come with a canonical choice of reference frames (gauge). Kernels and features therefore have to be expressed relative to arbitrary coordinates. We argue that the particular choice of coordinatization should not affect a network's inference -- it should be coordinate independent. A simultaneous demand for coordinate independence and weight sharing is shown to result in a requirement on the network to be equivariant under local gauge transformations (changes of local reference frames). The ambiguity of reference frames depends thereby on the G-structure of the manifold, such that the necessary level of gauge equivariance is prescribed by the corresponding structure group G. Coordinate independent convolutions are proven to be equivariant w.r.t. those isometries that are symmetries of the G-structure. The resulting theory is formulated in a coordinate free fashion in terms of fiber bundles. To exemplify the design of coordinate independent convolutions, we implement a convolutional network on the Möbius strip. The generality of our differential geometric formulation of convolutional networks is demonstrated by an extensive literature review which explains a large number of Euclidean CNNs, spherical CNNs and CNNs on general surfaces as specific instances of coordinate independent convolutions.
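For intuition, the kind of constraint that gauge equivariance places on convolution kernels can be sketched in the notation of steerable CNNs (a hedged paraphrase, not the paper's coordinate-free statement): for input and output field representations $\rho_{\mathrm{in}}$, $\rho_{\mathrm{out}}$ of the structure group $G$, the kernel $K$ must satisfy, for all $g \in G$ and tangent coordinates $v$,

$$
K(g \cdot v) \;=\; \rho_{\mathrm{out}}(g)\, K(v)\, \rho_{\mathrm{in}}(g)^{-1}.
$$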


Transitional Conditional Independence

arXiv.org Machine Learning

We develop the framework of transitional conditional independence. For this we introduce transition probability spaces and transitional random variables. These constructions generalize, strengthen, and unify previous notions of (conditional) random variables and non-stochastic variables, (extended) stochastic conditional independence, and some forms of functional conditional independence. Transitional conditional independence is asymmetric in general, and we show that it satisfies all desired relevance relations in terms of left and right versions of the separoid rules, except symmetry, on standard, analytic, and universal measurable spaces. As a preparation we prove a disintegration theorem for transition probabilities, i.e. the existence and essential uniqueness of (regular) conditional Markov kernels, on those spaces. Transitional conditional independence is able to express classical statistical concepts like sufficiency, adequacy, and ancillarity. As an application, we then show how transitional conditional independence can be used to prove a directed global Markov property for causal graphical models that allow for non-stochastic input variables in strong generality. This also allows us to show the main rules of the causal do-calculus, relating observational and interventional distributions, in such measure-theoretic generality.


Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions

arXiv.org Artificial Intelligence

Unobserved confounding is one of the main challenges when estimating causal effects. We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders with a single latent confounder that lives in the same space as the treatment variable without changing the observational and interventional distributions entailed by the causal model. After the reduction, we parameterize the reduced causal model using a flexible class of transformations, so-called normalizing flows. We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data. This allows us to estimate the causal effect in a principled way from combined data. We perform a series of experiments on data simulated using nonlinear causal mechanisms and find that we can often substantially reduce the number of interventional samples when adding observational training samples without sacrificing accuracy. Thus, adding observational data may help to more accurately estimate causal effects even in the presence of unobserved confounders.
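A schematic of what estimating the reduced model jointly from observational and interventional data can look like (a hedged sketch; the paper's actual objective and its normalizing-flow parameterization are more involved) is to maximize the combined log-likelihood of both datasets under a single reduced model $p_\theta$:

$$
\max_{\theta}\;
\sum_{i \in \mathcal{D}_{\mathrm{obs}}} \log p_\theta(x_i, y_i)
\;+\;
\sum_{j \in \mathcal{D}_{\mathrm{int}}} \log p_\theta\!\left(y_j \mid \mathrm{do}(x_j)\right).
$$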


Argmax Flows and Multinomial Diffusion: Towards Non-Autoregressive Language Models

arXiv.org Machine Learning

The field of language modelling has been largely dominated by autoregressive models, for which sampling is inherently difficult to parallelize. This paper introduces two new classes of generative models for categorical data such as language or image segmentation: Argmax Flows and Multinomial Diffusion. Argmax Flows are defined by a composition of a continuous distribution (such as a normalizing flow) and an argmax function. To optimize this model, we learn a probabilistic inverse for the argmax that lifts the categorical data to a continuous space. Multinomial Diffusion gradually adds categorical noise in a diffusion process, for which the generative denoising process is learned. We demonstrate that our models perform competitively on language modelling and on modelling of image segmentation maps.
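A minimal NumPy sketch of one plausible forward noising step of a multinomial diffusion over $K$ categories, assuming the commonly used transition that mixes the current one-hot state with the uniform distribution; the function name and the noise level beta are illustrative.

```python
import numpy as np

def multinomial_diffusion_step(x_onehot, beta, rng):
    """Forward noising: keep the category with prob. (1 - beta),
    otherwise resample it uniformly over the K categories."""
    K = x_onehot.shape[-1]
    probs = (1.0 - beta) * x_onehot + beta / K     # categorical transition probabilities
    idx = np.array([rng.choice(K, p=p) for p in probs])
    return np.eye(K)[idx]                          # new one-hot state

rng = np.random.default_rng(0)
x = np.eye(5)[[0, 3, 2]]                           # three tokens over K = 5 categories
x_noised = multinomial_diffusion_step(x, beta=0.1, rng=rng)
```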


Self Normalizing Flows

arXiv.org Machine Learning

Efficient gradient computation of the Jacobian determinant term is a core problem of the normalizing flow framework. As a result, most proposed flow models either restrict themselves to a function class whose Jacobian determinant is easy to evaluate, or rely on an efficient estimator thereof. However, these restrictions limit the performance of such density models, frequently requiring significant depth to reach desired performance levels. In this work, we propose Self Normalizing Flows, a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer. This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$, allowing for the training of flow architectures which were otherwise computationally infeasible, while also providing efficient sampling. We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts, while surpassing the performance of their functionally constrained counterparts.
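A toy sketch (PyTorch, single linear flow layer) of the core trick as described in the abstract: the exact gradient of $-\log|\det W|$ with respect to $W$, namely $-W^{-\top}$, is replaced by $-R^{\top}$, where $R$ is trained to approximate the inverse of $W$ through reconstruction penalties. The layer size, loss weighting, and training loop below are illustrative assumptions.

```python
import torch

D, B = 8, 64
x = torch.randn(B, D)

W = (torch.eye(D) + 0.01 * torch.randn(D, D)).requires_grad_(True)  # flow weights
R = (torch.eye(D) + 0.01 * torch.randn(D, D)).requires_grad_(True)  # learned approximate inverse
opt = torch.optim.Adam([W, R], lr=1e-3)

for step in range(200):
    opt.zero_grad()
    z = x @ W.T                                  # forward pass z = W x
    nll = 0.5 * (z ** 2).sum(dim=1).mean()       # -log p(z) under a standard-normal base (up to a constant)
    recon = ((x @ W.T @ R.T - x) ** 2).mean() \
          + ((z.detach() @ R.T @ W.T - z.detach()) ** 2).mean()     # tie R to the inverse of W
    (nll + recon).backward()
    with torch.no_grad():
        # Exact gradient of -log|det W| is -W^{-T} (O(D^3) to form);
        # approximate it with -R^T, i.e. subtract R^T from W's gradient.
        W.grad -= R.detach().T
    opt.step()
```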


FlipOut: Uncovering Redundant Weights via Sign Flipping

arXiv.org Machine Learning

Modern neural networks, although achieving state-of-the-art results on many tasks, tend to have a large number of parameters, which increases training time and resource usage. This problem can be alleviated by pruning. Existing methods, however, often require extensive parameter tuning or multiple cycles of pruning and retraining to convergence in order to obtain a favorable accuracy-sparsity trade-off. To address these issues, we propose a novel pruning method which uses the oscillations around $0$ (i.e. sign flips) that a weight has undergone during training in order to determine its saliency. Our method can perform pruning before the network has converged, requires little tuning effort due to having good default values for its hyperparameters, and can directly target the level of sparsity desired by the user. Our experiments, performed on a variety of object classification architectures, show that it is competitive with existing methods and achieves state-of-the-art performance for levels of sparsity of $99.6\%$ and above for most of the architectures tested. For reproducibility, we release our code publicly at https://github.com/AndreiXYZ/flipout.
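A small sketch of the bookkeeping the criterion implies, assuming (purely for illustration) that weights with more sign flips during training are treated as more redundant; the helper names and the quantile-based pruning rule are illustrative, not the paper's exact saliency measure.

```python
import torch

@torch.no_grad()
def update_flip_counts(weight, prev_sign, flip_counts):
    """Call after each optimizer step: count weights whose sign changed."""
    sign = torch.sign(weight)
    flipped = (sign != prev_sign) & (sign != 0) & (prev_sign != 0)
    flip_counts += flipped.to(flip_counts.dtype)
    return sign, flip_counts

@torch.no_grad()
def prune_by_flips(weight, flip_counts, sparsity):
    """Zero out roughly the `sparsity` fraction of weights with the most sign flips."""
    threshold = torch.quantile(flip_counts.float().flatten(), 1.0 - sparsity)
    mask = (flip_counts.float() <= threshold).to(weight.dtype)
    weight *= mask
    return mask
```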