AITopics

We propose a fast, non-Bayesian method for producing uncertainty scores for the output of pre-trained deep neural networks (DNNs) using a data-driven interval-propagating network. This interval neural network (INN) has interval-valued parameters and propagates its input using interval arithmetic. The INN produces sensible lower and upper bounds encompassing the ground truth. We provide theoretical justification for the validity of these bounds. Furthermore, its asymmetric uncertainty scores offer additional, directional information beyond what Gaussian-based, symmetric variance estimation can provide. We find that noise in the data is adequately captured by the intervals produced with our method. In numerical experiments on an image reconstruction task, we demonstrate the practical utility of INNs as a proxy for the prediction error in comparison to two state-of-the-art uncertainty quantification methods. In summary, INNs produce fast, theoretically justified uncertainty scores for DNNs that are easy to interpret, come with added information, and serve as improved error proxies. These features may prove useful in advancing the usability of DNNs, especially in sensitive applications such as health care.
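As a rough illustration of what propagating an input through interval-valued parameters with interval arithmetic can look like, the following is a minimal pure-Python sketch of one dense layer followed by a ReLU. The function names (`interval_mul`, `interval_dense`, `interval_relu`) and the toy weights are hypothetical and are not taken from the paper; the paper's actual architecture and training procedure are not reproduced here.

```python
def interval_mul(a, b):
    """Product of two intervals a = (lo, hi) and b = (lo, hi)."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))


def interval_add(a, b):
    """Sum of two intervals: lower bounds add, upper bounds add."""
    return (a[0] + b[0], a[1] + b[1])


def interval_dense(weights, biases, inputs):
    """Dense layer whose weights, biases and inputs are all intervals.

    weights: list of rows, each row a list of (lo, hi) tuples
    biases:  list of (lo, hi) tuples, one per output unit
    inputs:  list of (lo, hi) tuples, one per input unit
    """
    outputs = []
    for row, bias in zip(weights, biases):
        acc = bias
        for w, x in zip(row, inputs):
            acc = interval_add(acc, interval_mul(w, x))
        outputs.append(acc)
    return outputs


def interval_relu(intervals):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [(max(0.0, lo), max(0.0, hi)) for lo, hi in intervals]


# Toy example (assumed values): one output unit, two point-valued inputs.
weights = [[(1.0, 2.0), (-1.0, 0.0)]]
biases = [(0.0, 0.5)]
inputs = [(1.0, 1.0), (2.0, 2.0)]

bounds = interval_relu(interval_dense(weights, biases, inputs))
print(bounds)  # [(0.0, 2.5)]
```

The resulting interval is generally asymmetric around a point prediction, which is the kind of directional information the abstract refers to.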

interval neural network, neural network, uncertainty score

2003.11566

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Senhaji, Ali, Raitoharju, Jenni, Gabbouj, Moncef, Iosifidis, Alexandros

Not all domains are equally complex: Adaptive Multi-Domain Learning

Deep learning approaches are highly specialized and require training separate models for different tasks. Multi-domain learning looks at ways to learn a multitude of different tasks, each coming from a different domain, at once. The most common approach in multi-domain learning is to form a domain-agnostic model, the parameters of which are shared among all domains, and to learn a small number of extra domain-specific parameters for each new domain. However, different domains come with different levels of difficulty; parameterizing the models of all domains using an augmented version of the domain-agnostic model leads to unnecessarily inefficient solutions, especially for easy-to-solve tasks. We propose an adaptive parameterization approach to deep neural networks for multi-domain learning. The proposed approach performs on par with the original approach while substantially reducing the number of parameters, leading to efficient multi-domain learning solutions.

adapter, architecture, exit module

2003.11504

Country:

Europe > Finland > Pirkanmaa > Tampere (0.04)
Europe > Denmark (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Tang, Shuai, Maddox, Wesley J., Dickens, Charlie, Diethe, Tom, Damianou, Andreas

Similarity of Neural Networks with Gradients

A suitable similarity index for comparing learnt neural networks plays an important role in understanding the behaviour of these highly nonlinear functions and can provide insights for further theoretical analysis and empirical studies. We define two key steps when comparing models. The first is the representation abstracted from the learnt model, for which we propose to use both feature vectors and gradient vectors (the latter are largely ignored in prior work). The second is the similarity index, which we choose so that it has the desired invariance properties and which we combine with sketching techniques for comparing various datasets efficiently. Empirically, we show that the proposed approach provides a state-of-the-art method for computing the similarity of neural networks that are trained independently on different datasets and on the tasks defined by those datasets.

dataset, neural network, representation

2003.11498

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Li, Shibo, Xing, Wei, Kirby, Mike, Zhe, Shandian

Scalable Variational Gaussian Process Regression Networks

Gaussian process regression networks (GPRN) are powerful Bayesian models for multi-output regression, but their inference is intractable. To address this issue, existing methods use a fully factorized structure (or a mixture of such structures) over all the outputs and latent functions for posterior approximation, which, however, can miss the strong posterior dependencies among the latent variables and hurt the inference quality. In addition, the updates of the variational parameters are inefficient and can be prohibitively expensive for a large number of outputs. To overcome these limitations, we propose a scalable variational inference algorithm for GPRN, which not only captures the abundant posterior dependencies but also is much more efficient for massive outputs. We tensorize the output space and introduce tensor/matrix-normal variational posteriors to capture the posterior correlations and to reduce the parameters. We jointly optimize all the parameters and exploit the inherent Kronecker product structure in the variational model evidence lower bound to accelerate the computation. We demonstrate the advantages of our method in several real-world applications.

latent function, posterior, variational posterior

2003.11489

Country:

North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.34)

Siivola, Eero, Dhaka, Akash Kumar, Andersen, Michael Riis, Gonzalez, Javier, Moreno, Pablo Garcia, Vehtari, Aki

Preferential Batch Bayesian Optimization

Most research in Bayesian optimization (BO) has focused on direct feedback scenarios, where one has access to exact, or perturbed, values of some expensive-to-evaluate objective. This direction has been mainly driven by the use of BO in machine learning hyper-parameter configuration problems. However, in domains such as modelling human preferences, A/B tests or recommender systems, there is a need for methods that are able to replace direct feedback with preferential feedback, obtained via rankings or pairwise comparisons. In this work, we present Preferential Batch Bayesian Optimization (PBBO), a new framework for finding the optimum of a latent function of interest, given any type of parallel preferential feedback for a group of two or more points. We do so by using a Gaussian process model with a likelihood specially designed to enable parallel and efficient data collection mechanisms, which are key in modern machine learning. We show how the acquisitions developed under this framework generalize and augment previous approaches in Bayesian optimization, expanding the use of these techniques to a wider range of domains. An extensive simulation study shows the benefits of this approach, both with simulated functions and four real data sets.

batch, observation num, optimization

2003.11435

Country:

North America > United States (0.04)
North America > Canada > British Columbia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (0.68)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Sande, Luis Sanguiao, Zhang, Li-Chun

Design-unbiased statistical learning in survey sampling

Approximately design-unbiased model-assisted estimation is not new. It has become the standard practice in survey sampling, following many influential works such as Särndal et al. (1992) and Deville and Särndal (1992). However, a theory that allows one to generally incorporate the many common machine-learning (ML) techniques is so far lacking. For instance, Breidt and Opsomer (2017, p. 203) state that they "are not aware of direct uses of random forests in a model-assisted survey estimator". Since modern ML techniques can often generate more flexible and powerful prediction models when rich auxiliary feature data are available, their potential is worth exploring in any situation where the practical advantages of linear weighting are not essential compared to the efficiency gains that can be achieved by alternative nonlinear ML techniques. We propose a subsampling Rao-Blackwell (SRB) method, which enables exactly design-unbiased estimation with the help of linear or nonlinear prediction models. Monte Carlo (MC) versions of the proposed method can be used in cases where the exact RB method is computationally too costly.

estimation, estimator, variance

2003.11423

Country:

North America > United States > New York (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)
Asia > India > West Bengal > Kolkata (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Nazarov, Ivan, Burnaev, Evgeny

Bayesian Sparsification Methods for Deep Complex-valued Networks

Deep neural networks are an integral part of the machine learning and data science toolset for practical data-driven problem solving. With continual miniaturization, ever more applications can be found in embedded systems. Common embedded applications include on-device image recognition and signal processing. Despite recent advances in generalization and optimization theory specific to deep networks, deployment on actual embedded hardware remains a challenge due to storage, real-time throughput, and arithmetic complexity restrictions [He et al., 2018]. Therefore, compression methods for achieving high model sparsity and numerical efficiency without losing much in performance are especially relevant.

arxiv, tradeoff, twolayerdensemodel 0

2003.11413

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Gupta, Abhishek, Haskell, William B.

Convergence of Recursive Stochastic Algorithms using Wasserstein Divergence

This paper develops a unified framework, based on iterated random operator theory, to analyze the convergence of constant stepsize recursive stochastic algorithms (RSAs) in machine learning and reinforcement learning. RSAs use randomization to efficiently compute expectations, and so their iterates form a stochastic process. The key idea is to lift the RSA into an appropriate higher-dimensional space and then express it as an equivalent Markov chain. Instead of determining the convergence of this Markov chain (which may not converge under constant stepsize), we study the convergence of the distribution of this Markov chain. To study this, we define a new notion of Wasserstein divergence. We show that if the distribution of the iterates in the Markov chain satisfies a certain contraction property with respect to the Wasserstein divergence, then the Markov chain admits an invariant distribution. Inspired by the SVRG algorithm, we develop a method to convert any RSA to a variance-reduced RSA that converges to the optimal solution in the almost sure sense or in probability. We show that convergence of a large family of constant stepsize RSAs can be understood using this framework. We apply this framework to ascertain the convergence of mini-batch SGD, forward-backward splitting with catalyst, SVRG, SAGA, empirical Q value iteration, synchronous Q-learning, enhanced policy iteration, and MDPs with a generative model. We also develop two new algorithms for reinforcement learning and establish their convergence using this framework.
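For readers unfamiliar with SVRG, which the abstract cites as inspiration, the following is a minimal sketch of the standard textbook SVRG update with a constant stepsize on a one-dimensional least-squares problem. It is not the paper's variance-reduced RSA construction; the function name `svrg_least_squares`, the toy data, and the stepsize and loop lengths are assumptions chosen only for illustration.

```python
import random


def svrg_least_squares(a, b, step=0.02, epochs=30, inner_steps=50, seed=0):
    """Minimize (1/n) * sum_i 0.5 * (a[i] * w - b[i])**2 over scalar w with SVRG."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    n = len(a)
    w = 0.0
    for _ in range(epochs):
        # Snapshot the iterate and compute the full gradient once per epoch.
        snapshot = w
        full_grad = sum(ai * (ai * snapshot - bi) for ai, bi in zip(a, b)) / n
        for _ in range(inner_steps):
            i = rng.randrange(n)
            grad_w = a[i] * (a[i] * w - b[i])
            grad_snapshot = a[i] * (a[i] * snapshot - b[i])
            # Variance-reduced stochastic gradient with a constant stepsize.
            w -= step * (grad_w - grad_snapshot + full_grad)
    return w


# Toy data (assumed); the closed-form minimizer is sum(a*b) / sum(a*a).
a = [1.0, 2.0, 3.0]
b = [2.5, 3.5, 6.5]
w_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
w_hat = svrg_least_squares(a, b)
print(abs(w_hat - w_star))
```

The correction term `grad_w - grad_snapshot + full_grad` is an unbiased gradient estimate whose variance shrinks as the iterate and the snapshot approach the minimizer, which is why a constant stepsize can still yield convergence to the optimum in this setting.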

algorithm, divergence, iteration

2003.11403

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > Illinois > Champaign County > Champaign (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Makowski, Silvia, Jäger, Lena A., Schwetlick, Lisa, Trukenbrod, Hans, Engbert, Ralf, Scheffer, Tobias

Discriminative Viewer Identification using Generative Models of Eye Gaze

We study the problem of identifying viewers of arbitrary images based on their eye gaze. Psychological research has derived generative stochastic models of eye movements. In order to exploit this background knowledge within a discriminatively trained classification model, we derive Fisher kernels from different generative models of eye gaze. Experimentally, we find that the performance of the classifier strongly depends on the underlying generative model. Using an SVM with Fisher kernel improves the classification performance over the underlying generative model.
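To make the notion of a Fisher kernel concrete, here is a minimal sketch that derives one from a univariate Gaussian, used purely as a stand-in generative model. The gaze models in the paper are different and are not reproduced here; the function names `fisher_score` and `fisher_kernel` and the default parameters are hypothetical.

```python
def fisher_score(x, mean, var):
    """Gradient of log N(x | mean, var) with respect to (mean, var)."""
    d = x - mean
    return (d / var, (d * d - var) / (2.0 * var * var))


def fisher_kernel(x, y, mean=0.0, var=1.0):
    """Fisher kernel k(x, y) = U_x^T I^{-1} U_y for a univariate Gaussian."""
    ux = fisher_score(x, mean, var)
    uy = fisher_score(y, mean, var)
    # Inverse Fisher information in the (mean, var) parameterization
    # is diagonal: diag(var, 2 * var**2).
    inv_info = (var, 2.0 * var * var)
    return sum(a * w * b for a, w, b in zip(ux, inv_info, uy))


print(fisher_kernel(1.0, 2.0))  # 2.0
print(fisher_kernel(2.0, 2.0))  # 8.5
```

A kernel of this form can be passed to any kernel-based discriminative classifier, such as the SVM mentioned in the abstract, so that the generative model supplies the feature geometry while the classifier is trained discriminatively.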

generative model, likelihood, partial derivative

2003.11399

Country:

Europe > Germany > Brandenburg > Potsdam (0.05)
Europe > Latvia > Riga Municipality > Riga (0.04)
Europe > Poland (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.47)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Khuat, Thanh Tung, Gabrys, Bogdan

Accelerated learning algorithms of general fuzzy min-max neural network using a branch-and-bound-based hyperbox selection rule

This paper proposes a method to accelerate the training process of the general fuzzy min-max neural network. The purpose is to reduce the number of unsuitable hyperboxes selected as potential candidates for the expansion step of existing hyperboxes to cover a new input pattern in the online learning algorithms, or as candidates for the hyperbox aggregation process in the agglomerative learning algorithms. Our proposed approach uses mathematical formulas to form a branch-and-bound solution that removes the hyperboxes which are certain not to satisfy the expansion or aggregation conditions, in turn decreasing the training time of the learning algorithms. The efficiency of the proposed method is assessed on a number of widely used data sets. The experimental results indicate a significant decrease in training time for both online and agglomerative learning algorithms. Notably, the online learning algorithms are accelerated by a factor of 1.2 to 12 when using the proposed method, while the agglomerative learning algorithms are accelerated by a factor of 7 to 37 on average.

algorithm, hyperbox candidate, hyperboxe

2003.11333

Country:

North America > United States > Wisconsin (0.05)
Europe > Portugal > Coimbra > Coimbra (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (0.81)

Industry:

Education (0.56)
Health & Medicine > Therapeutic Area (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)