AITopics

1901.00397

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

#artificialintelligence Jan-28-2019, 16:41:26 GMT

Positions for Exceptional Doctoral Students (deadline January 31, 2019)

The Helsinki Doctoral Education Network in Information and Communications Technology (HICT) is a joint initiative by Aalto University and the University of Helsinki, the two leading universities within this area in Finland. The network at present involves over 60 professors and over 200 doctoral students, and the participating units together graduate more than 40 new doctors each year. The quality of research and education in both HICT universities is world-class, and the education is practically free, as there are no tuition fees for doctoral students in the Finnish university system. In terms of the living environment, Helsinki has been ranked as one of the world's top-10 most livable cities (Economist, 2017), and Finland is among the best countries in the world with respect to many quality of life indicators, including being the overall #1 country in human wellbeing. Helsinki ranks second in the world's startup city comparison (Valuer, 2018) and is also the Mobile Data Capital of the World (IEEE Spectrum, 2018). The participating units of HICT currently have funding available for exceptionally qualified doctoral students. We offer the possibility to join world-class research groups, with multiple interesting research projects to choose from. If you wish to be considered as a potential new doctoral student in HICT, you can apply to one or more of the doctoral student positions (listed below).

artificial intelligence, data mining, machine learning, (17 more...)

#artificialintelligence

Country: Europe > Finland > Uusimaa > Helsinki (0.89)

Genre:

Instructional Material > Course Syllabus & Notes (0.68)
Research Report (0.68)

Industry:

Information Technology (0.94)
Health & Medicine > Therapeutic Area (0.94)
Health & Medicine > Pharmaceuticals & Biotechnology (0.69)

Technology:

Information Technology > Human Computer Interaction (0.94)
Information Technology > Communications > Social Media (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Trillos, Nicolas Garcia, Kaplan, Zach, Sanz-Alonso, Daniel

Variational Characterizations of Local Entropy and Heat Regularization in Deep Learning

The aim of this paper is to provide new theoretical and computational understanding of two loss regularizations employed in deep learning, known as local entropy and heat regularization. For both regularized losses we introduce variational characterizations that naturally suggest a two-step scheme for their optimization, based on the iterative shift of a probability density and the calculation of a best Gaussian approximation in Kullback-Leibler divergence. In this unified light, the optimization schemes for the local entropy and heat regularized losses differ only in which argument of the Kullback-Leibler divergence is used to find the best Gaussian approximation. Local entropy corresponds to minimizing over the second argument, and the solution is given by moment matching. This makes it possible to replace the traditional back-propagation calculation of gradients with sampling algorithms, opening an avenue for gradient-free, parallelizable training of neural networks.
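The moment-matching step mentioned in this abstract can be illustrated with a toy one-dimensional sketch. It relies on a standard fact: among Gaussians q, the minimizer of KL(p || q) has the same mean and variance as p. The loop below draws samples around the current point, weights them by exp(-loss), and moves to the weighted mean, so no gradient is computed. The function names, the quadratic `toy_loss`, the value of `tau`, and the sampling scheme are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def moment_match(samples, weights):
    """Return (mean, variance) of the Gaussian matching the weighted sample moments."""
    total = sum(weights)
    mean = sum(w * x for x, w in zip(samples, weights)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(samples, weights)) / total
    return mean, var


def toy_loss(x):
    # Hypothetical loss with its minimum at x = 2 (an assumption for illustration).
    return (x - 2.0) ** 2


def gradient_free_step(center, tau, n_samples, rng):
    # Sample around the current point, weight each sample by exp(-loss),
    # then move to the mean of the moment-matched Gaussian.
    samples = [rng.gauss(center, math.sqrt(tau)) for _ in range(n_samples)]
    weights = [math.exp(-toy_loss(x)) for x in samples]
    mean, _ = moment_match(samples, weights)
    return mean


rng = random.Random(0)
x = -1.0
for _ in range(20):
    x = gradient_free_step(x, tau=1.0, n_samples=2000, rng=rng)
```

Because each sample is weighted independently, the inner loop could be split across workers, which is the sense in which such a scheme is parallelizable.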

algorithm, local entropy and heat regularization, regularization, (9 more...)

1901.10082

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Watson, David S., Wright, Marvin N.

Testing Conditional Predictive Independence in Supervised Learning Algorithms

We propose a general test of conditional independence. The conditional predictive impact (CPI) is a provably consistent and unbiased estimator of one or several features' association with a given outcome, conditional on a (potentially empty) reduced feature set. The measure can be calculated using any supervised learning algorithm and loss function. It relies on no parametric assumptions and applies equally well to continuous and categorical predictors and outcomes. The CPI can be efficiently computed for low- or high-dimensional data without any sparsity constraints. We illustrate PAC-Bayesian convergence rates for the CPI and develop statistical inference procedures for evaluating its magnitude, significance, and precision. These tests aid in feature and model selection, extending traditional frequentist and Bayesian techniques to general supervised learning tasks. The CPI may also be used in conjunction with causal discovery algorithms to identify underlying graph structures for multivariate systems. We test our method in conjunction with various algorithms, including linear regression, neural networks, random forests, and support vector machines. Empirical results show that the CPI compares favorably to alternative variable importance measures and other nonparametric tests of conditional independence on a diverse array of real and simulated datasets. Simulations confirm that our inference procedures successfully control Type I error and achieve nominal coverage probability. Our method has been implemented in an R package, cpi, which can be downloaded from https://github.com/dswatson/cpi.

effect size 0, hypothesis, random forest neural network 0, (10 more...)

1901.09917

Country:

Europe > Austria > Vienna (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.94)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)

Triastcyn, Aleksei, Faltings, Boi

Improved Accounting for Differentially Private Learning

We consider the problem of differential privacy accounting, i.e. estimation of privacy loss bounds, in machine learning in a broad sense. We propose two versions of a generic privacy accountant suitable for a wide range of learning algorithms. Both versions are derived in a simple and principled way using well-known tools from probability theory, such as concentration inequalities. We demonstrate that our privacy accountant is able to achieve state-of-the-art estimates of DP guarantees and can be applied to new areas like variational inference. Moreover, we show that the latter enjoys differential privacy at minor cost.

accountant, improved accounting, privacy, (14 more...)

1901.09697

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
Oceania > Australia > Tasmania (0.04)
Europe > Italy > Veneto > Venice (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Tsuzuku, Yusuke, Sato, Issei, Sugiyama, Masashi

Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks using PAC-Bayesian Analysis

The notion of flat minima has played a key role in the generalization studies of deep learning models. However, existing definitions of flatness are known to be sensitive to the rescaling of parameters. This issue suggests that the previous definitions of flatness might not be a good measure of generalization, because generalization is invariant to such rescalings. In this paper, from the PAC-Bayesian perspective, we scrutinize the discussion concerning flat minima and introduce the notion of normalized flat minima, which is free from the known scale dependence issues. Additionally, we highlight that existing matrix-norm based generalization error bounds are scale dependent in the same way as the existing flat minima definitions. Our modified notion of flatness does not suffer from this insufficiency either, suggesting it might provide a better hierarchy in the hypothesis class.

flat minima, normalized sharpness, sharpness, (13 more...)

1901.04653

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Machine Learning Jan-27-2019

The CM Algorithm for the Maximum Mutual Information Classifications of Unseen Instances

Lu, Chenguang

The Maximum Mutual Information (MMI) criterion is different from the Least Error Rate (LER) criterion. It can reduce failures to report small-probability events. This paper introduces the Channels Matching (CM) algorithm for the MMI classifications of unseen instances. It also introduces some semantic information methods on which the CM algorithm is based. In the CM algorithm, label learning is to let the semantic channel match the Shannon channel (Matching I), whereas classifying is to let the Shannon channel match the semantic channel (Matching II). We can achieve the MMI classifications by repeating Matching I and II. For low-dimensional feature spaces, we only use parameters to construct n likelihood functions for n different classes (rather than to construct partitioning boundaries, as in gradient descent) and express the boundaries by numerical values. Without searching in parameter spaces, the computation of the CM algorithm for low-dimensional feature spaces is very simple and fast. Using a two-dimensional example, we test the speed and reliability of the CM algorithm with different initial partitions. For most initial partitions, two iterations can make the mutual information surpass 99% of the convergent MMI. The analysis indicates that for high-dimensional feature spaces, we may combine the CM algorithm with neural networks to improve the MMI classifications for faster and more reliable convergence.
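The difference between the two criteria can be seen on a small made-up example. The sketch below computes the error rate and the mutual information (in bits) of two hypothetical classifiers on a problem where the rare class has 5% prevalence. The joint tables and function names are assumptions chosen for illustration; this is not the CM algorithm itself, only the quantities the two criteria optimize.

```python
import math


def mutual_information(joint):
    """Mutual information in bits, with joint[i][j] = P(true class i, predicted class j)."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:
                mi += p * math.log2(p / (row[i] * col[j]))
    return mi


def error_rate(joint):
    """Probability that the predicted class differs from the true class."""
    return 1.0 - sum(joint[i][i] for i in range(len(joint)))


# Hypothetical classifier that never reports the rare class (index 1).
always_common = [[0.95, 0.00],
                 [0.05, 0.00]]

# Hypothetical classifier that reports the rare class, at the cost of false alarms.
reports_rare = [[0.88, 0.07],
                [0.01, 0.04]]

for name, table in [("always_common", always_common), ("reports_rare", reports_rare)]:
    print(name, "error:", round(error_rate(table), 3),
          "MI (bits):", round(mutual_information(table), 4))
```

In this example the first classifier has the lower error rate but conveys no information about the true class, while the second has a higher error rate and positive mutual information. This is the kind of case where the two criteria prefer different classifiers.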

algorithm, classification, criterion, (15 more...)

1901.09902

Country:

North America > United States (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Dikov, Georgi, van der Smagt, Patrick, Bayer, Justin

Bayesian Learning of Neural Network Architectures

arXiv.org Machine Learning Jan-27-2019

In this paper we propose a Bayesian method for estimating architectural parameters of neural networks, namely layer size and network depth. We do this by learning concrete distributions over these parameters. Our results show that regular networks with a learnt structure can generalise better on small datasets, while fully stochastic networks can be more robust to parameter initialisation. The proposed method relies on standard neural variational learning and, unlike randomised architecture search, does not require retraining the model, thus keeping the computational overhead to a minimum.

architecture, layer size, neural network, (13 more...)

1901.04436

Country:

Asia > Middle East > Jordan (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Sadeghi, Kayvan, Rinaldo, Alessandro

Markov Properties of Discrete Determinantal Point Processes

arXiv.org Machine Learning Jan-27-2019

Determinantal point processes (DPPs) are probabilistic models for repulsion. When used to represent the occurrence of random subsets of a finite base set, DPPs make it possible to model global negative associations in a mathematically elegant and direct way. Discrete DPPs have become popular and computationally tractable models for solving several machine learning tasks that require the selection of diverse objects, and have been successfully applied in numerous real-life problems. Despite their popularity, the statistical properties of such models have not been adequately explored. In this note, we derive the Markov properties of discrete DPPs and show how they can be expressed using graphical models.
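The repulsion property can be checked numerically on a tiny base set. The sketch below assumes the common L-ensemble parameterization, in which a subset S has probability det(L_S) / det(L + I), and uses a made-up 3x3 kernel where items 0 and 1 are similar and item 2 is unrelated. The kernel values and helper names are illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (adequate for small matrices)."""
    if not m:
        return 1.0  # determinant of the empty matrix
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [r[:j] + r[j + 1:] for r in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def submatrix(L, idx):
    """Principal submatrix of L restricted to the indices in idx."""
    return [[L[i][j] for j in idx] for i in idx]


def dpp_prob(L, subset):
    """P(S) = det(L_S) / det(L + I) for an L-ensemble DPP."""
    n = len(L)
    L_plus_I = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return det(submatrix(L, list(subset))) / det(L_plus_I)


# Hypothetical kernel: items 0 and 1 are similar (0.9), item 2 is unrelated to both.
L = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

all_subsets = [s for k in range(4) for s in combinations(range(3), k)]
probs = {s: dpp_prob(L, s) for s in all_subsets}
```

Here `probs[(0, 1)]` is smaller than `probs[(0, 2)]`: the two similar items are less likely to be selected together than a dissimilar pair, which is the negative association the abstract refers to.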

dpp, independence model, markov property, (13 more...)

1810.02294

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Machine Learning Jan-25-2019

Bayesian surrogate learning in dynamic simulator-based regression problems

Chen, Xi, Hobson, Mike

The estimation of unknown values of parameters (or hidden variables, control variables) that characterise a physical system often relies on the comparison of measured data with synthetic data produced by some numerical simulator of the system as the parameter values are varied. This process often encounters two major difficulties: the generation of synthetic data for each considered set of parameter values can be computationally expensive if the system model is complicated; and the exploration of the parameter space can be inefficient and/or incomplete, a typical example being when the exploration becomes trapped in a local optimum of the objective function that characterises the mismatch between the measured and synthetic data. A method to address both these issues is presented, whereby: a surrogate model (or proxy), which emulates the computationally expensive system simulator, is constructed using deep recurrent networks (DRN); and a nested sampling (NS) algorithm is employed to perform efficient and robust exploration of the parameter space. The analysis is performed in a Bayesian context, in which the samples characterise the full joint posterior distribution of the parameters, from which parameter estimates and uncertainties are easily derived. The proposed approach is compared with conventional methods in some numerical examples, for which the results demonstrate that one can accelerate the parameter estimation process by at least an order of magnitude.

deep learning, simulator, upstream oil & gas, (18 more...)

1901.08898

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)