AITopics

Ajmal, Hamda, Madden, Michael, Enright, Catherine

PROFET: Construction and Inference of DBNs Based on Mathematical Models

Hamda Ajmal, Michael Madden and Catherine Enright, School of Computer Science, National University of Ireland Galway, h.ajmal1@nuigalway.ie,

Abstract: This paper presents, evaluates, and discusses a new software tool to automatically build Dynamic Bayesian Networks (DBNs) from ordinary differential equations (ODEs) entered by the user. The DBNs generated from ODE models can handle both data uncertainty and model uncertainty in a principled manner. The application, named PROFET, can be used for temporal data mining with noisy or missing variables. It enables automatic re-estimation of model parameters using temporal evidence in the form of data streams. For temporal inference, PROFET includes both standard fixed time step particle filtering and its extension, adaptive-time particle filtering algorithms. Adaptive-time particle filtering enables the DBN to automatically adapt its time step length to match the dynamics of the model. We demonstrate PROFET's functionality by using it to infer the model variables by estimating the model parameters of four benchmark ODE systems. From the generation of the DBN model to temporal inference, the entire process is automated and is delivered as an open-source platform-independent software application with a comprehensive user interface. PROFET is released under the Apache License 2.0. Its source code, executable and documentation are available at http:://profet.

inference, model parameter, ode model, (16 more...)

1910.04895

Country:

Europe > Ireland (0.24)
Europe > Czechia > Prague (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Harris, Ethan, Mihai, Daniela, Hare, Jonathon

Spatial and Colour Opponency in Anatomically Constrained Deep Networks

Colour vision has long fascinated scientists, who have sought to understand both the physiology of the mechanics of colour vision and the psychophysics of colour perception. We consider representations of colour in anatomically constrained convolutional deep neural networks. Following ideas from neuroscience, we classify cells in early layers into groups relating to their spectral and spatial functionality. We show the emergence of single and double opponent cells in our networks and characterise how the distribution of these cells changes under the constraint of a retinal bottleneck. Our experiments not only open up a new understanding of how deep networks process spatial and colour information, but also provide new tools to help understand the black box of deep learning. The code for all experiments is available at https://github.com/ecs-vlc/opponency.

bottleneck, opponency, opponent cell, (15 more...)

1910.11086

Country:

Oceania > Australia (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Hoyne, Caleb, Mukkavilli, S. Karthik, Meger, David

Deep learning for Aerosol Forecasting

Reanalysis datasets, which combine numerical physics models with limited observations to generate a synthesised estimate of variables in an Earth system, are prone to biases against ground truth. Biases identified in previous studies between the NASA Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2) aerosol optical depth (AOD) dataset and the Aerosol Robotic Network (AERONET) ground measurements motivated the development of a global, deep-learning-based AOD prediction model. This study combines a convolutional neural network (CNN) with MERRA-2 and tests the result against all AERONET sites. Validated against AERONET ground truth, the new hybrid CNN-based model provides better estimates than the MERRA-2 reanalysis alone.

aod, extreme event, indonesia, (16 more...)

1910.06789

Country:

Asia > Southeast Asia (0.14)
Asia > Indonesia > Sumatra > Jambi > Jambi (0.05)
North America > Canada > Quebec > Montreal (0.05)
(10 more...)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Zhang, Kaixuan, Wang, Qinglong, Liu, Xue, Giles, C. Lee

Shapley Homology: Topological Analysis of Sample Influence for Neural Networks

Data samples collected for training machine learning models are typically assumed to be independent and identically distributed (iid). Recent research has demonstrated that this assumption can be problematic as it simplifies the manifold of structured data. This has motivated different research areas such as data poisoning, model improvement, and explanation of machine learning models. In this work, we study the influence of a sample on determining the intrinsic topological features of its underlying manifold. We propose the Shapley Homology framework, which provides a quantitative metric for the influence of a sample on the homology of a simplicial complex. By interpreting the influence as a probability measure, we further define an entropy which reflects the complexity of the data manifold. Our empirical studies using the 0-dimensional homology show that, on neighboring graphs, samples with higher influence scores have more impact on the accuracy of neural networks for determining graph connectivity, and that, on several regular grammars, higher entropy values imply more difficulty in being learned.

grammar, graph, topological feature, (16 more...)

1910.06509

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Liu, Yao, Bacon, Pierre-Luc, Brunskill, Emma

Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling

Due in part to the growing sources of data about past sequences of decisions and their outcomes (from marketing to energy management to healthcare), there is increasing interest in developing accurate and efficient algorithms for off-policy policy evaluation. For Markov Decision Processes, this problem was addressed early on (Precup et al., 2000) by importance sampling (IS) (Rubinstein, 1981), a method prone to large variance due to rare events (Glynn, 1994; L'Ecuyer et al., 2009). The per-decision importance sampling (PDIS) estimator of Precup et al. (2000) tries to mitigate this problem by leveraging the temporal structure of the domain: earlier rewards cannot depend on later decisions. While neither IS nor PDIS assumes the underlying domain is Markov, more recently, a new class of estimators (Hallak and Mannor, 2017; Liu et al., 2018; Gelada and Bellemare, 2019) has been proposed that leverages the Markovian structure. In particular, these approaches propose performing importance sampling over the stationary state-action distributions induced by the corresponding Markov chain for a particular policy. By avoiding the explicit accumulation of likelihood ratios along the trajectories, it is hypothesized that such ratios of stationary distributions could substantially reduce the variance of the resulting estimator, thereby overcoming the "curse of horizon" (Liu et al., 2018) plaguing off-policy evaluation. The recent flurry of empirical results shows significant performance improvements over the alternative methods on a variety of simulation domains. Yet so far there has not been a formal analysis of the accuracy of IS, PDIS, and stationary state-action IS, which would strengthen our understanding of their properties, benefits and limitations.
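
The difference between trajectory-wise IS and PDIS described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the tabular policy arrays `pi_e` and `pi_b` (indexed as `[state, action]`), and the trajectory layout as lists of `(state, action, reward)` tuples are all assumptions made for illustration.

```python
import numpy as np


def is_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Trajectory-wise importance sampling (illustrative sketch).

    The whole discounted return is weighted by the product of
    likelihood ratios over the full trajectory.
    """
    values = []
    for traj in trajectories:
        ratio = 1.0
        ret = 0.0
        for t, (s, a, r) in enumerate(traj):
            ratio *= pi_e[s, a] / pi_b[s, a]
            ret += (gamma ** t) * r
        values.append(ratio * ret)
    return float(np.mean(values))


def pdis_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Per-decision importance sampling (illustrative sketch).

    Each reward is weighted only by the ratios of the decisions taken
    up to and including its own time step, since earlier rewards
    cannot depend on later decisions.
    """
    values = []
    for traj in trajectories:
        ratio = 1.0
        total = 0.0
        for t, (s, a, r) in enumerate(traj):
            ratio *= pi_e[s, a] / pi_b[s, a]
            total += (gamma ** t) * ratio * r
        values.append(total)
    return float(np.mean(values))


# Hypothetical one-state, two-action example.
pi_b = np.array([[0.5, 0.5]])   # behaviour policy that generated the data
pi_e = np.array([[0.8, 0.2]])   # evaluation policy of interest
trajs = [[(0, 0, 1.0), (0, 1, 2.0)]]

print(is_estimate(trajs, pi_e, pi_b), pdis_estimate(trajs, pi_e, pi_b))
```

In both functions the cumulative ratio is a product over time steps, which is the accumulation of likelihood ratios along trajectories that the stationary-distribution estimators mentioned above try to avoid.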

estimator, stationary importance, variance, (13 more...)

1910.06508

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report > New Finding (0.48)

Industry: Energy (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.55)

Mérigot, Quentin, Delalande, Alex, Chazal, Frédéric

Quantitative stability of optimal transport maps and linearization of the 2-Wasserstein space

This work studies an explicit embedding of the set of probability measures into a Hilbert space, defined using optimal transport maps from a reference probability density. This embedding linearizes to some extent the 2-Wasserstein space, and enables the direct use of generic supervised and unsupervised learning algorithms on measure data. Our main result is that the embedding is (bi-)Hölder continuous, when the reference density is uniform over a convex set, and can be equivalently phrased as a dimension-independent Hölder-stability result for optimal transport maps.

1. Introduction

Numerous problems involve the comparison of point clouds, i.e. sets of points that lie in a metric space and for which the spatial distribution is of interest. Seeing the point clouds as discrete probability measures in a metric space, it is natural to compare them using Wasserstein distances defined by the optimal transport theory [37]. These distances have indeed been successfully used in a variety of applications in machine learning [11, 3, 25, 23, 19, 1] and in statistics [39, 12, 8, 35]. In the discrete setting, many efficient algorithms have been proposed to compute or approximate the Wasserstein distances, such as Sinkhorn-Knopp and auction algorithms - see [34] and references therein.
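
The idea of embedding measures through transport maps from a reference density can be illustrated in one dimension, where it is especially simple. The sketch below is not the paper's construction: it assumes the special case of a uniform reference density on [0, 1] on the real line, where the optimal transport map to a measure is its quantile function and the L2 distance between two such maps equals the 2-Wasserstein distance. The function names and the grid size `n_grid` are hypothetical, and `np.quantile` with its default linear interpolation only approximates the empirical quantile function.

```python
import numpy as np


def quantile_embedding(samples, n_grid=200):
    """Embed a 1-D empirical measure as a vector (illustrative sketch).

    Evaluates an interpolated quantile function, i.e. the transport map
    from Uniform[0, 1] to the empirical measure, on a midpoint grid.
    """
    grid = (np.arange(n_grid) + 0.5) / n_grid
    return np.quantile(np.asarray(samples, dtype=float), grid)


def embedded_distance(x, y, n_grid=200):
    """L2([0, 1]) distance between two embeddings (midpoint rule)."""
    tx = quantile_embedding(x, n_grid)
    ty = quantile_embedding(y, n_grid)
    return float(np.sqrt(np.mean((tx - ty) ** 2)))


rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + 3.0  # same point cloud, translated by 3

# Translating a measure by 3 moves it by 3 in 2-Wasserstein distance.
print(embedded_distance(x, y))
```

Once measures are represented as vectors in this way, generic tools such as k-means or linear regression can be applied to them directly, which is the practical appeal of the embedding described in the abstract.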

optimal transport, quantitative stability, transport map, (12 more...)

1910.05954

Country:

Europe > France > Île-de-France (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Spain > Canary Islands (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.34)

Fort, Stanislav, Ganguli, Surya

Emergent properties of the local geometry of neural loss landscapes

Stanislav Fort and Surya Ganguli, Stanford University, Stanford, CA, USA

Abstract: The local geometry of high dimensional neural network loss landscapes can both challenge our cherished theoretical intuitions as well as dramatically impact the practical success of neural network training. Indeed recent works have observed 4 striking local properties of neural loss landscapes on classification tasks: (1) the landscape exhibits exactly C directions of high positive curvature, where C is the number of classes; (2) gradient directions are largely confined to this extremely low dimensional subspace of positive Hessian curvature, leaving the vast majority of directions in weight space unexplored; (3) gradient descent transiently explores intermediate regions of higher positive curvature before eventually finding flatter minima; (4) training can be successful even when confined to low dimensional random affine hyperplanes, as long as these hyperplanes intersect a Goldilocks zone of higher than average curvature. We develop a simple theoretical model of gradients and Hessians, justified by numerical experiments on architectures and datasets used in practice, that simultaneously accounts for all 4 of these surprising and seemingly unrelated properties. Our unified model provides conceptual insights into the emergence of these properties and makes connections with diverse topics in neural networks, random matrix theory, and spin glasses, including the neural tangent kernel, BBP phase transitions, and Derrida's random energy model.

1 Introduction

The geometry of neural network loss landscapes and the implications of this geometry for both optimization and generalization have been subjects of intense interest in many works, ranging from studies on the lack of local minima at significantly higher loss than that of the global minimum [1, 2] to studies debating relations between the curvature of local minima and their generalization properties [3, 4, 5, 6]. Fundamentally, the neural network loss landscape is a scalar loss function over a very high D dimensional parameter space that could depend a priori in highly nontrivial ways on the very structure of real-world data itself as well as intricate properties of the neural network architecture. Moreover, the regions of this loss landscape explored by gradient descent could themselves have highly atypical geometric properties relative to randomly chosen points in the landscape.

gradient, loss landscape, training time, (15 more...)

1910.05929

Country:

North America > United States > California > Santa Clara County > Stanford (0.44)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Roman, Ibai, Santana, Roberto, Mendiburu, Alexander, Lozano, Jose A.

Evolving Gaussian Process kernels from elementary mathematical expressions

Choosing the most adequate kernel is crucial in many Machine Learning applications. Gaussian Process is a state-of-the-art technique for regression and classification that heavily relies on a kernel function. However, in the Gaussian Process literature, kernels have usually been either ad hoc designed, selected from a predefined set, or searched for in a space of compositions of kernels which have been defined a priori. In this paper, we propose a Genetic-Programming algorithm that represents a kernel function as a tree of elementary mathematical expressions. By means of this representation, a wider set of kernels can be modeled, where potentially better solutions can be found, although new challenges also arise. The proposed algorithm is able to overcome these difficulties and find kernels that accurately model the characteristics of the data. This method has been tested in several real-world time-series extrapolation problems, improving the state-of-the-art results while reducing the complexity of the kernels.

algorithm, hyperparameter, kernel, (16 more...)

1910.05173

Country:

North America > Canada > Quebec (0.05)
Oceania > Australia (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
(5 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
(3 more...)

arXiv.org Artificial Intelligence, Oct-14-2019

Wei, Bolin, Li, Ge, Xia, Xin, Fu, Zhiyi, Jin, Zhi

Code Generation as a Dual Task of Code Summarization

Code summarization (CS) and code generation (CG) are two crucial tasks in the field of automatic software development. Various neural network-based approaches have been proposed to solve these two tasks separately. However, there exists a specific intuitive correlation between CS and CG, which has not been exploited in previous work. In this paper, we apply the relations between the two tasks to improve the performance of both. In other words, exploiting the duality between the two tasks, we propose a dual training framework to train them simultaneously. In this framework, we consider the dualities on probability and attention weights, and design corresponding regularization terms to constrain the duality. We evaluate our approach on two datasets collected from GitHub, and experimental results show that our dual framework can improve the performance of CS and CG tasks over baselines.

dataset, regularization term, source code, (16 more...)

1910.05923

Country:

Asia > China (0.04)
Oceania > Australia (0.04)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.85)