Oceania
Project purple: IAG moves away from being an analogue business ZDNet
Before Insurance Australia Group (IAG) can begin selling more than insurance products to consumers, the company realised it needed to shift what is currently a very analogue business into something that is more digitally orientated. "We really want to change the mindset, get some records and customer value, and build new businesses as well, including beyond insurance," IAG Digital Architecture director Ian Jamieson explained at New Relic Future Stack 2019 last week. "An insurance company not selling insurance is quite disruptive because a lot of the core systems is under insurance, so if we want to sell a solution that provides emergency assistance to your home that is not an insurance product, how would you bill someone for that so that it's not under insurance … there are a range of things we need to transform and add new capabilities to." Jamieson said some areas that IAG is looking to expand its business into include motor and home repair services, spinning up brand new businesses such as in mobility services, and through acquisitions of startups, such as its most recent purchase of Carbar, a subscription-based car ownership platform. To make sure these business plans become a reality, Jamieson said the company has moved away from taking traditional waterfall approaches to projects and using a cross-functional method.
PROFET: Construction and Inference of DBNs Based on Mathematical Models
Ajmal, Hamda, Madden, Michael, Enright, Catherine
PROFET: Construction and Inference of DBNs Based on Mathematical Models Hamda Ajmal, Michael Madden and Catherine Enright School of Computer Science, National University of Ireland Galway h.ajmal1@nuigalway.ie, Abstract This paper presents, evaluates, and discusses a new software tool to automatically build Dynamic Bayesian Networks (DBNs) from ordinary differential equations (ODEs) entered by the user. The DBNs generated from ODE models can handle both data uncertainty and model uncertainty in a principled manner. The application, named PROFET, can be used for temporal data mining with noisy or missing variables. It enables automatic re-estimation of model parameters using temporal evidence in the form of data streams. For temporal inference, PROFET includes both standard fixed time step particle filtering and its extension, adaptive-time particle filtering algorithms. Adaptive-time particle filtering enables the DBN to automatically adapt its time step length to match the dynamics of the model. We demonstrate PROFET's functionality by using it to infer the model variables by estimating the model parameters of four benchmark ODE systems. From the generation of the DBN model to temporal inference, the entire process is automated and is delivered as an open-source platform-independent software application with a comprehensive user interface. PROFET is released under the Apache License 2.0. Its source code, executable and documentation are available at http:://profet.
Spatial and Colour Opponency in Anatomically Constrained Deep Networks
Harris, Ethan, Mihai, Daniela, Hare, Jonathon
Colour vision has long fascinated scientists, who have sought to understand both the physiology of the mechanics of colour vision and the psychophysics of colour perception. We consider representations of colour in anatomically constrained convolutional deep neural networks. Following ideas from neuroscience, we classify cells in early layers into groups relating to their spectral and spatial functionality. We show the emergence of single and double opponent cells in our networks and characterise how the distribution of these cells changes under the constraint of a retinal bottleneck. Our experiments not only open up a new understanding of how deep networks process spatial and colour information, but also provide new tools to help understand the black box of deep learning. The code for all experiments is avaialable at \url{https://github.com/ecs-vlc/opponency}.
Deep learning for Aerosol Forecasting
Hoyne, Caleb, Mukkavilli, S. Karthik, Meger, David
Reanalysis datasets combining numerical physics models and limited observations to generate a synthesised estimate of variables in an Earth system, are prone to biases against ground truth. Biases identified with the NASA Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2) aerosol optical depth (AOD) dataset, against the Aerosol Robotic Network (AERONET) ground measurements in previous studies, motivated the development of a deep learning based AOD prediction model globally. This study combines a convolutional neural network (CNN) with MERRA-2, tested against all AERONET sites. The new hybrid CNN-based model provides better estimates validated versus AERONET ground truth, than only using MERRA-2 reanalysis.
Shapley Homology: Topological Analysis of Sample Influence for Neural Networks
Zhang, Kaixuan, Wang, Qinglong, Liu, Xue, Giles, C. Lee
Data samples collected for training machine learning models are typically assumed to be independent and identically distributed (iid). Recent research has demonstrated that this assumption can be problematic as it simplifies the manifold of structured data. This has motivated different research areas such as data poisoning, model improvement, and explanation of machine learning models. In this work, we study the influence of a sample on determining the intrinsic topological features of its underlying manifold. We propose the Shapley Homology framework, which provides a quantitative metric for the influence of a sample of the homology of a simplicial complex. By interpreting the influence as a probability measure, we further define an entropy which reflects the complexity of the data manifold. Our empirical studies show that when using the 0-dimensional homology, on neighboring graphs, samples with higher influence scores have more impact on the accuracy of neural networks for determining the graph connectivity and on several regular grammars whose higher entropy values imply more difficulty in being learned.
Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
Liu, Yao, Bacon, Pierre-Luc, Brunskill, Emma
Due in part to the growing sources of data about past sequences of decisions and their outcomes - from marketing to energy management to healthcare - there is increasing interest in developing accurate and efficient algorithms for off-policy policy evaluation. For Markov Decision Processes, this problem was addressed (Precup et al., 2000) early on by importance sampling (IS)(Rubinstein, 1981), a method prone to large variance due to rare events (Glynn, 1994; L'Ecuyer et al., 2009). The per-decision importance sampling estimator of Precup et al. (2000) tries to mitigate this problem by leveraging the temporal structure - earlier rewards cannot depend on later decisions - of the domain. While neither importance sampling (IS) nor per-decision IS (PDIS) assumes the underlying domain is Markov, more recently, a new class of estimators (Hallak and Mannor, 2017; Liu et al., 2018; Gelada and Bellemare, 2019) has been proposed that leverages the Markovian structure. In particular, these approaches propose performing importance sampling over the stationary state-action distributions induced by the corresponding Markov chain for a particular policy. By avoiding the explicit accumulation of likelihood ratios along the trajectories, it is hypothesized that such ratios of stationary distributions could substantially reduce the variance of the resulting estimator, thereby overcoming the "curse of horizon" (Liu et al., 2018) plaguing off-policy evaluation. The recent flurry of empirical results shows significant performance improvements over the alternative methods on a variety of simulation domains. Yet so far there has not been a formal analysis of the accuracy of IS, PDIS, and stationary state-action IS which will strengthen our understanding of their properties, benefits and limitations.
Quantitative stability of optimal transport maps and linearization of the 2-Wasserstein space
Mérigot, Quentin, Delalande, Alex, Chazal, Frédéric
This work studies an explicit embedding of the set of probability measures into a Hilbert space, defined using optimal transport maps from a reference probability density. This embedding linearizes to some extent the 2 -Wasserstein space, and enables the direct use of generic supervised and unsupervised learning algorithms on measure data. Our main result is that the embedding is (bi-)Hölder continuous, when the reference density is uniform over a convex set, and can be equivalently phrased as a dimension-independent Hölder-stability results for optimal transport maps. 1. Introduction Numerous problems involve the comparison of point clouds, i.e. sets of points that lie in a metric space and for which the spatial distribution is of interest. Seeing the point clouds as discrete probability measures in a metric space, it is natural to compare them using Wasserstein distances defined by the optimal transport theory [37]. These distances have indeed been successfully used in a variety of applications in machine learning [11, 3, 25, 23, 19, 1] and in statistics [39, 12, 8, 35]. In the discrete setting, many efficient algorithms have been proposed to compute or approximate the Wasserstein distances, such as Sinkhorn-Knopp and auction algorithms - see [34] and references therein.
Emergent properties of the local geometry of neural loss landscapes
Fort, Stanislav, Ganguli, Surya
Emergent properties of the local geometry of neural loss landscapesStanislav Fort Surya Ganguli Stanford University Stanford, CA, USA Stanford University Stanford, CA, USA Abstract The local geometry of high dimensional neural network loss landscapes can both challenge our cherished theoretical intuitions as well as dramatically impact the practical success of neural network training. Indeed recent works have observed 4 striking local properties of neural loss landscapes on classification tasks: (1) the landscape exhibits exactly C directions of high positive curvature, where C is the number of classes; (2) gradient directions are largely confined to this extremely low dimensional subspace of positive Hessian curvature, leaving the vast majority of directions in weight space unexplored; (3) gradient descent transiently explores intermediate regions of higher positive curvature before eventually finding flatter minima; (4) training can be successful even when confined to low dimensional random affine hy-perplanes, as long as these hyperplanes intersect a Goldilocks zone of higher than average curvature. We develop a simple theoretical model of gradients and Hessians, justified by numerical experiments on architectures and datasets used in practice, that simultaneously accounts for all 4 of these surprising and seemingly unrelated properties. Our unified model provides conceptual insights into the emergence of these properties and makes connections with diverse topics in neural networks, random matrix theory, and spin glasses, including the neural tangent kernel, BBP phase transitions, and Derrida's random energy model. 1 Introduction The geometry of neural network loss landscapes and the implications of this geometry for both optimization and generalization have been subjects of intense interest in many works, ranging from studies on the lack of local minima at significantly higher loss than that of the global minimum [1, 2] to studies debating relations between the curvature of local minima and their generalization properties [3, 4, 5, 6]. Fundamentally, the neural network loss landscape is a scalar loss function over a very high D dimensional parameter space that could depend a priori in highly nontrivial ways on the very structure of real-world data itself as well as intricate properties of the neural network architecture. Moreover, the regions of this loss landscape explored by gradient descent could themselves have highly atypical geometric properties relative to randomly chosen points in the landscape.
Evolving Gaussian Process kernels from elementary mathematical expressions
Roman, Ibai, Santana, Roberto, Mendiburu, Alexander, Lozano, Jose A.
Choosing the most adequate kernel is crucial in many Machine Learning applications. Gaussian Process is a state-of-the-art technique for regression and classification that heavily relies on a kernel function. However, in the Gaussian Process literature, kernels have usually been either ad hoc designed, selected from a predefined set, or searched for in a space of compositions of kernels which have been defined a priori. In this paper, we propose a Genetic-Programming algorithm that represents a kernel function as a tree of elementary mathematical expressions. By means of this representation, a wider set of kernels can be modeled, where potentially better solutions can be found, although new challenges also arise. The proposed algorithm is able to overcome these difficulties and find kernels that accurately model the characteristics of the data. This method has been tested in several real-world time-series extrapolation problems, improving the state-of-the-art results while reducing the complexity of the kernels.
Code Generation as a Dual Task of Code Summarization
Wei, Bolin, Li, Ge, Xia, Xin, Fu, Zhiyi, Jin, Zhi
Code summarization (CS) and code generation (CG) are two crucial tasks in the field of automatic software development. Various neural network-based approaches are proposed to solve these two tasks separately. However, there exists a specific intuitive correlation between CS and CG, which have not been exploited in previous work. In this paper, we apply the relations between two tasks to improve the performance of both tasks. In other words, exploiting the duality between the two tasks, we propose a dual training framework to train the two tasks simultaneously. In this framework, we consider the dualities on probability and attention weights, and design corresponding regularization terms to constrain the duality. We evaluate our approach on two datasets collected from GitHub, and experimental results show that our dual framework can improve the performance of CS and CG tasks over baselines.