Gauge Equivariant Mesh CNNs: Anisotropic convolutions on geometric graphs
de Haan, Pim, Weiler, Maurice, Cohen, Taco, Welling, Max
A common approach to define convolutions on meshes is to interpret them as a graph and apply graph convolutional networks (GCNs). Such GCNs utilize isotropic kernels and are therefore insensitive to the relative orientation of vertices and thus to the geometry of the mesh as a whole. We propose Gauge Equivariant Mesh CNNs which generalize GCNs to apply anisotropic gauge equivariant kernels. Since the resulting features carry orientation information, we introduce a geometric message passing scheme defined by parallel transporting features over mesh edges. Our experiments validate the significantly improved expressivity of the proposed model over conventional GCNs and other methods.
FuDGE: Functional Differential Graph Estimation with fully and discretely observed curves
Zhao, Boxin, Wang, Y. Samuel, Kolar, Mladen
We consider the problem of estimating the difference between two functional undirected graphical models with shared structures. In many applications, data are naturally regarded as high-dimensional random function vectors rather than multivariate scalars. For example, electroencephalography (EEG) data are more appropriately treated as functions of time. In these problems, not only can the number of functions measured per sample be large, but each function is itself an infinite dimensional object, making estimation of model parameters challenging. In practice, curves are usually discretely observed, which makes graph structure recovery even more challenging. We formally characterize when two functional graphical models are comparable and propose a method that directly estimates the functional differential graph, which we term FuDGE. FuDGE avoids separate estimation of each graph, which allows for estimation in problems where individual graphs are dense, but their difference is sparse. We show that FuDGE consistently estimates the functional differential graph in a high-dimensional setting for both discretely observed and fully observed function paths. We illustrate finite sample properties of our method through simulation studies. In order to demonstrate the benefits of our method, we propose Joint Functional Graphical Lasso as a competitor, which is a generalization of the Joint Graphical Lasso. Finally, we apply our method to EEG data to uncover differences in functional brain connectivity between alcoholics and control subjects.
Online Meta-Critic Learning for Off-Policy Actor-Critic Methods
Zhou, Wei, Li, Yiying, Yang, Yongxin, Wang, Huaimin, Hospedales, Timothy M.
Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference learning, and the critic in turn provides a loss for the actor that trains it to take actions with higher expected return. In this paper, we introduce a novel and flexible meta-critic that observes the learning process and meta-learns an additional loss for the actor that accelerates and improves actor-critic learning. Compared to the vanilla critic, the meta-critic network is explicitly trained to accelerate the learning process; and compared to existing meta-learning algorithms, the meta-critic is rapidly learned online for a single task, rather than slowly over a family of tasks. Crucially, our meta-critic framework is designed for off-policy based learners, which currently provide state-of-the-art reinforcement learning sample efficiency. We demonstrate that online meta-critic learning leads to improvements in a variety of continuous control environments when combined with contemporary Off-PAC methods DDPG, TD3 and the state-of-the-art SAC.
Meta-learning curiosity algorithms
Alet, Ferran, Schneider, Martin F., Lozano-Perez, Tomas, Kaelbling, Leslie Pack
We hypothesize that curiosity is a mechanism found by evolution that encourages meaningful exploration early in an agent's life in order to expose it to experiences that enable it to obtain high rewards over the course of its lifetime. We formulate the problem of generating curious behavior as one of meta-learning: an outer loop will search over a space of curiosity mechanisms that dynamically adapt the agent's reward signal, and an inner loop will perform standard reinforcement learning using the adapted reward signal. However, current meta-RL methods based on transferring neural network weights have only generalized between very similar tasks. To broaden the generalization, we instead propose to meta-learn algorithms: pieces of code similar to those designed by humans in ML papers. Our rich language of programs combines neural networks with other building blocks such as buffers, nearest-neighbor modules and custom loss functions. We demonstrate the effectiveness of the approach empirically, finding two novel curiosity algorithms that perform on par with or better than human-designed published curiosity algorithms in domains as disparate as grid navigation with image inputs, acrobot, lunar lander, ant and hopper.
Gaussian Graphical Model exploration and selection in high dimension low sample size setting
Lartigue, Thomas, Bottani, Simona, Baron, Stephanie, Colliot, Olivier, Durrleman, Stanley, Allassonnière, Stéphanie
Gaussian Graphical Models (GGM) are often used to describe the conditional correlations between the components of a random vector. In this article, we compare two families of GGM inference methods: nodewise edge selection and penalised likelihood maximisation. We demonstrate on synthetic data that, when the sample size is small, the two methods produce graphs with either too few or too many edges when compared to the real one. As a result, we propose a composite procedure that explores a family of graphs with a nodewise numerical scheme and selects a candidate among them with an overall likelihood criterion. We demonstrate that, when the number of observations is small, this selection method yields graphs closer to the truth and corresponding to distributions with better KL divergence with regard to the real distribution than the other two. Finally, we show the interest of our algorithm on two concrete cases: first on brain imaging data, then on biological nephrology data. In both cases our results are more in line with current knowledge in each field.
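As a small illustration of what "conditional correlations" means in a GGM, the sketch below converts a precision (inverse covariance) matrix into partial correlations using the standard identity rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj). The function name and the 3-variable chain example are illustrative assumptions; this is not the composite procedure proposed in the abstract.

```python
import numpy as np

def partial_correlations(precision: np.ndarray) -> np.ndarray:
    """Convert a precision (inverse covariance) matrix into partial correlations."""
    d = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(d, d)
    np.fill_diagonal(partial, 1.0)
    return partial

# Hypothetical chain graph 0 - 1 - 2: the zero entry in the precision matrix
# means variables 0 and 2 are conditionally independent given variable 1.
precision = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
])
rho = partial_correlations(precision)

# An edge of the graph is a pair with non-zero partial correlation.
edges = [(i, j) for i in range(3) for j in range(i + 1, 3) if abs(rho[i, j]) > 1e-12]

# The marginal covariance between 0 and 2 is still non-zero, which is why
# the graph is read off the precision matrix and not the covariance matrix.
covariance = np.linalg.inv(precision)
print(edges, covariance[0, 2])
```

In this example the graph has two edges even though all three variables are marginally correlated.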
Kernel Quantization for Efficient Network Compression
Yu, Zhongzhi, Shi, Yemin, Huang, Tiejun, Yu, Yizhou
This paper presents a novel network compression framework, Kernel Quantization (KQ), which aims to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version without significant performance loss. Unlike existing methods struggling with weight bit-length, KQ has the potential to improve the compression ratio by considering the convolution kernel as the quantization unit. Inspired by the evolution from weight pruning to filter pruning, we propose to quantize at both the kernel and the weight level. Instead of representing each weight parameter with a low-bit index, we learn a kernel codebook and replace all kernels in the convolution layer with corresponding low-bit indexes. Thus, KQ can represent the weight tensor in the convolution layer with low-bit indexes and a kernel codebook of limited size, which enables KQ to achieve a significant compression ratio. Then, we conduct a 6-bit parameter quantization on the kernel codebook to further reduce redundancy. Extensive experiments on the ImageNet classification task show that KQ needs 1.05 and 1.62 bits on average in VGG and ResNet18, respectively, to represent each parameter in the convolution layer and achieves the state-of-the-art compression ratio with little accuracy loss.
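The idea of replacing each kernel with an index into a shared codebook can be sketched as follows. This is a minimal illustration, not the authors' training procedure: the abstract does not say how the codebook is learned, so plain k-means is swapped in as an assumption, and the function names, layer shape and codebook size are all hypothetical. Only the 6-bit codebook precision is taken from the abstract.

```python
import numpy as np

def nearest_codeword(kernels, codebook):
    """Index of the closest codebook entry (squared Euclidean) for each kernel."""
    dists = ((kernels[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def build_kernel_codebook(weights, codebook_size, iters=10, seed=0):
    """Cluster the 2-D kernels of a conv weight tensor with plain k-means.

    weights has shape (out_channels, in_channels, k, k). Returns the codebook
    of shape (codebook_size, k*k) and one codebook index per kernel.
    """
    rng = np.random.default_rng(seed)
    kernels = weights.reshape(-1, weights.shape[2] * weights.shape[3])
    start = rng.choice(len(kernels), size=codebook_size, replace=False)
    codebook = kernels[start].copy()
    for _ in range(iters):
        indexes = nearest_codeword(kernels, codebook)
        for c in range(codebook_size):
            members = kernels[indexes == c]
            if len(members) > 0:  # leave empty clusters unchanged
                codebook[c] = members.mean(axis=0)
    return codebook, nearest_codeword(kernels, codebook)

def bits_per_weight(num_kernels, kernel_area, codebook_size, codebook_bits=6):
    """Average storage per original weight: kernel indexes plus the codebook."""
    index_bits = max(1, (codebook_size - 1).bit_length())
    total_bits = num_kernels * index_bits + codebook_size * kernel_area * codebook_bits
    return total_bits / (num_kernels * kernel_area)

# Hypothetical layer: 32 x 32 kernels of size 3 x 3, codebook of 64 kernels.
weights = np.random.default_rng(1).standard_normal((32, 32, 3, 3))
codebook, indexes = build_kernel_codebook(weights, codebook_size=64)
reconstructed = codebook[indexes].reshape(weights.shape)
avg_bits = bits_per_weight(num_kernels=32 * 32, kernel_area=9, codebook_size=64)
print(f"{avg_bits:.3f} bits per weight")
```

The storage cost per weight falls as the number of kernels grows relative to the codebook, because the codebook is shared across the whole layer while each kernel costs only one index.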
Trusted Confidence Bounds for Learning Enabled Cyber-Physical Systems
Boursinos, Dimitrios, Koutsoukos, Xenofon
Cyber-physical systems (CPS) can benefit from the use of learning enabled components (LECs) such as deep neural networks (DNNs) for perception and decision making tasks. However, DNNs are typically non-transparent, making reasoning about their predictions very difficult, and hence their application to safety-critical systems is very challenging. LECs could be integrated more easily into CPS if their predictions could be complemented with a confidence measure that quantifies how much we trust their output. The paper presents an approach for computing confidence bounds based on Inductive Conformal Prediction (ICP). We train a Triplet Network architecture to learn representations of the input data that can be used to estimate the similarity between test examples and examples in the training data set. Then, these representations are used to estimate the confidence of a set predictor based on the DNN architecture used in the triplet based on ICP. The approach is evaluated using a robotic navigation benchmark and the results show that we can compute trusted confidence bounds efficiently in real-time.
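The core ICP step, turning nonconformity scores into p-values and a prediction set, can be sketched as below. This is a generic textbook formulation under stated assumptions: the nonconformity scores are supplied directly as made-up numbers, whereas the paper derives them from triplet-network embeddings, and the function names and the significance level are hypothetical.

```python
import numpy as np

def icp_p_values(calibration_scores, test_scores):
    """p-value of each candidate label for one test input.

    calibration_scores: nonconformity scores of held-out calibration examples.
    test_scores: nonconformity score of the test input under each candidate label.
    """
    cal = np.asarray(calibration_scores, dtype=float)
    test = np.asarray(test_scores, dtype=float)
    # Count calibration examples that are at least as nonconforming as the test input.
    counts = (cal[None, :] >= test[:, None]).sum(axis=1)
    return (counts + 1) / (len(cal) + 1)

def prediction_set(p_values, epsilon):
    """Keep every label whose p-value exceeds the significance level epsilon."""
    return [label for label, p in enumerate(p_values) if p > epsilon]

# Made-up scores: nine calibration examples and three candidate labels.
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
p_values = icp_p_values(calibration, test_scores=[0.15, 0.85, 2.0])
labels = prediction_set(p_values, epsilon=0.15)
print(p_values, labels)
```

A smaller epsilon gives larger prediction sets and a lower expected error rate, which is the trade-off a confidence bound of this kind exposes.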
Estimation of Accurate and Calibrated Uncertainties in Deterministic models
Camporeale, Enrico, Carè, Algo
In this paper we focus on the problem of assigning uncertainties to single-point predictions generated by a deterministic model that outputs a continuous variable. This problem applies to any state-of-the-art physics or engineering model whose computational cost does not readily allow running ensembles to estimate the uncertainty associated with single-point predictions. Essentially, we devise a method to easily transform a deterministic prediction into a probabilistic one. We show that for doing so, one has to compromise between the accuracy and the reliability (calibration) of such a probabilistic model. Hence, we introduce a cost function that encodes their trade-off. We use the Continuous Ranked Probability Score to measure accuracy and we derive an analytic formula for the reliability, in the case of forecasts of continuous scalar variables expressed in terms of Gaussian distributions. The new Accuracy-Reliability cost function is then used to estimate the input-dependent variance, given a black-box mean function, by solving a two-objective optimization problem. The simple philosophy behind this strategy is that predictions based on the estimated variances should not only be accurate, but also reliable (i.e. statistically consistent with observations). Conversely, early works based on the minimization of classical cost functions, such as the negative log probability density, cannot simultaneously enforce both accuracy and reliability. We show several examples both with synthetic data, where the underlying hidden noise can accurately be recovered, and with large real-world datasets.
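For a Gaussian forecast, the Continuous Ranked Probability Score used above as the accuracy measure has a well-known closed form, sketched below. The function name is an illustrative assumption, and the paper's reliability term and combined cost function are not reproduced here.

```python
import math

def gaussian_crps(mu: float, sigma: float, y: float) -> float:
    """CRPS of the forecast N(mu, sigma^2) against the observation y.

    Closed form: sigma * (z * (2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi)),
    where z = (y - mu) / sigma, Phi is the standard normal CDF and phi its PDF.
    """
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

# For a fixed error |y - mu| = 1, compare a too-narrow, a matched and a
# too-wide predictive standard deviation (lower CRPS is better).
for sigma in (0.1, 1.0, 10.0):
    print(sigma, gaussian_crps(mu=0.0, sigma=sigma, y=1.0))
```

As sigma shrinks towards zero the score approaches the absolute error |y - mu|, so the CRPS reduces to the usual deterministic error measure in that limit.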
Towards Interpretable Deep Neural Networks: An Exact Transformation to Multi-Class Multivariate Decision Trees
Nguyen, Tung D., Kasmarik, Kathryn E., Abbass, Hussein A.
Deep neural networks (DNNs) are commonly labelled as black boxes lacking interpretability, which hinders human understanding of their behavior. A need exists to generate a meaningful sequential logic for the production of a specific output. Decision trees exhibit better interpretability and expressive power due to their representation language and the existence of efficient algorithms to generate rules. Growing a decision tree based on the available data could produce larger than necessary trees or trees that do not generalise well. In this paper, we introduce two novel multivariate decision tree (MDT) algorithms for rule extraction from a DNN: an Exact-Convertible Decision Tree (EC-DT) and a Deep C-Net algorithm, which transform a neural network with Rectified Linear Unit activation functions into a representative tree that can be used to extract multivariate rules for reasoning. While the EC-DT translates the DNN in a layer-wise manner to represent exactly the decision boundaries implicitly learned by the hidden layers of the network, the Deep C-Net inherits the decompositional approach from EC-DT and combines it with a C5 tree learning algorithm to construct the decision rules. The results suggest that while EC-DT is superior in preserving the structure and the accuracy of the DNN, C-Net generates the most compact and highly effective trees from the DNN. Both proposed MDT algorithms generate rules including combinations of multiple attributes for precise interpretation of decision-making processes.
Stable variation in multidimensional competition
The Fundamental Theorem of Language Change (Yang, 2000) implies the impossibility of stable variation in the Variational Learning framework, but only in the special case where two, and not more, grammatical variants compete. Introducing the notion of an advantage matrix, I generalize Variational Learning to situations where the learner receives input generated by more than two grammars, and show that diachronically stable variation is an intrinsic feature of several types of such multiple-grammar systems. This invites experimentalists to take the possibility of stable variation seriously and identifies one possible place to look for it: situations of complex language contact.