EdgeNets: Edge Varying Graph Neural Networks

arXiv.org Machine Learning

Driven by the outstanding performance of neural networks in the structured Euclidean domain, recent years have seen a surge of interest in developing neural networks for graphs and data supported on graphs. The graph is leveraged as a parameterization to capture detail at the node level with a reduced number of parameters and reduced complexity. Following this rationale, this paper puts forth a general framework that unifies state-of-the-art graph neural networks (GNNs) through the concept of EdgeNet. An EdgeNet is a GNN architecture that allows different nodes to use different parameters to weigh the information of different neighbors. By extending this strategy to more iterations between neighboring nodes, the EdgeNet learns edge- and neighbor-dependent weights to capture local detail. This is the most general local operation that a node can perform, and it encompasses under one formulation all graph convolutional neural networks (GCNNs) as well as graph attention networks (GATs). By writing different GNN architectures in a common language, EdgeNets highlight specific architecture advantages and limitations, while providing guidelines to improve their capacity without compromising their local implementation. For instance, we show that GCNNs have a parameter sharing structure that induces permutation equivariance. This can be an advantage or a limitation, depending on the application. When it is a limitation, we propose hybrid approaches and provide insights to develop several other solutions that promote parameter sharing without enforcing permutation equivariance. Another interesting conclusion is the unification of GCNNs and GATs, approaches that have so far been perceived as separate. In particular, we show that GATs are GCNNs on a graph that is learned from the features. This particularization opens the door to developing alternative attention mechanisms for improving discriminatory power.
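
The edge-varying idea can be illustrated with a minimal sketch. It makes several assumptions. The graph is stored as a Python adjacency dict. A hypothetical coefficient table `phi[k][(i, j)]` gives the weight that node i applies to neighbor j (or to itself) at each extra hop. These names are illustrative and are not the authors' implementation. The sketch also shows how sharing one scalar per hop across all edges turns the same recursion into a polynomial in the adjacency matrix, that is, a graph convolution.

```python
# Hypothetical sketch of an edge-varying graph filter in pure Python.
# Assumptions (not taken from the paper's code): the graph is an adjacency
# dict {node: [neighbors]} with nodes 0..n-1, and phi[k] maps a pair (i, j)
# to the weight node i applies to neighbor j (or to itself, j == i) at hop k + 1.
# Pairs missing from phi[k] are treated as weight 0.

def edge_varying_filter(adj, x, phi):
    """Return y = z_0 + z_1 + ... + z_K, where z_0 = x and
    z_k[i] = sum over j in {i} and N(i) of phi[k-1][(i, j)] * z_{k-1}[j]."""
    z = list(x)
    y = list(x)  # hop-0 term
    for coeffs in phi:  # one coefficient table per additional hop
        # Each node reads only from itself and its one-hop neighbors (a local operation).
        z = [sum(coeffs.get((i, j), 0.0) * z[j] for j in [i] + adj[i])
             for i in range(len(x))]
        y = [yi + zi for yi, zi in zip(y, z)]
    return y


def shared_coeffs(adj, h):
    """Parameter-sharing special case: every edge at hop k gets the same scalar h[k].
    The filter then reduces to a polynomial in the adjacency matrix S
    (a graph convolution), which is permutation equivariant."""
    return [{(i, j): hk for i in adj for j in adj[i]} for hk in h]


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1]}  # path graph 0 - 1 - 2
    # Edge-varying: node 1 weighs neighbor 0 by 0.5 and neighbor 2 by 3.0.
    print(edge_varying_filter(path, [1.0, 0.0, 2.0], [{(1, 0): 0.5, (1, 2): 3.0}]))
    # Shared weights over two hops: y = x + S x + S^2 x.
    print(edge_varying_filter(path, [1.0, 0.0, 0.0], shared_coeffs(path, [1.0, 1.0])))
```

In the first call, node 1 weighs its two neighbors differently, which a shared-parameter convolution cannot do. In the second call, the weights are shared, so relabeling the nodes (for example, reversing the path) only relabels the output.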


Simple and Effective Graph Autoencoders with One-Hop Linear Models

arXiv.org Machine Learning

Graph autoencoders (AE) and variational autoencoders (VAE) have recently emerged as powerful node embedding methods, with promising performance on challenging tasks such as link prediction and node clustering. Graph AE, VAE and most of their extensions rely on graph convolutional network (GCN) encoders to learn vector space representations of nodes. In this paper, we propose to replace the GCN encoder with a significantly simpler linear model w.r.t. the direct neighborhood (one-hop) adjacency matrix of the graph. For the two aforementioned tasks, we show that this approach consistently achieves performance competitive with GCN-based models on numerous real-world graphs, including all benchmark datasets commonly used to evaluate graph AE and VAE. We question the relevance of repeatedly using these datasets to compare complex graph AE and VAE. We also emphasize the effectiveness of the proposed encoding scheme, which appears to be a simpler and faster alternative to GCN encoders for many real-world applications.
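
The contrast with a GCN encoder can be made concrete. The usual graph AE encoder is a two-layer GCN, Z = Â ReLU(Â X W0) W1. The one-hop linear alternative is simply Z = Â X W. The sketch below makes three assumptions: the symmetric normalization Â = D^-1/2 (A + I) D^-1/2 commonly used with GCNs, an inner-product decoder, and a weight matrix W that is fixed by hand rather than trained. It shows only the forward pass.

```python
import math

# Hypothetical sketch of a one-hop linear graph encoder, forward pass only.
# Assumptions: dense 0/1 adjacency matrices as nested lists, symmetric
# normalization with self-loops, and a weight matrix W that would normally be learned.

def normalized_adjacency(A):
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix A."""
    n = len(A)
    A_self = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_self]
    return [[A_self[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(P, Q):
    """Plain-Python matrix product of P (m x k) and Q (k x p)."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def linear_encoder(A, X, W):
    """Z = Â X W: a single propagation step with no nonlinearity."""
    return matmul(matmul(normalized_adjacency(A), X), W)


def inner_product_decoder(Z):
    """Edge probabilities sigmoid(z_i . z_j) for every node pair."""
    return [[1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(zi, zj)))) for zj in Z]
            for zi in Z]


if __name__ == "__main__":
    A = [[0, 1], [1, 0]]           # two connected nodes
    X = [[1.0, 0.0], [0.0, 1.0]]   # identity features (featureless graph)
    W = [[1.0], [1.0]]             # illustrative weights, not trained
    Z = linear_encoder(A, X, W)
    print(Z)                        # [[1.0], [1.0]]
    print(inner_product_decoder(Z))
```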


Motif Difference Field: A Simple and Effective Image Representation of Time Series for Classification

arXiv.org Machine Learning

Time series motifs play an important role in time series analysis. Motif-based time series clustering is used to discover higher-order patterns or structures in time series data. Inspired by convolutional neural network (CNN) classifiers based on image representations of time series, the motif difference field (MDF) is proposed. Compared to other image representations of time series, MDF is simple and easy to construct. With a Fully Convolutional Network (FCN) as the classifier, MDF achieves superior performance on the UCR time series datasets when benchmarked against other time series classification methods. Interestingly, triadic time series motifs give the best results in the tests. Because motif clustering is reflected in MDF, the significant motifs can be detected with the help of Gradient-weighted Class Activation Mapping (Grad-CAM). The areas in MDF with high Grad-CAM weight receive a high contribution from the significant motifs whose ordinal patterns are associated with the signature patterns in the time series. However, these signature patterns cannot be identified by neural network classifiers applied directly to the time series.


R2DE: a NLP approach to estimating IRT parameters of newly generated questions

arXiv.org Machine Learning

The main objective of exams is to assess students' expertise in a specific subject. Such expertise, also referred to as skill or knowledge level, can then be leveraged in different ways (e.g., to assign a grade to the students, to understand whether a student might need some support, etc.). Similarly, the questions appearing in the exams have to be assessed in some way before being used to evaluate students. Standard approaches to question assessment are either subjective (e.g., assessment by human experts) or introduce a long delay in the process of question generation (e.g., pretesting with real students). In this work we introduce R2DE (Regressor for Difficulty and Discrimination Estimation), a model capable of assessing newly generated multiple-choice questions by looking at the text of the question and the text of the possible choices. In particular, it can estimate the difficulty and the discrimination of each question, as they are defined in Item Response Theory. We also present the results of extensive experiments carried out on a large-scale, real-world dataset from an e-learning platform, showing that our model can be used to perform an initial assessment of newly created questions and ease some of the problems that arise in question generation.
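
A short illustration of what difficulty and discrimination mean may be useful. It assumes the two-parameter logistic (2PL) model, one common Item Response Theory formulation. In that model, a student with skill θ answers correctly with probability 1 / (1 + exp(-a (θ - b))), where b is the question's difficulty and a is its discrimination. R2DE's role is to predict b and a from the question text. The sketch below does not model that regression step and only evaluates the 2PL curve. The function name is hypothetical.

```python
import math

# Sketch of the two-parameter logistic (2PL) IRT model. The 2PL choice and the
# function name are assumptions for illustration, not R2DE's code.

def p_correct_2pl(theta, difficulty, discrimination):
    """Probability that a student with skill theta answers the question correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


if __name__ == "__main__":
    # A student whose skill equals the difficulty has a 50% chance of success.
    print(p_correct_2pl(theta=0.3, difficulty=0.3, discrimination=1.5))  # 0.5
    # Higher discrimination separates weaker and stronger students more sharply.
    for a in (0.5, 2.0):
        gap = p_correct_2pl(1.0, 0.0, a) - p_correct_2pl(-1.0, 0.0, a)
        print(a, round(gap, 3))
```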


Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects

arXiv.org Machine Learning

Practitioners in diverse fields such as healthcare, economics and education are eager to apply machine learning to improve decision making. The cost and impracticality of performing experiments, together with a recent monumental increase in electronic record keeping, have brought attention to the problem of evaluating decisions based on non-experimental observational data. This is the setting of this work. In particular, we study estimation of individual-level causal effects, such as a single patient's response to alternative medication, from recorded contexts, decisions and outcomes. We give generalization bounds on the error in estimated effects based on distance measures between groups receiving different treatments, allowing for sample re-weighting. We provide conditions under which our bound is tight and show how it relates to results for unsupervised domain adaptation. Guided by our theoretical results, we devise representation learning algorithms that minimize our bound by regularizing the treatment group distance induced by the representation, and that encourage sharing of information between treatment groups. We extend these algorithms to simultaneously learn a weighted representation to further reduce treatment group distances. Finally, an experimental evaluation on real and synthetic data shows the value of our proposed representation architecture and regularization scheme.
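
The regularizer described above can be sketched in a simple form. It assumes that the treatment group distance is the squared maximum mean discrepancy with a linear kernel. This is only one possible choice of distance, and it reduces to the squared gap between the (optionally re-weighted) mean representations of the treated and control groups. In the actual method, the representations would come from a learned encoder. Here they are plain lists, and the function names are hypothetical.

```python
# Hypothetical sketch of a treatment-group distance penalty. Assumption: the
# distance is a squared linear-kernel MMD, i.e. the squared gap between the
# weighted mean representations of the two groups (one choice among many).

def weighted_mean(points, weights):
    """Weighted mean of a list of equal-length vectors."""
    total = sum(weights)
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(len(points[0]))]


def linear_mmd2(treated, control, w_treated=None, w_control=None):
    """Squared distance between (optionally re-weighted) group means."""
    if w_treated is None:
        w_treated = [1.0] * len(treated)
    if w_control is None:
        w_control = [1.0] * len(control)
    mu_t = weighted_mean(treated, w_treated)
    mu_c = weighted_mean(control, w_control)
    return sum((a - b) ** 2 for a, b in zip(mu_t, mu_c))


def penalized_objective(factual_loss, treated, control, alpha,
                        w_treated=None, w_control=None):
    """Factual prediction loss plus alpha times the group distance in representation space."""
    return factual_loss + alpha * linear_mmd2(treated, control, w_treated, w_control)


if __name__ == "__main__":
    treated = [[1.0, 0.0], [3.0, 0.0]]   # representations of treated units
    control = [[0.0, 0.0], [0.0, 2.0]]   # representations of control units
    print(linear_mmd2(treated, control))                          # 5.0
    print(linear_mmd2(treated, control, w_control=[1.0, 0.0]))    # 4.0
    print(penalized_objective(1.0, treated, control, alpha=0.5))  # 3.5
```

The weights show where sample re-weighting enters. Shifting weight within a group moves that group's mean and, with it, the penalty.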


Explicit agreement extremes for a $2\times2$ table with given marginals

arXiv.org Machine Learning

Given two different clusterings of a data set, many measures have been proposed to quantify their degree of concordance. A recent review of a representative number of them can be found in Meilă (2016). These measures are usually categorized into three classes: those based on inspecting the assignments of data pairs in both clusterings, those involving some cluster matching between the two clusterings, and those relying on information theoretic criteria. This paper concerns the first one of these classes. In fact, some of the most popular and widely used similarity measures, such as the Rand index, the Jaccard index, or the Fowlkes-Mallows index, belong to this class of pair-based similarities, but it should be noted that there is a plethora of them, as explored in Albatineh, Niewiadomska-Bugaj and Mihalko (2006), Warrens (2008) or Warrens and van der Hoef (2019).
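
Pair-based measures are built from a 2×2 table that cross-classifies all n(n-1)/2 pairs of points. Each pair is placed according to whether it falls in the same cluster in each clustering. The margins a + b and a + c depend only on the cluster sizes of each clustering. The sketch below builds this table and evaluates three of the measures named above using their standard pair-counting definitions. The function names are illustrative.

```python
import math
from itertools import combinations

# Sketch: the 2x2 pair table of two clusterings and three pair-based indices.
# Function names are illustrative; the formulas are the standard pair-counting ones.

def pair_table(labels1, labels2):
    """Return (a, b, c, d) over all unordered pairs of points:
    a: together in both, b: together only in the first clustering,
    c: together only in the second, d: separated in both."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels1)), 2):
        same1 = labels1[i] == labels1[j]
        same2 = labels2[i] == labels2[j]
        if same1 and same2:
            a += 1
        elif same1:
            b += 1
        elif same2:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(a, b, c, d):
    return (a + d) / (a + b + c + d)


def jaccard_index(a, b, c, d):
    return a / (a + b + c)


def fowlkes_mallows(a, b, c, d):
    return a / math.sqrt((a + b) * (a + c))


if __name__ == "__main__":
    t = pair_table([0, 0, 1, 1], [0, 0, 0, 1])
    print(t)  # (1, 1, 2, 2)
    print(rand_index(*t), jaccard_index(*t), fowlkes_mallows(*t))
```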


Estimating Latent Demand of Shared Mobility through Censored Gaussian Processes

arXiv.org Machine Learning

Transport demand is highly dependent on supply, especially for shared transport services where availability is often limited. As observed demand cannot be higher than available supply, historical transport data typically represents a biased, or censored, version of the true underlying demand pattern. Without explicitly accounting for this inherent distinction, predictive models of demand would necessarily represent a biased version of true demand and thus predict the needs of service users less effectively. To counter this problem, we propose a general method for censorship-aware demand modeling, for which we devise a censored likelihood function. We apply this method to the task of shared mobility demand prediction by incorporating the censored likelihood within a Gaussian Process model, which can flexibly approximate arbitrary functional forms. Experiments on artificial and real-world datasets show that taking into account the limiting effect of supply on demand is essential to obtaining an unbiased predictive model of user demand behavior.
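
One standard way to write a censored likelihood is the Tobit-style form with Gaussian noise. This sketch assumes that form, which may differ from the paper's exact likelihood. An uncensored observation contributes the usual Gaussian density. An observation that hit the supply limit contributes only the probability that the true demand was at least that large. In a Gaussian Process model, `f` would be the latent GP values at the observed inputs. Pairing this non-Gaussian likelihood with a GP prior typically requires approximate inference, which the sketch omits.

```python
import math

# Hypothetical sketch of a Tobit-style censored Gaussian log-likelihood.
# Assumptions: Gaussian noise with known sigma, and censored[i] is True when the
# observation hit the supply limit, so true demand is only known to be >= y[i].

def censored_log_likelihood(y, f, censored, sigma):
    total = 0.0
    for yi, fi, is_censored in zip(y, f, censored):
        z = (yi - fi) / sigma
        if is_censored:
            # log P(demand >= yi) = log(1 - Phi(z)), written with erfc for the upper tail.
            # For extreme z (above roughly 37) erfc underflows to 0; a log-erfc would be needed.
            total += math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
        else:
            # Ordinary Gaussian log-density for uncensored observations.
            total += -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    return total


if __name__ == "__main__":
    # 10 rentals observed while the station ran out of bikes (censored).
    # A latent demand above the cap explains this better than one below it.
    print(censored_log_likelihood([10.0], [12.0], [True], 1.0))
    print(censored_log_likelihood([10.0], [8.0], [True], 1.0))
```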


Understanding Why Neural Networks Generalize Well Through GSNR of Parameters

arXiv.org Machine Learning

The gradient signal-to-noise ratio (GSNR) of a parameter is defined as the ratio between its gradient's squared mean and variance. Intuitively, GSNR measures the similarity of a parameter's gradients among different training samples: at a particular point of the parameter space, it measures the consistency of a parameter's gradients across different data samples. Previous work (Zhang et al., 2016; Hardt et al., 2015; Dziugaite & Roy, 2017) suggests that the … Other previous work has used GSNR to conduct theoretical analysis of deep learning. For example, Rainforth et al. (2018) used GSNR to analyze variational bounds in … To reveal the mechanism behind DNNs' good generalization ability, we show that the gradient descent … We believe this is probably the key to DNNs' remarkable generalization ability. In the remainder of this paper, we first analyze the relation between GSNR and generalization (Section 2).
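
The GSNR definition can be computed directly from per-sample gradients: for each parameter, take the squared mean of its gradients over training samples and divide by their variance. The sketch assumes the per-sample gradients are already available as a matrix and uses the population variance. Both choices are illustrative. A high value means the samples largely agree on the direction in which that parameter should move.

```python
from statistics import fmean, pvariance

# Sketch: estimate each parameter's GSNR from per-sample gradients.
# Assumption: grads[i][j] is the gradient of the loss on training sample i with
# respect to parameter j, all evaluated at the same point in parameter space.

def gsnr(grads):
    ratios = []
    for j in range(len(grads[0])):
        g = [row[j] for row in grads]
        var = pvariance(g)  # population variance over the samples
        # Squared mean over variance; identical gradients give infinite GSNR.
        ratios.append(float("inf") if var == 0 else fmean(g) ** 2 / var)
    return ratios


if __name__ == "__main__":
    grads = [[1.0, 1.0, 2.0],
             [1.0, -1.0, 4.0],
             [1.0, 3.0, 2.0],
             [1.0, -3.0, 4.0]]
    print(gsnr(grads))  # [inf, 0.0, 9.0]
```

In this toy matrix, every sample agrees exactly on the first parameter, so its GSNR is infinite. The sign-alternating gradients of the second parameter cancel out, giving a GSNR of 0. The third parameter has a consistent direction with some spread.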


ShapeVis: High-dimensional Data Visualization at Scale

arXiv.org Machine Learning

We present ShapeVis, a scalable visualization technique for point cloud data inspired by topological data analysis. Our method captures the underlying geometric and topological structure of the data in a compressed graphical representation. Much success has been reported with the data visualization technique Mapper, which discretely approximates the Reeb graph of a filter function on the data. However, when standard dimensionality reduction algorithms are used as the filter function, Mapper suffers from considerable computational cost, which makes it difficult to scale to high-dimensional data. Our proposed technique relies on finding a subset of points, called landmarks, along the data manifold to construct a weighted witness graph over it. This graph captures the structural characteristics of the point cloud, and its weights are determined using a Finite Markov Chain. We further compress this graph by applying induced maps from standard community detection algorithms. Using techniques borrowed from manifold tearing, we prune and reinstate edges in the induced graph based on their modularity to summarize the shape of data. We empirically demonstrate how our technique captures the structural characteristics of real and synthetic data sets. Further, we compare our approach with Mapper using various filter functions such as t-SNE, UMAP and LargeVis, and show that our algorithm scales to millions of data points while preserving the quality of data visualization.


Recognizing Images with at most one Spike per Neuron

arXiv.org Machine Learning

In order to port the performance of trained artificial neural networks (ANNs) to spiking neural networks (SNNs), which can be implemented in neuromorphic hardware with drastically reduced energy consumption, an efficient ANN-to-SNN conversion is needed. Previous conversion schemes focused on representing the analog output of a rectified linear (ReLU) gate in the ANN by the firing rate of a spiking neuron. But this is not possible for other commonly used ANN gates, and it reduces throughput even for ReLU gates. We introduce a new conversion method where a gate in the ANN, which can be of essentially any type, is emulated by a small circuit of spiking neurons, with At Most One Spike (AMOS) per neuron. We show that this AMOS conversion improves the accuracy of SNNs for ImageNet from 74.60% to 80.97%, thereby bringing it within reach of the best available ANN accuracy (85.0%). The Top5 accuracy of SNNs is raised to 95.82%, getting even closer to the best Top5 performance of 97.2% for ANNs. In addition, AMOS conversion improves latency and throughput of spike-based image classification by several orders of magnitude. Hence, these results suggest that SNNs provide a viable direction for developing highly energy-efficient hardware for AI that combines high performance with versatility of applications.