A Bayesian Dynamic Multilayered Block Network Model

arXiv.org Machine Learning

As network data become increasingly available, new opportunities arise to understand dynamic and multilayer network systems in many applied disciplines. Statistical modeling for multilayer networks is currently an active research area that aims to develop methods to carry out inference on such data. Recent contributions focus on latent space representation of the multilayer structure with underlying stochastic processes to account for network dynamics. Existing multilayer models are, however, typically limited to rather small networks. In this paper we introduce a dynamic multilayer block network model with a latent space representation for blocks rather than nodes. A block structure is natural for many real networks, such as social or transportation networks, where community structure naturally arises. A Gibbs sampler based on Pólya-Gamma data augmentation is presented for the proposed model. Results from extensive simulations on synthetic data show that the inference algorithm scales well with the size of the network. We present a case study using real data from an airline system, a classic example of a hub-and-spoke network.


Method and Dataset Mining in Scientific Papers

arXiv.org Machine Learning

Literature analysis helps researchers better understand the development of science and technology. Conventional literature analysis focuses on topics, authors, abstracts, keywords, references, etc., and rarely pays attention to the content of papers. In the field of machine learning, the methods (M) and datasets (D) involved are key information in papers. The extraction and mining of M and D are useful for discipline analysis and algorithm recommendation. In this paper, we propose a novel entity recognition model, called MDER, and construct datasets from the papers of the PAKDD conferences (2009-2019). Some preliminary experiments are conducted to assess the extraction performance, and the mining results are visualized.


Sparsely Grouped Input Variables for Neural Networks

arXiv.org Machine Learning

In genomic analysis, biomarker discovery, image recognition, and other systems involving machine learning, input variables can often be organized into different groups by their source or semantic category. Eliminating some groups of variables can expedite the process of data acquisition and avoid over-fitting. Researchers have used the group lasso to ensure group sparsity in linear models and have extended it to create compact neural networks in meta-learning. Different from previous studies, we use multi-layer non-linear neural networks to find sparse groups for input variables. We propose a new loss function to regularize parameters for grouped input variables, design a new optimization algorithm for this loss function, and test these methods in three real-world settings. We achieve group sparsity for three datasets, maintaining satisfying results while excluding one nucleotide position from an RNA splicing experiment, excluding 89.9% of stimuli from an eye-tracking experiment, and excluding 60% of image rows from an experiment on the MNIST dataset.
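
The abstract above refers to the group lasso as the earlier tool for group sparsity. As a rough illustration of that standard penalty only (not the new loss function the authors propose, which the abstract does not specify), the sketch below sums the Euclidean norms of each weight group, scaled by the square root of the group size. The function name, the `lam` parameter, and the example weights are hypothetical.

```python
from math import sqrt


def group_lasso_penalty(weights, groups, lam=1.0):
    """Standard group lasso penalty: lam * sum_g sqrt(|g|) * ||w_g||_2.

    weights: flat list of coefficients.
    groups:  list of index lists, one per group of input variables
             (hypothetical grouping, chosen by the caller).
    """
    total = 0.0
    for idx in groups:
        # Euclidean norm of the coefficients belonging to this group
        group_norm = sqrt(sum(weights[i] ** 2 for i in idx))
        # Scaling by sqrt(group size) keeps large groups from being favored
        total += sqrt(len(idx)) * group_norm
    return lam * total


# Hypothetical example: three groups, the second one is entirely zero
example_weights = [3.0, 4.0, 0.0, 0.0, 1.0]
example_groups = [[0, 1], [2, 3], [4]]
penalty = group_lasso_penalty(example_weights, example_groups)
```

Because the norm is not squared, the penalty tends to push whole groups to exactly zero rather than shrinking individual coefficients, which is what allows an entire group of inputs to be dropped.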


Detecting anthropogenic cloud perturbations with deep learning

arXiv.org Machine Learning

One of the most pressing questions in climate science is that of the effect of anthropogenic aerosol on the Earth's energy balance. Aerosols provide the 'seeds' on which cloud droplets form, and changes in the amount of aerosol available to a cloud can change its brightness and other physical properties such as optical thickness and spatial extent. Clouds play a critical role in moderating global temperatures and small perturbations can lead to significant amounts of cooling or warming. Uncertainty in this effect is so large it is not currently known if it is negligible, or provides a large enough cooling to largely negate present-day warming by CO2. This work uses deep convolutional neural networks to look for two particular perturbations in clouds due to anthropogenic aerosol and assess their properties and prevalence, providing valuable insights into their climatic effects.


Orthogonal Wasserstein GANs

arXiv.org Machine Learning

Wasserstein-GANs have been introduced to address the deficiencies of generative adversarial networks (GANs) regarding the problems of vanishing gradients and mode collapse during training, leading to improved convergence behaviour and improved image quality. However, Wasserstein-GANs require the discriminator to be Lipschitz continuous. In current state-of-the-art Wasserstein-GANs this constraint is enforced via gradient norm regularization. In this paper, we demonstrate that this regularization does not encourage a broad distribution of spectral values in the discriminator weights, hence resulting in less fidelity in the learned distribution. We therefore investigate the possibility of substituting this Lipschitz constraint with an orthogonality constraint on the weight matrices. We compare three different weight orthogonalization techniques with regard to their convergence properties, their ability to ensure the Lipschitz condition and the achieved quality of the learned distribution. In addition, we provide a comparison to Wasserstein-GANs trained with current state-of-the-art methods, where we demonstrate the potential of solely using orthogonality-based regularization. In this context, we propose an improved training procedure for Wasserstein-GANs which utilizes orthogonalization to further increase its generalization capability. Finally, we provide a novel metric to evaluate the generalization capabilities of the discriminators of different Wasserstein-GANs.


Deep Learning to Scale up Time Series Traffic Prediction

arXiv.org Machine Learning

The transport literature is dense regarding short-term traffic predictions, up to the scale of 1 hour, yet less dense for long-term traffic predictions. The transport literature is also sparse when it comes to city-scale traffic predictions, mainly because of low data availability. The main question we try to answer in this work is to what extent the approaches used for short-term prediction at a link level can be scaled up for long-term prediction at a city scale. We investigate a city-scale traffic dataset with 14 weeks of speed observations collected every 15 minutes over 1098 segments in the hypercenter of Los Angeles, California. We look at a variety of machine learning and deep learning predictors for link-based predictions, and investigate ways to make such predictors scale up for larger areas, with brute force, clustering, and model design approaches. In particular we propose a novel deep learning spatiotemporal predictor inspired by recent works on recommender systems. We discuss the potential of including spatiotemporal features into the predictors, and conclude that modelling such features can be helpful for long-term predictions, while simpler predictors achieve very satisfactory performance for link-based and short-term forecasting. The tradeoff is discussed not only in terms of prediction accuracy vs prediction horizon but also in terms of training time and model sizing.

Traffic prediction in urban transport networks is a central task for the real-time operation of transportation systems, such as route planning, route guidance, and on-demand mobility services Simonetto et al. (2019). In principle this task can be achieved with the help of an increasingly large volume of observed traffic data that can be made available through, e.g., on-road sensors, GPS data, cameras, and social media Zhu et al. (2019). In reality, access to such data is limited, as big traffic data sets are generally owned by specific companies and deemed proprietary information and a valuable source of business.


Deep Networks with Adaptive Nyström Approximation

arXiv.org Machine Learning

Recent work has focused on combining kernel methods and deep learning to exploit the best of the two approaches. Here, we introduce a new neural network architecture in which we replace the top dense layers of standard convolutional architectures with an approximation of a kernel function, relying on the Nyström approximation. Our approach is easy and highly flexible. It is compatible with any kernel function and it allows exploiting multiple kernels. We show that our architecture has the same performance as standard architectures on datasets like SVHN and CIFAR100. One benefit of the method lies in its limited number of learnable parameters, which makes it particularly suited for small training set sizes, e.g. from 5 to 20 samples per class.


Towards Oracle Knowledge Distillation with Neural Architecture Search

arXiv.org Machine Learning

We present a novel framework of knowledge distillation that is capable of learning powerful and efficient student models from ensemble teacher networks. Our approach addresses the inherent model capacity issue between teacher and student and aims to maximize benefit from teacher models during distillation by reducing their capacity gap. Specifically, we employ a neural architecture search technique to augment useful structures and operations, where the searched network is appropriate for knowledge distillation towards student models and free from sacrificing its performance by fixing the network capacity. We also introduce an oracle knowledge distillation loss to facilitate model search and distillation using an ensemble-based teacher model, where a student network is learned to imitate oracle performance of the teacher. We perform extensive experiments on the image classification datasets---CIFAR-100 and TinyImageNet---using various networks. We also show that searching for a new student model is effective in both accuracy and memory size and that the searched models often outperform their teacher models thanks to neural architecture search with oracle knowledge distillation.


Spike-and-wave epileptiform discharge pattern detection based on Kendall's Tau-b coefficient

arXiv.org Machine Learning

Epilepsy is an important public health issue. Appropriate detection of the epileptiform discharge patterns of this neurological disease is a typical problem in biomedical engineering. In this paper, a new method is proposed for spike-and-wave discharge pattern detection based on Kendall's Tau-b coefficient. The proposed approach is demonstrated on a real data set containing spike-and-wave discharge signals, where our performance is evaluated in terms of high Specificity, rule in (SpPIn) with 94% for patient-specific spike-and-wave discharge detection and 83% for a general spike-and-wave discharge detection.

Key words: Spike-and-wave discharge; Kendall's Tau-b coefficient; Electroencephalography (EEG); Epilepsy; high Specificity, rule in (SpPIn)

Introduction

Electroencephalography (EEG) is widely used to record the electrical activity of the brain in neurological health centers.
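
The text above does not spell out how Kendall's Tau-b coefficient enters the detection pipeline, so the sketch below illustrates only the coefficient itself: a rank correlation that counts concordant and discordant pairs and corrects for ties. Comparing a signal segment against a reference template, as in the usage lines, is an assumption made for illustration, and the sample values are invented.

```python
from math import nan, sqrt


def kendall_tau_b(x, y):
    """Kendall's Tau-b between two equal-length numeric sequences.

    tau_b = (C - D) / sqrt((C + D + Tx) * (C + D + Ty)), where C and D are
    the concordant and discordant pair counts, Tx counts pairs tied only
    in x, and Ty counts pairs tied only in y.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both sequences: ignored everywhere
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x)
                 * (concordant + discordant + ties_y))
    if denom == 0:
        return nan  # undefined when one sequence is constant
    return (concordant - discordant) / denom


# Hypothetical usage: compare a signal segment with a reference template
template = [1, 2, 2, 3]
segment = [1, 2, 3, 3]
similarity = kendall_tau_b(template, segment)
```

This direct pairwise loop costs O(n^2) comparisons, which is fine for short windows; longer signals would call for a sort-based implementation.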


Sparse and Low-Rank Tensor Regression via Parallel Proximal Method

arXiv.org Machine Learning

Motivated by applications in various scientific fields that require predicting the relationship between a higher-order (tensor) feature and a univariate response, we propose a Sparse and Low-rank Tensor Regression model (SLTR). This model enforces sparsity and low-rankness of the tensor coefficient by directly applying the $\ell_1$ norm and the tensor nuclear norm to it, respectively, such that (1) the structural information of the tensor is preserved and (2) the data interpretation is convenient. To make the solving procedure scalable and efficient, SLTR uses the proximal gradient method to optimize the two norm regularizers, which can easily be implemented in parallel. Additionally, a tighter convergence rate is proved for third-order tensor data. We evaluate SLTR on several simulated datasets and one fMRI dataset. Experimental results show that, compared with previous models, SLTR obtains a solution no worse than the others at much lower time cost.
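
As a rough illustration of the proximal gradient idea mentioned above, the sketch below shows the textbook proximal operator of the $\ell_1$ norm (soft-thresholding) and a single proximal gradient step on a flat coefficient list. It is not the SLTR algorithm: the tensor nuclear norm term, whose proximal operator is not described in the abstract, is omitted, and the function names, step size, and inputs are hypothetical.

```python
def soft_threshold(values, tau):
    """Proximal operator of tau * ||.||_1, applied elementwise."""
    out = []
    for v in values:
        if v > tau:
            out.append(v - tau)   # shrink positive entries toward zero
        elif v < -tau:
            out.append(v + tau)   # shrink negative entries toward zero
        else:
            out.append(0.0)       # small entries are set exactly to zero
    return out


def proximal_gradient_step(coef, grad, step, lam):
    """One step: gradient descent on the smooth loss, then the l1 prox.

    coef: current coefficients, grad: gradient of the smooth loss at coef,
    step: step size, lam: l1 regularization weight (all caller-supplied).
    """
    moved = [c - step * g for c, g in zip(coef, grad)]
    return soft_threshold(moved, step * lam)


# Hypothetical example values
shrunk = soft_threshold([3.0, -0.5, -2.0, 0.2], 1.0)
updated = proximal_gradient_step([1.0, -1.0], [-2.0, 0.0], step=0.5, lam=1.0)
```

Each coefficient is updated independently of the others, which is why this kind of step is straightforward to run in parallel.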