Country
Active learning in the geometric block model
Chien, Eli, Tulino, Antonia Maria, Llorca, Jaime
The geometric block model is a recently proposed generative model for random graphs that is able to capture the inherent geometric properties of many community detection problems, providing more accurate characterizations of practical community structures compared with the popular stochastic block model. Galhotra et al. recently proposed a motif-counting algorithm for unsupervised community detection in the geometric block model that is proved to be near-optimal. They also characterized the regimes of the model parameters for which the proposed algorithm can achieve exact recovery. In this work, we initiate the study of active learning in the geometric block model. That is, we are interested in the problem of exactly recovering the community structure of random graphs following the geometric block model under arbitrary model parameters, by possibly querying the labels of a limited number of chosen nodes. We propose two active learning algorithms that combine the idea of motif-counting with two different label query policies. Our main contribution is to show that sampling the labels of a vanishingly small fraction of nodes (sub-linear in the total number of nodes) is sufficient to achieve exact recovery in the regimes under which the state-of-the-art unsupervised method fails.
Non-Intrusive Load Monitoring with an Attention-based Deep Neural Network
Sudoso, Antonio Maria, Piccialli, Veronica
--Energy disaggregation, also referred to as a Non-Intrusive Load Monitoring (NILM), is the task of using an aggregate energy signal, for example coming from a whole-home power monitor, to make inferences about the different individual loads of the system. In this paper, we present a novel approach based on the encoder-decoder deep learning framework with an attention mechanism for solving NILM. The attention mechanism is inspired by the temporal attention mechanism that has been recently applied to get state-of-the-art results in neural machine translation, text summarization and speech recognition. The experiments have been conducted on two publicly available datasets AMPds and UK-DALE in seen and unseen conditions. The results show that our proposed deep neural network outperforms the state-of-the-art Denoising Auto-Encoder (DAE) proposed initially by Kelly and Knottenbely (2015) and its extended and improved architecture by Bonfigli et al. (2018), in all the addressed experimental conditions. We also show that modeling attention translates into the ability to correctly detect the state change of each appliance, that is of extreme interest in the field of energy disaggregation. Non-Intrusive Load Monitoring (NILM) is the task of estimating the power demand of each appliance given aggregate power demand signal recorded by a single electric meter monitoring multiple appliances [1]. In the last years, the research on NILM has been particularly active in the field of machine learning.
Inferring the Optimal Policy using Markov Chain Monte Carlo
Trabucco, Brandon, Qu, Albert, Li, Simon, Ashokavardhanan, Ganeshkumar
This paper investigates methods for estimating the optimal stochastic control policy for a Markov Decision Process with unknown transition dynamics and an unknown reward function. This form of model-free reinforcement learning comprises many real world systems such as playing video games, simulated control tasks, and real robot locomotion. Existing methods for estimating the optimal stochastic control policy rely on high variance estimates of the policy descent. However, these methods are not guaranteed to find the optimal stochastic policy, and the high variance gradient estimates make convergence unstable. In order to resolve these problems, we propose a technique using Markov Chain Monte Carlo to generate samples from the posterior distribution of the parameters conditioned on being optimal. Our method provably converges to the globally optimal stochastic policy, and empirically similar variance compared to the policy gradient.
Temporarily Unavailable: Memory Inhibition in Cognitive and Computer Science
Tempel, Tobias, Niederรฉe, Claudia, Jilek, Christian, Ceroni, Andrea, Maus, Heiko, Runge, Yannick, Frings, Christian
Inhibition can take place at the level of neurotransmitters in the synaptic cleft, neurons can inhibit each other's fire rate, it can be s h own at a physiological level - for instance by measuring the EEG, and finally it can be investigated on a purely behavioral level. Behavioral inhibition typically means something like'making a content/action less accessible or suppressing it altogether' in order to enhance processing of relevant information . In cognition, thus, the concept of inhibition implies cognitive mechanisms that actively lower currently irrelevant or inter fering information. Psychological theories that posit the existence of inhibitory mechanisms in our mind have elicited much research across diverse fields of C ognitive P sychology like perception, attention, action control, and memory but have also been tra nsferred to other research fields like D evelopmental P sychology as, fo r instance, understanding the aging brain or the developing brain is closely linked to understanding how the brain handles irrelevant or interfering information - that is how or whether the brain can inhibit such information. The two areas in Cognitive Psychology in which inhibition is traditionally investigated to the largest extent are the research fields of attention and memory. In attention research, typically the interference due to distracting stimuli or actions is analyzed in experimental paradigms that try to tap a specific form of cognitive inhibition. For example, in the Negative Priming task (for a review, Frings, Schneider, & Fox, 2015) it is typically analyzed how an irrelevant distractor stimulus is inhibited. In the cuing task that elicits the inhibition of return effect (Posner, Choate, Rafal, & Vaughn, 1985) it is typically analyzed how an irrelevant location is inhibited. In task switchin g (Kiesel et al., 2010) lowering competition by a just previously performed task while currently executing a novel task is achieved by inhibiting that previous task.
Ghost Units Yield Biologically Plausible Backprop in Deep Neural Networks
Mesnard, Thomas, Vignoud, Gaetan, Sacramento, Joao, Senn, Walter, Bengio, Yoshua
In the past few years, deep learning has transformed artificial intelligence research and led to impressive performance in various difficult tasks. However, it is still unclear how the brain can perform credit assignment across many areas as efficiently as backpropagation does in deep neural networks. In this paper, we introduce a model that relies on a new role for a neuronal inhibitory machinery, referred to as ghost units. By cancelling the feedback coming from the upper layer when no target signal is provided to the top layer, the ghost units enables the network to backpropagate errors and do efficient credit assignment in deep structures. While considering one-compartment neurons and requiring very few biological assumptions, it is able to approximate the error gradient and achieve good performance on classification tasks. Error backpropagation occurs through the recurrent dynamics of the network and thanks to biologically plausible local learning rules. In particular, it does not require separate feedforward and feedback circuits. Different mechanisms for cancelling the feedback were studied, ranging from complete duplication of the connectivity by long term processes to online replication of the feedback activity. This reduced system combines the essential elements to have a working biologically abstracted analogue of backpropagation with a simple formulation and proofs of the associated results. Therefore, this model is a step towards understanding how learning and memory are implemented in cortical multilayer structures, but it also raises interesting perspectives for neuromorphic hardware.
Binary Sine Cosine Algorithms for Feature Selection from Medical Data
Taghian, Shokooh, Nadimi-Shahraki, Mohammad H.
A well-constructed classification model highly depends on input feature subsets from a dataset, which may contain redundant, irrelevant, or noisy features. This challenge can be worse while dealing with medical datasets. The main aim of feature selection as a pre-processing task is to eliminate these features and select the most effective ones. In the literature, metaheuristic algorithms show a successful performance to find optimal feature subsets. In this paper, two binary metaheuristic algorithms named S-shaped binary Sine Cosine Algorithm (SBSCA) and V-shaped binary Sine Cosine Algorithm (VBSCA) are proposed for feature selection from the medical data. In these algorithms, the search space remains continuous, while a binary position vector is generated by two transfer functions S-shaped and V-shaped for each solution. The proposed algorithms are compared with four latest binary optimization algorithms over five medical datasets from the UCI repository. The experimental results confirm that using both bSCA variants enhance the accuracy of classification on these medical datasets compared to four other algorithms.
Deep Discriminative Fine-Tuning for Cancer Type Classification
Determining the primary site of origin for metastatic tumors is one of the open problems in cancer care because the efficacy of treatment often depends on the cancer tissue of origin. Classification methods that can leverage tumor genomic data and predict the site of origin are therefore of great value. Because tumor DNA point mutation data is very sparse, only limited accuracy (64.5% for 12 tumor classes) was previously demonstrated by methods that rely on point mutations as features (1). Tumor classification accuracy can be greatly improved (to over 90% for 33 classes) by relying on gene expression data (2). However, this additional data is often not readily available in clinical setting, because point mutations are better profiled and targeted by clinical mutational profiling. Here we sought to develop an accurate deep transfer learning and fine-tuning method for tumor sub-type classification, where predicted class is indicative of the primary site of origin. Our method significantly outperforms the state-of-the-art for tumor classification using DNA point mutations, reducing the error by more than 30% at the same time discriminating over many more classes on The Cancer Genome Atlas (TCGA) dataset. Using our method, we achieve state-of-the-art tumor type classification accuracy of 78.3% for 29 tumor classes relying on DNA point mutations in the tumor only.
Information-Theoretic Perspective of Federated Learning
Adilova, Linara, Rosenzweig, Julia, Kamp, Michael
An approach to distributed machine learning is to train models on local datasets and aggregate these models into a single, stronger model. A popular instance of this form of parallelization is federated learning, where the nodes periodically send their local models to a coordinator that aggregates them and redistributes the aggregation back to continue training with it. The most frequently used form of aggregation is averaging the model parameters, e.g., the weights of a neural network. However, due to the non-convexity of the loss surface of neural networks, averaging can lead to detrimental effects and it remains an open question under which conditions averaging is beneficial. In this paper, we study this problem from the perspective of information theory: We measure the mutual information between representation and inputs as well as representation and labels in local models and compare it to the respective information contained in the representation of the averaged model. Our empirical results confirm previous observations about the practical usefulness of averaging for neural networks, even if local dataset distributions vary strongly. Furthermore, we obtain more insights about the impact of the aggregation frequency on the information flow and thus on the success of distributed learning. These insights will be helpful both in improving the current synchronization process and in further understanding the effects of model aggregation.
A Molecular-MNIST Dataset for Machine Learning Study on Diffraction Imaging and Microscopy
Zhang, Yan, Farrell, Steve, Crowley, Michael, Makowski, Lee, Deslippe, Jack
These iterative optimization algorithms are computational expensive and difficult to converge. Unlike iterative optimization methods, supervised machine learning using two stage training-testing becomes a great advantage for fast real-time inference since the most expensive computations are performed during training. Deep Learning plays a very important role tackling these type of problems but requires large dataset to train the multi-layer model parameters of the network [1]. Here, we are interested in creating a molecular image dataset including shape images from real space and diffraction patterns from reciprocal space for machine learning practices. We call this dataset Molecular-MNIST because it consists 10 different size of molecules where each molecule has 2,000 structural variants - in an analogy of the famous 10-digit handwritten dataset MNIST [2]. 2. Molecular-MNIST Dataset 2.1.
$DC^2$: A Divide-and-conquer Algorithm for Large-scale Kernel Learning with Application to Clustering
Wang, Ke Alexander, Bian, Xinran, Liu, Pan, Yan, Donghui
Divide-and-conquer is a general strategy to deal with large scale problems. It is typically applied to generate ensemble instances, which potentially limits the problem size it can handle. Additionally, the data are often divided by random sampling which may be suboptimal. To address these concerns, we propose the $DC^2$ algorithm. Instead of ensemble instances, we produce structure-preserving signature pieces to be assembled and conquered. $DC^2$ achieves the efficiency of sampling-based large scale kernel methods while enabling parallel multicore or clustered computation. The data partition and subsequent compression are unified by recursive random projections. Empirically dividing the data by random projections induces smaller mean squared approximation errors than conventional random sampling. The power of $DC^2$ is demonstrated by our clustering algorithm $rpfCluster^+$, which is as accurate as some fastest approximate spectral clustering algorithms while maintaining a running time close to that of K-means clustering. Analysis on $DC^2$ when applied to spectral clustering shows that the loss in clustering accuracy due to data division and reduction is upper bounded by the data approximation error which would vanish with recursive random projections. Due to its easy implementation and flexibility, we expect $DC^2$ to be applicable to general large scale learning problems.