Undirected Networks
Unsupervised learning explained
Despite the success of supervised machine learning and deep learning, there's a school of thought that says that unsupervised learning has even greater potential. The learning of a supervised learning system is limited by its training; i.e., a supervised learning system can learn only those tasks that it's trained for. By contrast, an unsupervised system could theoretically achieve "artificial general intelligence," meaning the ability to learn any task a human can learn. If the biggest problem with supervised learning is the expense of labeling the training data, the biggest problem with unsupervised learning (where the data is not labeled) is that it often doesn't work very well. Nevertheless, unsupervised learning does have its uses: It can sometimes be good for reducing the dimensionality of a data set, exploring the pattern and structure of the data, finding groups of similar objects, and detecting outliers and other noise in the data.
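As a rough illustration of "finding groups of similar objects" without labels, the following is a minimal k-means sketch in NumPy. The function name `kmeans`, the farthest-point initialisation, and the synthetic two-blob data are all assumptions made for this example; they do not come from the article.

```python
import numpy as np

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic farthest-point initialisation."""
    points = np.asarray(points, dtype=float)
    # Farthest-point init (illustrative choice): start from the first point,
    # then repeatedly add the point farthest from all centres chosen so far.
    centres = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(dists)])
    centres = np.array(centres)
    for _ in range(iters):
        # Assignment step: label each point with its nearest centre.
        d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # Update step: move each centre to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = points[labels == j].mean(axis=0)
    return labels, centres

# Synthetic, unlabeled data: two well-separated blobs (hypothetical example data).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
data = np.vstack([blob_a, blob_b])
labels, centres = kmeans(data, k=2)
```

On data this cleanly separated the two blobs end up in different clusters; on real data the result depends heavily on initialisation and on the choice of k, which is one reason unsupervised methods can be hard to get working well.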
NAT: Neural Architecture Transformer for Accurate and Compact Architectures
Guo, Yong, Zheng, Yin, Tan, Mingkui, Chen, Qi, Chen, Jian, Zhao, Peilin, Huang, Junzhou
Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-searched architecture may still contain many non-significant or redundant modules or operations (e.g., convolution or pooling), which may not only incur substantial memory consumption and computation cost but also deteriorate the performance. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computation cost. Unfortunately, such a constrained optimization problem is NP-hard. To make the problem feasible, we cast the optimization problem into a Markov decision process (MDP) and seek to learn a Neural Architecture Transformer (NAT) to replace the redundant operations with the more computationally efficient ones (e.g., skip connection or directly removing the connection). Based on MDP, we learn NAT by exploiting reinforcement learning to obtain the optimization policies w.r.t. different architectures. To verify the effectiveness of the proposed strategies, we apply NAT on both hand-crafted architectures and NAS based architectures. Extensive experiments on two benchmark datasets, i.e., CIFAR-10 and ImageNet, demonstrate that the transformed architecture by NAT significantly outperforms both its original form and those architectures optimized by existing methods.
Inducing Cooperation via Team Regret Minimization based Multi-Agent Deep Reinforcement Learning
Yu, Runsheng, Shi, Zhenyu, Wang, Xinrun, Wang, Rundong, Liu, Buhong, Hou, Xinwen, Lai, Hanjiang, An, Bo
Existing value-factorized based Multi-Agent deep Reinforcement Learning (MARL) approaches are well-performing in various multi-agent cooperative environments under the centralized training and decentralized execution (CTDE) scheme, where all agents are trained together by the centralized value network and each agent executes its policy independently. However, an issue remains open: in the centralized training process, when the environment for the team is partially observable or non-stationary, i.e., the observation and action information of all the agents cannot represent the global states, existing methods perform poorly and sample inefficiently. Regret Minimization (RM) can be a promising approach as it performs well in partially observable and fully competitive settings. However, it tends to model others as opponents and thus cannot work well under the CTDE scheme. In this work, we propose a novel team RM based Bayesian MARL with three key contributions: (a) we design a novel RM method to train cooperative agents as a team and obtain a team regret-based policy for that team; (b) we introduce a novel method to decompose the team regret to generate the policy for each agent for decentralized execution; (c) to further improve the performance, we leverage a differential particle filter (a Sequential Monte Carlo method) network to get an accurate estimation of the state for each agent. Experimental results on two-step matrix games (cooperative game) and battle games (large-scale mixed cooperative-competitive games) demonstrate that our algorithm significantly outperforms state-of-the-art methods.
Stochastic Gradient Annealed Importance Sampling for Efficient Online Marginal Likelihood Estimation
Cameron, Scott A., Eggers, Hans C., Kroon, Steve
We consider estimating the marginal likelihood in settings with independent and identically distributed (i.i.d.) data. We propose estimating the predictive distributions in a sequential factorization of the marginal likelihood in such settings by using stochastic gradient Markov Chain Monte Carlo techniques. This approach is far more efficient than traditional marginal likelihood estimation techniques such as nested sampling and annealed importance sampling due to its use of mini-batches to approximate the likelihood. Stability of the estimates is provided by an adaptive annealing schedule. The resulting stochastic gradient annealed importance sampling (SGAIS) technique, which is the key contribution of our paper, enables us to estimate the marginal likelihood of a number of models considerably faster than traditional approaches, with no noticeable loss of accuracy. An important benefit of our approach is that the marginal likelihood is calculated in an online fashion as data becomes available, allowing the estimates to be used for applications such as online weighted model combination.
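The "sequential factorization of the marginal likelihood" mentioned above can be written out as follows. This is a sketch using assumed notation (observations y_1, ..., y_N, parameters \theta, S posterior samples \theta_s); the symbols are not taken from the paper, and the paper's actual estimator, which combines annealing with stochastic gradient MCMC, is more elaborate than the plain Monte Carlo average shown in the last line.

```latex
% Chain rule: the marginal likelihood is a product of one-step predictive distributions.
p(y_{1:N}) = \prod_{n=1}^{N} p(y_n \mid y_{1:n-1})

% For i.i.d. data given \theta, each predictive is an expectation under the current posterior.
p(y_n \mid y_{1:n-1}) = \int p(y_n \mid \theta)\, p(\theta \mid y_{1:n-1})\, d\theta

% Simple Monte Carlo approximation with samples \theta_s \sim p(\theta \mid y_{1:n-1}).
p(y_n \mid y_{1:n-1}) \approx \frac{1}{S} \sum_{s=1}^{S} p(y_n \mid \theta_s)
```

Because each factor depends only on the data seen so far, the product can be updated as new observations arrive, which is what makes an online estimate possible.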
Hebbian Synaptic Modifications in Spiking Neurons that Learn
Bartlett, Peter L., Baxter, Jonathan
In this paper, we derive a new model of synaptic plasticity, based on recent algorithms for reinforcement learning (in which an agent attempts to learn appropriate actions to maximize its long-term average reward). We show that these direct reinforcement learning algorithms also give locally optimal performance for the problem of reinforcement learning with multiple agents, without any explicit communication between agents. By considering a network of spiking neurons as a collection of agents attempting to maximize the long-term average of a reward signal, we derive a synaptic update rule that is qualitatively similar to Hebb's postulate. This rule requires only simple computations, such as addition and leaky integration, and involves only quantities that are available in the vicinity of the synapse. Furthermore, it leads to synaptic connection strengths that give locally optimal values of the long term average reward. The reinforcement learning paradigm is sufficiently broad to encompass many learning problems that are solved by the brain. We illustrate, with simulations, that the approach is effective for simple pattern classification and motor learning tasks. It is widely accepted that the functions performed by neural circuits are modified by adjustments to the strength of the synaptic connections between neurons. In the 1940s, Donald Hebb speculated that such adjustments are associated with simultaneous (or nearly simultaneous) firing of the presynaptic and postsynaptic neurons [14]: When an axon of cell A ... persistently takes part in firing [cell B], some growth process or metabolic change takes place [to increase] A's efficacy as one of the cells firing B.
Scale- and Context-Aware Convolutional Non-intrusive Load Monitoring
Chen, Kunjin, Zhang, Yu, Wang, Qin, Hu, Jun, Fan, Hang, He, Jinliang
Non-intrusive load monitoring addresses the challenging task of decomposing the aggregate signal of a household's electricity consumption into appliance-level data without installing dedicated meters. By detecting load malfunction and recommending energy reduction programs, cost-effective non-intrusive load monitoring provides intelligent demand-side management for utilities and end users. In this paper, we boost the accuracy of energy disaggregation with a novel neural network structure named scale- and context-aware network, which exploits multi-scale features and contextual information. Specifically, we develop a multi-branch architecture with multiple receptive field sizes and branch-wise gates that connect the branches in the sub-networks. We build a self-attention module to facilitate the integration of global context, and we incorporate an adversarial loss and on-state augmentation to further improve the model's performance. Extensive simulation results tested on open datasets corroborate the merits of the proposed approach, which significantly outperforms state-of-the-art methods. Non-intrusive load monitoring (NILM) is the task of estimating the power demand of a specific appliance from the aggregate consumption of a household measured by a single meter [1]. As the task requires breaking down the total energy consumed by multiple appliances into appliance-level energy consumption records, NILM is synonymous with the phrase "energy disaggregation" [2]. A direct benefit of NILM is that energy end-users can acquire appliance-level consumption feedback and optimize their energy consumption behaviours accordingly. It is estimated that up to 12% residential energy saving can be achieved by providing appliance-level feedback [3].
Learning Behavioral Representations from Wearable Sensors
Tavabi, Nazgol, Hosseinmardi, Homa, Villatte, Jennifer L., Abeliuk, Andrés, Narayanan, Shrikanth, Ferrara, Emilio, Lerman, Kristina
The ubiquity of mobile devices and wearable sensors offers unprecedented opportunities for continuous collection of multimodal physiological data. Such data enables temporal characterization of an individual's behaviors, which can provide unique insights into her physical and psychological health. Understanding the relation between different behaviors/activities and personality traits such as stress or work performance can help build strategies to improve the work environment. Especially in workplaces like hospitals where many employees are overworked, having such policies improves the quality of patient care by prioritizing the mental and physical health of their caregivers. One challenge in analyzing physiological data is extracting the underlying behavioral states from the temporal sensor signals and interpreting them. Here, we use a non-parametric Bayesian approach to model multivariate sensor data from multiple people and discover dynamic behaviors they share. We apply this method to data collected from sensors worn by a population of workers in a large urban hospital, capturing their physiological signals, such as breathing and heart rate, and activity patterns. We show that the learned states capture behavioral differences within the population that can help cluster participants into meaningful groups and better predict their cognitive and affective states. This method offers a practical way to learn compact behavioral representations from dynamic multivariate sensor signals and provides insights into the data.
Working Memory Graphs
Loynd, Ricky, Fernandez, Roland, Celikyilmaz, Asli, Swaminathan, Adith, Hausknecht, Matthew
Transformers have increasingly outperformed gated RNNs in obtaining new state-of-the-art results on supervised tasks involving text sequences. Inspired by this trend, we study the question of how Transformer-based models can improve the performance of sequential decision-making agents. We present the Working Memory Graph (WMG), an agent that employs multi-head self-attention to reason over a dynamic set of vectors representing observed and recurrent state. We evaluate WMG in two partially observable environments, one that requires complex reasoning over past observations, and another that features factored observations. We find that WMG significantly outperforms gated RNNs on these tasks, supporting the hypothesis that WMG's inductive bias in favor of learning and leveraging factored representations can dramatically boost sample efficiency in environments featuring such structure. In the RNN-based approach of Sutskever et al. (2014), an encoder RNN maps an input sentence to a series of internal hidden state vectors. The encoder's final hidden state is copied into a decoder RNN, which then generates another sequence of hidden states that determine the selection of output tokens in the target language. This model can be trained to translate sentences, but translation quality deteriorates on long sentences where long-term dependencies become critical.
Non-Intrusive Load Monitoring with an Attention-based Deep Neural Network
Sudoso, Antonio Maria, Piccialli, Veronica
Energy disaggregation, also referred to as Non-Intrusive Load Monitoring (NILM), is the task of using an aggregate energy signal, for example coming from a whole-home power monitor, to make inferences about the different individual loads of the system. In this paper, we present a novel approach based on the encoder-decoder deep learning framework with an attention mechanism for solving NILM. The attention mechanism is inspired by the temporal attention mechanism that has been recently applied to get state-of-the-art results in neural machine translation, text summarization and speech recognition. The experiments have been conducted on two publicly available datasets, AMPds and UK-DALE, in seen and unseen conditions. The results show that our proposed deep neural network outperforms the state-of-the-art Denoising Auto-Encoder (DAE) proposed initially by Kelly and Knottenbelt (2015) and its extended and improved architecture by Bonfigli et al. (2018), in all the addressed experimental conditions. We also show that modeling attention translates into the ability to correctly detect the state change of each appliance, which is of extreme interest in the field of energy disaggregation. Non-Intrusive Load Monitoring (NILM) is the task of estimating the power demand of each appliance given the aggregate power demand signal recorded by a single electric meter monitoring multiple appliances [1]. In recent years, the research on NILM has been particularly active in the field of machine learning.
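To make the idea of temporal attention over encoder outputs concrete, here is a minimal NumPy sketch. It assumes dot-product scoring between a single query vector and each encoder state; the function name `attention_context`, the shapes, and the random inputs are hypothetical, and the scoring function used in the paper may differ (for example, an additive score with learned weights).

```python
import numpy as np

def attention_context(encoder_states, query):
    """Weight each time step of encoder_states by its similarity to query.

    encoder_states: array of shape (T, d), one hidden vector per time step.
    query:          array of shape (d,), e.g. a decoder state (assumed here).
    """
    # Dot-product score for every time step (an assumed scoring function).
    scores = encoder_states @ query
    # Softmax over time; subtracting the max keeps the exponentials stable.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # Context vector: attention-weighted sum of the encoder states.
    context = weights @ encoder_states
    return weights, context

# Hypothetical inputs: 6 time steps of a 4-dimensional encoder output.
rng = np.random.default_rng(0)
states = rng.normal(size=(6, 4))
query = rng.normal(size=4)
weights, context = attention_context(states, query)
```

The weights form a probability distribution over time steps, so inspecting them shows which parts of the aggregate signal the model attends to when producing an output.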
Inferring the Optimal Policy using Markov Chain Monte Carlo
Trabucco, Brandon, Qu, Albert, Li, Simon, Ashokavardhanan, Ganeshkumar
This paper investigates methods for estimating the optimal stochastic control policy for a Markov Decision Process with unknown transition dynamics and an unknown reward function. This form of model-free reinforcement learning comprises many real world systems such as playing video games, simulated control tasks, and real robot locomotion. Existing methods for estimating the optimal stochastic control policy rely on high variance estimates of the policy descent. However, these methods are not guaranteed to find the optimal stochastic policy, and the high variance gradient estimates make convergence unstable. In order to resolve these problems, we propose a technique using Markov Chain Monte Carlo to generate samples from the posterior distribution of the parameters conditioned on being optimal. Our method provably converges to the globally optimal stochastic policy, and empirically exhibits similar variance compared to the policy gradient.