AITopics | Markov Models

Collaborating Authors

Markov Models

News Overviews Instructional Materials AI-Alerts Classics

Energy Efficiency in Reinforcement Learning for Wireless Sensor Networks

Kozlowski, Michal, McConville, Ryan, Santos-Rodriguez, Raul, Piechocki, Robert

arXiv.org Machine LearningNov-19-2018

As sensor networks for health monitoring become more prevalent, so will the need to control their usage and consumption of energy. This paper presents a method which leverages the algorithm's performance and energy consumption. By utilising Reinforcement Learning (RL) techniques, we provide an adaptive framework, which continuously performs weak training in an energy-aware system. We motivate this using a realistic example of residential localisation based on Received Signal Strength (RSS). The method is cheap in terms of work-hours, calibration and energy usage. It achieves this by utilising other sensors available in the environment. These other sensors provide weak labels, which are then used to employ the State-Action-Reward-State-Action (SARSA) algorithm and train the model over time. Our approach is evaluated on a simulated localisation environment and validated on a widely available pervasive health dataset which facilitates realistic residential localisation using RSS. We show that our method is cheaper to implement and requires less effort, whilst at the same time providing a performance enhancement and energy savings over time.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1812.02538

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.88)

Technology:

Information Technology > Communications > Networks > Sensor Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Add feedback

Efficient keyword spotting using dilated convolutions and gating

Coucke, Alice, Chlieh, Mohammed, Gisselbrecht, Thibault, Leroy, David, Poumeyrol, Mathieu, Lavril, Thibaut

arXiv.org Machine LearningNov-19-2018

We explore the application of end-to-end stateless temporal modeling to small-footprint keyword spotting as opposed to recurrent networks that model long-term temporal dependencies using internal states. We propose a model inspired by the recent success of dilated convolutions in sequence modeling applications, allowing to train deeper architectures in resource-constrained configurations. Gated activations and residual connections are also added, following a similar configuration to WaveNet. In addition, we apply a custom target labeling that back-propagates loss from specific frames of interest, therefore yielding higher accuracy and only requiring to detect the end of the keyword. Our experimental results show that our model outperforms a max-pooling loss trained recurrent neural network using LSTM cells, with a significant decrease in false rejection rate. The underlying dataset - "Hey Snips" utterances recorded by over 2.2K different speakers - has been made publicly available to establish an open reference for wake-word detection.

artificial intelligence, keyword, machine learning, (20 more...)

arXiv.org Machine Learning

1811.07684

Country: Europe > France (0.14)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.30)

Add feedback

Scalable agent alignment via reward modeling: a research direction

Leike, Jan, Krueger, David, Everitt, Tom, Martic, Miljan, Maini, Vishal, Legg, Shane

arXiv.org Artificial IntelligenceNov-19-2018

One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions. Designing such reward functions is difficult in part because the user only has an implicit understanding of the task objective. This gives rise to the agent alignment problem: how do we create agents that behave in accordance with the user's intentions? We outline a high-level research direction to solve the agent alignment problem centered around reward modeling: learning a reward function from interaction with the user and optimizing the learned reward function with reinforcement learning. We discuss the key challenges we expect to face when scaling reward modeling to complex and general domains, concrete approaches to mitigate these challenges, and ways to establish trust in the resulting agents.

artificial intelligence, machine learning, reinforcement learning, (10 more...)

arXiv.org Artificial Intelligence

1811.07871

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report > Promising Solution (0.45)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Add feedback

Robust cross-domain disfluency detection with pattern match networks

Zayats, Vicky, Ostendorf, Mari

arXiv.org Artificial IntelligenceNov-17-2018

In this paper we introduce a novel pattern match neural network architecture that uses neighbor similarity scores as features, eliminating the need for feature engineering in a disfluency detection task. We evaluate the approach in disfluency detection for four different speech genres, showing that the approach is as effective as hand-engineered pattern match features when used on in-domain data and achieves superior performance in cross-domain scenarios.

artificial intelligence, disfluency detection, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1811.07236

Genre: Research Report (0.50)

Industry: Law (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Subtask Gated Networks for Non-Intrusive Load Monitoring

Shin, Changho, Joo, Sunghwan, Yim, Jaeryun, Lee, Hyoseop, Moon, Taesup, Rhee, Wonjong

arXiv.org Machine LearningNov-16-2018

Non-intrusive load monitoring (NILM), also known as energy disaggregation, is a blind source separation problem where a household's aggregate electricity consumption is broken down into electricity usages of individual appliances. In this way, the cost and trouble of installing many measurement devices over numerous household appliances can be avoided, and only one device needs to be installed. The problem has been well-known since Hart's seminal paper in 1992, and recently significant performance improvements have been achieved by adopting deep networks. In this work, we focus on the idea that appliances have on/off states, and develop a deep network for further performance improvements. Specifically, we propose a subtask gated network that combines the main regression network with an on/off classification subtask network. Unlike typical multitask learning algorithms where multiple tasks simply share the network parameters to take advantage of the relevance among tasks, the subtask gated network multiply the main network's regression output with the subtask's classification probability. When standby-power is additionally learned, the proposed solution surpasses the state-of-the-art performance for most of the benchmark cases. The subtask gated network can be very effective for any problem that inherently has on/off states.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

1811.06692

Country: Asia > South Korea (0.14)

Genre: Research Report (1.00)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Add feedback

Deep Knockoffs

Romano, Yaniv, Sesia, Matteo, Candès, Emmanuel J.

arXiv.org Machine LearningNov-16-2018

This paper introduces a machine for sampling approximate model-X knockoffs for arbitrary and unspecified data distributions using deep generative models. The main idea is to iteratively refine a knockoff sampling mechanism until a criterion measuring the validity of the produced knockoffs is optimized; this criterion is inspired by the popular maximum mean discrepancy in machine learning and can be thought of as measuring the distance to pairwise exchangeability between original and knockoff features. By building upon the existing model-X framework, we thus obtain a flexible and model-free statistical tool to perform controlled variable selection. Extensive numerical experiments and quantitative tests confirm the generality, effectiveness, and power of our deep knockoff machines. Finally, we apply this new method to a real study of mutations linked to changes in drug resistance in the human immunodeficiency virus.

artificial intelligence, knockoff, machine learning, (14 more...)

arXiv.org Machine Learning

1811.06687

Country: North America > United States > California (0.93)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

Liu, Bo, Gemp, Ian, Ghavamzadeh, Mohammad, Liu, Ji, Mahadevan, Sridhar, Petrik, Marek

Journal of Artificial Intelligence ResearchNov-15-2018

In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function. We also conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and do not provide any finite-sample analysis. We also propose an accelerated algorithm, called GTD2-MP, that uses proximal "mirror maps" to yield an improved convergence rate. The results of our theoretical analysis imply that the GTD family of algorithms are comparable and may indeed be preferred over existing least squares TD methods for off-policy learning, due to their linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.

algorithm, finite-sample analysis, objective function, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11251

AI Access Foundation

11260

Journal of Artificial Intelligence Research

Country:

North America > Canada > Alberta (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(6 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Energy > Energy Storage (1.00)
Electrical Industrial Apparatus (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Neural Predictive Belief Representations

Guo, Zhaohan Daniel, Azar, Mohammad Gheshlaghi, Piot, Bilal, Pires, Bernardo A., Pohlen, Toby, Munos, Rémi

arXiv.org Machine LearningNov-15-2018

Unsupervised representation learning has succeeded with excellent results in many applications. It is an especially powerful tool to learn a good representation of environments with partial or noisy observations. In partially observable domains it is important for the representation to encode a belief state, a sufficient statistic of the observations seen so far. In this paper, we investigate whether it is possible to learn such a belief representation using modern neural architectures. Specifically, we focus on one-step frame prediction and two variants of contrastive predictive coding (CPC) as the objective functions to learn the representations. To evaluate these learned representations, we test how well they can predict various pieces of information about the underlying state of the environment, e.g., position of the agent in a 3D maze. We show that all three methods are able to learn belief representations of the environment, they encode not only the state information, but also its uncertainty, a crucial aspect of belief states. We also find that for CPC multi-step predictions and action-conditioning are critical for accurate belief representations in visually complex environments. The ability of neural representations to capture the belief information has the potential to spur new advances for learning and planning in partially observable domains, where leveraging uncertainty is essential for optimal decision making.

artificial intelligence, machine learning, representation, (14 more...)

arXiv.org Machine Learning

1811.06407

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

Buesing, Lars, Weber, Theophane, Zwols, Yori, Racaniere, Sebastien, Guez, Arthur, Lespiau, Jean-Baptiste, Heess, Nicolas

arXiv.org Machine LearningNov-15-2018

Learning policies on data synthesized by models can in principle quench the thirst of reinforcement learning algorithms for large amounts of real experience, which is often costly to acquire. However, simulating plausible experience de novo is a hard problem for many complex environments, often resulting in biases for model-based policy evaluation and search. Instead of de novo synthesis of data, here we assume logged, real experience and model alternative outcomes of this experience under counterfactual actions, i.e. actions that were not actually taken. Based on this, we propose the Counterfactually-Guided Policy Search (CF-GPS) algorithm for learning policies in POMDPs from off-policy experience. CF-GPS can improve on vanilla model-based RL algorithms by making use of available logged data to de-bias model predictions. In contrast to off-policy algorithms based on Importance Sampling which re-weight data, CF-GPS leverages a model to explicitly consider alternative outcomes, allowing the algorithm to make better use of experience data. We find empirically that these advantages translate into improved policy evaluation and search results on a nontrivial grid-world task. Finally, we show that CF-GPS generalizes the previously proposed Guided Policy Search and that reparameterization-based algorithms such Stochastic V alue Gradient can be interpreted as counterfactual methods. This example tries to illustrate the everyday human capacity to reason about alternate, counterfactual outcomes of past experience with the goal of "mining worlds that could have been" (Pearl & Mackenzie, 2018). Social psychologists theorize that such cognitive processes are beneficial for improving future decision making (Roese, 1997). In this paper we aim to leverage possible advantages of counterfactual reasoning for learning decision making in the reinforcement learning (RL) framework. In spite of recent success, learning policies with standard, model-free RL algorithms can be notoriously data inefficient. This issue can in principle be addressed by learning policies on data synthesized from a model.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1811.06272

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Add feedback

Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models

Tsuzuki, Shunsuke, Nishiyama, Yu

arXiv.org Machine LearningNov-15-2018

In machine learning, a nonparametric forecasting algorithm for time series data has been proposed, called the kernel spectral hidden Markov model (KSHMM). In this paper, we propose a technique for short-term wind-speed prediction based on KSHMM. We numerically compared the performance of our KSHMMbased forecasting technique to other techniques with machine learning, using wind-speed data offered by the National Renewable Energy Laboratory. Our results demonstrate that, compared to these methods, the proposed technique offers comparable or better performance. Keywords: Wind-Speed Prediction, Kernel Methods, Kernel Mean Embedding, Spectral Learning, Hidden Markov Models. 1. Introduction Wind energy is one of the most attractive renewable energy sources.

artificial intelligence, forecasting, machine learning, (14 more...)

arXiv.org Machine Learning

1811.0621

Country:

North America (0.28)
Asia > Japan (0.28)

Genre: Research Report > New Finding (0.86)

Industry: Energy > Renewable > Wind (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback