 Markov Models


Target Network and Truncation Overcome The Deadly Triad in $Q$-Learning

arXiv.org Machine Learning

The Deep Q-Network (Mnih et al., 2015), a typical example of Q-learning with function approximation, is one of the most successful algorithms for solving the reinforcement learning (RL) problem, and hence is viewed as a milestone in the development of modern RL. On the other hand, the behavior of Q-learning with function approximation is not well understood theoretically, and was identified in Sutton (1999) as one of the four most important theoretical open problems. In fact, the infamous deadly triad (Sutton, 2015) is present in Q-learning with function approximation, and hence even in the basic setting where linear function approximation is used, the algorithm was shown to be unstable in general (Baird, 1995). While theoretically unclear, it was empirically evident from Mnih et al. (2015) that three ingredients together (experience replay, target network, and truncation) overcome the divergence of Q-learning with function approximation. In this work, we focus on Q-learning with linear function approximation for infinite horizon discounted Markov decision processes (MDPs), and show theoretically that target network together with truncation is sufficient to provably stabilize Q-learning. The main contributions of this work are summarized in the following.
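To make the two ingredients concrete, here is a minimal sketch of Q-learning with linear function approximation that uses a periodically synchronized target network and truncates the target's value estimates. It is not the paper's algorithm: the random MDP, the feature map `phi`, the uniform sampling of state-action pairs, the step size, the loop lengths, and the truncation radius `1 / (1 - gamma)` (which assumes rewards in [0, 1]) are all assumptions chosen for illustration.

```python
import random


def clip(x, bound):
    """Truncate x to the interval [-bound, bound]."""
    return max(-bound, min(bound, x))


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def q_learning_target_truncation(n_states=4, n_actions=2, dim=3, gamma=0.9,
                                 outer_iters=20, inner_iters=200,
                                 step=0.05, seed=0):
    rng = random.Random(seed)
    pairs = [(s, a) for s in range(n_states) for a in range(n_actions)]
    # Hypothetical toy MDP: random features, rewards in [0, 1],
    # and a deterministic next-state table.
    phi = {p: [rng.uniform(-1, 1) for _ in range(dim)] for p in pairs}
    reward = {p: rng.random() for p in pairs}
    nxt = {p: rng.randrange(n_states) for p in pairs}

    # With rewards in [0, 1], the optimal Q-values lie in [0, 1 / (1 - gamma)],
    # so truncating estimates to this radius discards nothing that matters.
    bound = 1.0 / (1.0 - gamma)

    theta_target = [0.0] * dim              # target network parameters
    for _ in range(outer_iters):
        theta = list(theta_target)          # online parameters
        for _ in range(inner_iters):
            s, a = rng.choice(pairs)        # assumed: uniform sampling
            s2 = nxt[(s, a)]
            # The target uses the frozen parameters and truncated estimates.
            best_next = max(clip(dot(phi[(s2, b)], theta_target), bound)
                            for b in range(n_actions))
            td_error = reward[(s, a)] + gamma * best_next - dot(phi[(s, a)], theta)
            theta = [t + step * td_error * f
                     for t, f in zip(theta, phi[(s, a)])]
        theta_target = theta                # synchronize the target network
    return theta_target, bound
```

Because the target parameters are frozen during each inner loop and their estimates are bounded, every inner loop is a stochastic least-squares fit to bounded regression targets. That is the intuition behind the stabilizing effect described above, though this toy run is not evidence for the paper's theoretical result.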


Sequential Decision Making - an overview

#artificialintelligence

Central to many formulations of sequence recognition are problems in sequential decision-making. Typically, a sequence of events is observed through a transformation that introduces uncertainty into the observations, and based on these observations, the recognition process produces a hypothesis of the underlying events. The events in the underlying process are constrained to follow a certain loose order, for example by a grammar, so that decisions made early in the recognition process restrict or narrow the choices that can be made later. This problem is well known and leads to the use of dynamic programming (DP) algorithms [Bel57] so that unalterable decisions can be avoided until all available information has been processed. DP strategies are central to hidden Markov model (HMM) recognizers [LMS84,Lev85,Rab89,RBH86] and have also been widely used in systems based on neural networks (e.g., [SIY89,Bur88,BW89,SL92,BM90,FLW90]) to transform static pattern classifiers into sequence recognizers.
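The DP idea in this overview can be illustrated with Viterbi decoding for an HMM, which keeps the best-scoring path into every state at every time step and commits to a state sequence only after the last observation. The sketch below is a generic textbook formulation, not code from the cited works. The two-state model (`STATES`, `START_P`, `TRANS_P`, `EMIT_P`) is a made-up example, and all of its probabilities are assumed to be nonzero so that logarithms are defined.

```python
import math

# Hypothetical two-state HMM used only for illustration.
STATES = ["A", "B"]
START_P = {"A": 0.5, "B": 0.5}
TRANS_P = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.3, "B": 0.7}}
EMIT_P = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states,
                       key=lambda p: best[t - 1][p] + math.log(trans_p[p][s]))
            best[t][s] = (best[t - 1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][obs[t]]))
            back[t][s] = prev
    # Only now, with all observations processed, is a single path chosen.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


decoded = viterbi(list("xxyy"), STATES, START_P, TRANS_P, EMIT_P)
```

Working in log space avoids numerical underflow on long sequences, and the backpointer table is what lets the decision about early states be deferred until the end.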


Creative Problem Solving in Artificially Intelligent Agents: A Survey and Framework

arXiv.org Artificial Intelligence

Creative Problem Solving (CPS) is a sub-area within Artificial Intelligence (AI) that focuses on methods for solving off-nominal, or anomalous problems in autonomous systems. Despite many advancements in planning and learning, resolving novel problems or adapting existing knowledge to a new context, especially in cases where the environment may change in unpredictable ways post deployment, remains a limiting factor in the safe and useful integration of intelligent systems. The emergence of increasingly autonomous systems dictates the necessity for AI agents to deal with environmental uncertainty through creativity. To stimulate further research in CPS, we present a definition and a framework of CPS, which we adopt to categorize existing AI methods in this field. Our framework consists of four main components of a CPS problem, namely, 1) problem formulation, 2) knowledge representation, 3) method of knowledge manipulation, and 4) method of evaluation. We conclude our survey with open research questions, and suggested directions for the future.


Deep Learning: Recurrent Neural Networks in Python

#artificialintelligence

The Recurrent Neural Network (RNN) has been used to obtain state-of-the-art results in sequence modeling. This includes time series analysis, forecasting and natural language processing (NLP). Learn why RNNs beat older machine learning algorithms such as hidden Markov models. All of the materials required for this course can be downloaded and installed for free. We will do most of our work in NumPy, Matplotlib, and TensorFlow.


A System for Interactive Examination of Learned Security Policies

arXiv.org Artificial Intelligence

We present a system for interactive examination of learned security policies. It allows a user to traverse episodes of Markov decision processes in a controlled manner and to track the actions triggered by security policies. Similar to a software debugger, a user can continue or halt an episode at any time step and inspect parameters and probability distributions of interest. The system enables insight into the structure of a given policy and into the behavior of a policy in edge cases. We demonstrate the system with a network intrusion use case. We examine the evolution of an IT infrastructure's state and the actions prescribed by security policies while an attack occurs. The policies for the demonstration have been obtained through a reinforcement learning approach that includes a simulation system where policies are incrementally learned and an emulation system that produces statistics that drive the simulation runs.
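The debugger analogy can be sketched as a small stepping harness around an MDP episode. This is not the authors' system: `EpisodeStepper`, `toy_transition`, and `toy_policy` are hypothetical names, and the toy intrusion model (a state counting compromised hosts, with "wait" and "patch" as the defender's actions) is an assumption made purely for illustration.

```python
import random


class EpisodeStepper:
    """Advance an MDP episode one step at a time and record what happened."""

    def __init__(self, transition, policy, start_state, seed=0):
        self.transition = transition  # (state, action, rng) -> next state
        self.policy = policy          # state -> {action: probability}
        self.state = start_state
        self.rng = random.Random(seed)
        self.t = 0
        self.history = []             # (t, state, action, action distribution)

    def step(self):
        """Sample one action from the policy and apply one transition."""
        dist = self.policy(self.state)
        actions = list(dist)
        action = self.rng.choices(actions,
                                  weights=[dist[a] for a in actions])[0]
        self.history.append((self.t, self.state, action, dist))
        self.state = self.transition(self.state, action, self.rng)
        self.t += 1
        return self.history[-1]

    def run_until(self, breakpoint, max_steps=100):
        """Continue until breakpoint(state) is true or max_steps is reached."""
        while self.t < max_steps and not breakpoint(self.state):
            self.step()
        return self.state


def toy_transition(state, action, rng):
    # State: number of compromised hosts, kept between 0 and 3.
    if action == "patch":
        return max(0, state - 1)
    return min(3, state + 1) if rng.random() < 0.5 else state


def toy_policy(state):
    # The defender patches more often as more hosts are compromised.
    p_patch = 0.2 + 0.2 * state
    return {"wait": 1.0 - p_patch, "patch": p_patch}
```

A session might create `EpisodeStepper(toy_transition, toy_policy, start_state=0)`, call `run_until(lambda s: s >= 2)` to halt once two hosts are compromised, and then read `history` to inspect the action distribution the policy produced at each step.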


Director, Artificial Intelligence (AI) & Machine Learning (ML)

#artificialintelligence

This Director of AI & ML will be responsible for developing new models and systems to support Key Capture Energy's (KCE) battery storage facilities, and will work closely with our software development and market operations analytics team to deploy models to production systems and use large-scale datasets for model development and optimization. Prior roles should include significant hands-on experience with typical AI/ML tasks such as feature engineering, feature selection, and hyperparameter tuning.


Program Analysis of Probabilistic Programs

arXiv.org Machine Learning

Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. No single inference algorithm can be used as a probabilistic programming back-end that is simultaneously reliable, efficient, black-box, and general. Probabilistic programming languages often choose a single algorithm to apply to a given problem, thus inheriting its limitations. While substantial work has been done both to formalise probabilistic programming and to improve efficiency of inference, there has been little work that makes use of the available program structure, by formally analysing it, to better utilise the underlying inference algorithm. This dissertation presents three novel techniques (both static and dynamic), which aim to improve probabilistic programming using program analysis. The techniques analyse a probabilistic program and adapt it to make inference more efficient, sometimes in a way that would have been tedious or impossible to do by hand.


Text Generation with Markov Decision Processes

#artificialintelligence

Originally published on Towards AI.


A quantum generative model for multi-dimensional time series using Hamiltonian learning

arXiv.org Machine Learning

Synthetic data generation has proven to be a promising solution for addressing data availability issues in various domains. Even more challenging is the generation of synthetic time series data, where one has to preserve temporal dynamics, i.e., the generated time series must respect the original relationships between variables across time. Recently proposed techniques such as generative adversarial networks (GANs) and quantum-GANs lack the ability to attend to the time series specific temporal correlations adequately. We propose using the inherent nature of quantum computers to simulate quantum dynamics as a technique to encode such features. We start by assuming that a given time series can be generated by a quantum process, after which we proceed to learn that quantum process using quantum machine learning. We then use the learned model to generate out-of-sample time series and show that it captures unique and complex features of the learned time series. We also study the class of time series that can be modeled using this technique. Finally, we experimentally demonstrate the proposed algorithm on an 11-qubit trapped-ion quantum machine.


A Comprehensive Review of Sign Language Recognition: Different Types, Modalities, and Datasets

arXiv.org Artificial Intelligence

A machine that can understand human activities and the meaning of signs can help overcome the communication barriers between deaf or hard-of-hearing people and hearing people. Sign Language Recognition (SLR) is a fascinating research area and a crucial task concerning computer vision and pattern recognition. Recently, SLR usage has increased in many applications, but the environment, background, image resolution, modalities, and datasets strongly affect performance. Many researchers have been striving to develop generic real-time SLR models. This review paper provides a comprehensive overview of SLR and discusses the needs, challenges, and problems associated with SLR. We study related work on manual and non-manual signs, various modalities, and datasets. Research progress and existing state-of-the-art SLR models over the past decade have been reviewed. Finally, we identify the research gaps and limitations in this domain and suggest future directions. This review paper will help readers and researchers get complete guidance about SLR and the progressive design of state-of-the-art SLR models.