AITopics

1902.05795

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Dadashi, Robert, Taïga, Adrien Ali, Roux, Nicolas Le, Schuurmans, Dale, Bellemare, Marc G.

The Value Function Polytope in Reinforcement Learning

arXiv.org Machine LearningFeb-15-2019

We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes. Our main contribution is the characterization of the nature of its shape: a general polytope (Aigner et al., 2010). To demonstrate this result, we exhibit several properties of the structural relationship between policies and value functions including the line theorem, which shows that the value functions of policies constrained on all but one state describe a line segment. Finally, we use this novel perspective to introduce visualizations to enhance the understanding of the dynamics of reinforcement learning algorithms.

aaab83icbvbns8naej3ur1q qh69lbahxkoioh4lxjxwsb, latexit latexitsha1, value function, (11 more...)

1901.11524

Country:

North America > Canada > Alberta (0.14)
North America > United States > District of Columbia > Washington (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Santos, Thiago Freitas dos, Santos, Paulo E., Ferreira, Leonardo A., Bianchi, Reinaldo A. C., Cabalar, Pedro

Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles

arXiv.org Artificial IntelligenceFeb-15-2019

Spatial puzzles composed of rigid objects, flexible strings and holes offer interesting domains for reasoning about spatial entities that are common in the human daily-life's activities. The goal of this work is to investigate the automated solution of this kind of puzzles adapting an algorithm that combines Answer Set Programming (ASP) with Markov Decision Process (MDP), algorithm oASP(MDP), to use heuristics accelerating the learning process. ASP is applied to represent the domain as an MDP, while a Reinforcement Learning algorithm (Q-Learning) is used to find the optimal policies. In this work, the heuristics were obtained from the solution of relaxed versions of the puzzles. Experiments were performed on deterministic, non-deterministic and non-stationary versions of the puzzles. Results show that the proposed approach can accelerate the learning process, presenting an advantage when compared to the non-heuristic versions of oASP(MDP) and Q-Learning.

machine learning, puzzle, reinforcement learning, (15 more...)

1903.03411

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)

Shen, Macheng, How, Jonathan P

Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning

arXiv.org Artificial IntelligenceFeb-14-2019

We pose an active perception problem where an autonomous agent actively interacts with a second agent with potentially adversarial behaviors. Given the uncertainty in the intent of the other agent, the objective is to collect further evidence to help discriminate potential threats. The main technical challenges are the partial observability of the agent intent, the adversary modeling, and the corresponding uncertainty modeling. Note that an adversary agent may act to mislead the autonomous agent by using a deceptive strategy that is learned from past experiences. We propose an approach that combines belief space planning, generative adversary modeling, and maximum entropy reinforcement learning to obtain a stochastic belief space policy. By accounting for various adversarial behaviors in the simulation framework and minimizing the predictability of the autonomous agent's action, the resulting policy is more robust to unmodeled adversarial strategies. This improved robustness is empirically shown against an adversary that adapts to and exploits the autonomous agent's policy when compared with a standard Chance-Constraint Partially Observable Markov Decision Process robust approach.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1902.05644

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Hausknecht, Matthew, Loynd, Ricky, Yang, Greg, Swaminathan, Adith, Williams, Jason D.

NAIL: A General Interactive Fiction Agent

arXiv.org Artificial IntelligenceFeb-14-2019

Interactive Fiction (IF) games are complex textual decision making problems. This paper introduces NAIL, an autonomous agent for general parser-based IF games. NAIL won the 2018 Text Adventure AI Competition, where it was evaluated on twenty unseen games. This paper describes the architecture, development, and insights underpinning NAIL's performance.

decision module, machine learning, natural language, (16 more...)

1902.04259

Genre: Research Report (0.41)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Dobi, Balázs, Zempléni, András

Markov Chain-based Cost-Optimal Control Charts for Healthcare Data

arXiv.org Machine LearningFeb-14-2019

Control charts have traditionally been used in industrial statistics, but are constantly seeing new areas of application, especially in the age of Industry 4.0. This paper introduces a new method, which is suitable for applications in the healthcare sector, especially for monitoring a health-characteristic of a patient. We adapt a Markov chain-based approach and develop a method in which not only the shift size (i.e. the degradation of the patient's health) can be random, but the effect of the repair (i.e. treatment) and time between samplings (i.e. visits) too. This means that we do not use many often-present assumptions which are usually not applicable for medical treatments. The average cost of the protocol, which is determined by the time between samplings and the control limit, can be estimated using the stationary distribution of the Markov chain. Furthermore, we incorporate the standard deviation of the cost into the optimisation procedure, which is often very important from a process control viewpoint. The sensitivity of the optimal parameters and the resulting average cost and cost standard deviation on different parameter values is investigated. We demonstrate the usefulness of the approach for real-life data of patients treated in Hungary: namely the monitoring of cholesterol level of patients with cardiovascular event risk. The results showed that the optimal parameters from our approach can be somewhat different from the original medical parameters.

artificial intelligence, machine learning, standard deviation, (17 more...)

1903.06675

Country: Europe > Hungary (0.49)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.82)

Narimatsu, Hiromi, Kasai, Hiroyuki

State Duration and Interval Modeling in Hidden Semi-Markov Model for Sequential Data Analysis

arXiv.org Artificial IntelligenceFeb-13-2019

Sequential data modeling and analysis have become indispensable tools for analyzing sequential data, such as time-series data, because larger amounts of sensed event data have become available. These methods capture the sequential structure of data of interest, such as input-output relations and correlation among datasets. However, because most studies in this area are specialized or limited to their respective applications, rigorous requirement analysis of such models has not been undertaken from a general perspective. Therefore, we particularly examine the structure of sequential data, and extract the necessity of `state duration' and `state interval' of events for efficient and rich representation of sequential data. Specifically addressing the hidden semi-Markov model (HSMM) that represents such state duration inside a model, we attempt to add representational capability of a state interval of events onto HSMM. To this end, we propose two extended models: an interval state hidden semi-Markov model (IS-HSMM) to express the length of a state interval with a special state node designated as "interval state node"; and an interval length probability hidden semi-Markov model (ILP-HSMM) which represents the length of the state interval with a new probabilistic parameter "interval length probability." Exhaustive simulations have revealed superior performance of the proposed models in comparison with HSMM. These proposed models are the first reported extensions of HMM to support state interval representation as well as state duration representation.

artificial intelligence, machine learning, sequence, (18 more...)

doi: 10.1007/s10472-017-9561-y

1608.06954

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > New York (0.04)
North America > United States > Nevada (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Yang, Lin F., Wang, Mengdi

Sample-Optimal Parametric Q-Learning with Linear Transition Models

arXiv.org Machine LearningFeb-13-2019

Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process's probabilistic transition model. We propose a parametric Q-learning algorithm that finds an approximate-optimal policy using a sample size proportional to the feature dimension $K$ and invariant with respect to the size of the state space. To further improve its sample efficiency, we exploit the monotonicity property and intrinsic noise structure of the Bellman operator, provided the existence of anchor state-actions that imply implicit non-negativity in the feature space. We augment the algorithm using techniques of variance reduction, monotonicity preservation, and confidence bounds. It is proved to find a policy which is $\epsilon$-optimal from any initial state with high probability using $\widetilde{O}(K/\epsilon^2(1-\gamma)^3)$ sample transitions for arbitrarily large-scale MDP with a discount factor $\gamma\in(0,1)$. A matching information-theoretical lower bound is proved, confirming the sample optimality of the proposed method with respect to all parameters (up to polylog factors).

algorithm, iteration, probability, (13 more...)

1902.04779

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

#artificialintelligenceFeb-12-2019, 19:21:20 GMT

How AI could help you learn sign language

Sign languages aren't easy to learn and are even harder to teach. They use not just hand gestures but also mouthings, facial expressions and body posture to communicate meaning. This complexity means professional teaching programs are still rare and often expensive. But this could all change soon, with a little help from artificial intelligence (AI). My colleagues and I are working on software for teaching yourself sign languages in an automated, intuitive way.

artificial intelligence, machine learning, sign language, (10 more...)

#artificialintelligence

Industry: Education > Curriculum > Subject-Specific Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.31)

Agudelo-España, Diego, Gomez-Gonzalez, Sebastian, Bauer, Stefan, Schölkopf, Bernhard, Peters, Jan

Bayesian Online Detection and Prediction of Change Points

arXiv.org Machine LearningFeb-12-2019

Online detection of instantaneous changes in the generative process of a data sequence generally focuses on retrospective inference of such change points without considering their future occurrences. We extend the Bayesian Online Change Point Detection algorithm to also infer the number of time steps until the next change point (i.e., the residual time). This enables us to handle observation models which depend on the total segment duration, which is useful to model data sequences with temporal scaling. In addition, we extend the model by removing the i.i.d. assumption on the observation model parameters. The resulting inference algorithm for segment detection can be deployed in an online fashion, and we illustrate applications to synthetic and to two medical real-world data sets.

bayesian online detection and prediction, inference, sequence, (11 more...)

1902.04524

Country:

Europe > Spain (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)