AITopics

1904.03876

Country:

North America (0.28)
Europe (0.28)

Genre: Research Report (1.00)

Industry: Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Zolna, Konrad, Rostamzadeh, Negar, Bengio, Yoshua, Ahn, Sungjin, Pinheiro, Pedro O.

Reinforced Imitation in Heterogeneous Action Space

arXiv.org Artificial Intelligence, Apr-6-2019

Imitation learning is an effective alternative approach to learning a policy when the reward function is sparse. In this paper, we consider a challenging setting in which the agent and the expert have different action spaces. We assume that the agent has access to a sparse reward function and to state-only expert observations. We propose a method that gradually balances the imitation learning cost against the reinforcement learning objective. In addition, the method adapts the agent's policy toward either mimicking expert behavior or maximizing the sparse reward. Through navigation scenarios, we show that (i) the agent can efficiently leverage sparse rewards to outperform standard state-only imitation learning, (ii) it can learn a policy even when its actions differ from the expert's, and (iii) its performance is not bounded by that of the expert, owing to its optimized use of sparse rewards.
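A minimal sketch of "gradually balancing" an imitation cost against an RL objective, assuming a simple linear schedule that shifts weight from the imitation term to the RL term over training. The schedule and the names `mixing_weight` and `combined_loss` are hypothetical illustrations; the paper's actual balancing mechanism may differ.

```python
def mixing_weight(step: int, total_steps: int) -> float:
    """Weight on the imitation term; decays linearly from 1.0 to 0.0 (assumed schedule)."""
    if total_steps <= 0:
        return 0.0
    return max(0.0, 1.0 - step / total_steps)


def combined_loss(imitation_loss: float, rl_loss: float, step: int, total_steps: int) -> float:
    """Convex combination of an imitation cost and an RL loss (e.g. negative sparse return)."""
    lam = mixing_weight(step, total_steps)
    return lam * imitation_loss + (1.0 - lam) * rl_loss


if __name__ == "__main__":
    # Early training is dominated by imitation, late training by the RL objective.
    for step in (0, 500, 1000):
        print(step, combined_loss(imitation_loss=2.0, rl_loss=4.0, step=step, total_steps=1000))
```

A linear schedule is only the simplest choice; an adaptive weight driven by observed reward would match the "adapts the agent's policy" description more closely.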

artificial intelligence, machine learning, reinforcement learning

1904.03438

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Poland (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Manchin, Anthony, Abbasnejad, Ehsan, Hengel, Anton van den

Reinforcement Learning with Attention that Works: A Self-Supervised Approach

arXiv.org Machine Learning, Apr-6-2019

Attention models have had a significant positive impact on deep learning across a range of tasks. However, previous attempts at integrating attention with reinforcement learning have failed to produce significant improvements. We propose the first combination of self-attention and reinforcement learning that is capable of producing significant improvements, including new state-of-the-art results in the Arcade Learning Environment. Unlike the selective attention models used in previous attempts, which constrain the attention via preconceived notions of importance, our implementation utilises the Markovian properties inherent in the state input. Our method produces a faithful visualisation of the policy, focusing on the behaviour of the agent. Our experiments demonstrate that the trained policies use multiple simultaneous foci of attention and are able to modulate attention over time to deal with situations of partial observability.
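For readers unfamiliar with self-attention, here is a minimal sketch of generic scaled dot-product self-attention over a set of input feature vectors (for example, spatial positions of a convolutional feature map). It is the textbook mechanism, written in plain Python; it is not the authors' architecture, and the projection matrices `wq`, `wk`, `wv` are assumed inputs.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (outputs, attention weights) for n input vectors x of width d."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = 1.0 / math.sqrt(len(k[0]))  # 1/sqrt(d_k) scaling
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights
```

Each row of `weights` is a distribution over input positions, which is what attention visualisations typically plot; several large entries in one row correspond to multiple simultaneous foci.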

artificial intelligence, machine learning, reinforcement learning

1904.03367

Country: North America > United States (0.14)

Genre: Research Report (0.67)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Liu, Yunlong, Zheng, Jianyang

Combining Offline Models and Online Monte-Carlo Tree Search for Planning from Scratch

arXiv.org Artificial Intelligence, Apr-5-2019

Planning in stochastic and partially observable environments is a central issue in artificial intelligence. One commonly used technique for solving such problems is to first construct an accurate model. Although some recent approaches have been proposed for learning optimal behaviour under model uncertainty, prior knowledge about the environment is still needed to guarantee the performance of the proposed algorithms. Building on the benefits of the Predictive State Representations (PSRs) approach for state representation and model prediction, in this paper we introduce an approach for planning from scratch, in which an offline PSR model is first learned and then combined with online Monte-Carlo tree search for planning under model uncertainty. Through comparison with the state-of-the-art approach to planning under model uncertainty, we demonstrate the effectiveness of the proposed approaches and provide a proof of their convergence. The effectiveness and scalability of the proposed approach are also tested on RockSample problems, which are infeasible for state-of-the-art BA-POMDP-based approaches.
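Online Monte-Carlo tree search needs a rule for choosing which action to expand at each node. A minimal sketch of the widely used UCB1 tree policy (as in UCT) follows, assuming that rule; the abstract does not say which selection rule the authors use, and the learned PSR model that would simulate returns is omitted. `Node`, `ucb1_select`, and `backup` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Sequence


@dataclass
class Node:
    visits: int = 0
    # Per-action statistics: visit counts and summed simulated returns.
    action_visits: Dict[str, int] = field(default_factory=dict)
    action_value_sum: Dict[str, float] = field(default_factory=dict)


def ucb1_select(node: Node, actions: Sequence[str], c: float = math.sqrt(2)) -> str:
    """Pick argmax of mean return + c * sqrt(ln N / n_a); untried actions go first."""
    for a in actions:
        if node.action_visits.get(a, 0) == 0:
            return a

    def score(a: str) -> float:
        n_a = node.action_visits[a]
        mean = node.action_value_sum[a] / n_a
        return mean + c * math.sqrt(math.log(node.visits) / n_a)

    return max(actions, key=score)


def backup(node: Node, action: str, ret: float) -> None:
    """Record one simulated return for `action` taken at `node`."""
    node.visits += 1
    node.action_visits[action] = node.action_visits.get(action, 0) + 1
    node.action_value_sum[action] = node.action_value_sum.get(action, 0.0) + ret
```

In a full planner, the return passed to `backup` would come from a rollout through the learned model; the exploration constant `c` trades off trying rarely visited actions against exploiting high-mean ones.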

artificial intelligence, machine learning, psr model

1904.03008

Country:

Asia > China > Fujian Province > Xiamen (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Magner, Abram, Szpankowski, Wojciech

Goodness of Fit Testing for Dynamic Networks

arXiv.org Machine Learning, Apr-5-2019

Numerous networks in the real world change over time, in the sense that nodes and edges enter and leave the networks. Various dynamic random graph models have been proposed to explain the macroscopic properties of these systems and to provide a foundation for statistical inference and prediction. It is of interest to have a rigorous way to determine how well these models match observed networks. We thus ask the following goodness-of-fit question: given a sequence of observations/snapshots of a growing random graph, along with a candidate model $M$, can we determine whether the snapshots came from $M$ or from some arbitrary alternative model that is well-separated from $M$ in some natural metric? We formulate this problem precisely, reduce it to goodness-of-fit testing for graph-valued, infinite-state Markov processes, and exhibit and analyze a test, based on a procedure that we call non-stationary sampling, for a natural class of models.

artificial intelligence, machine learning, probability

1904.03348

Country: North America > United States (0.94)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Qiao, Maoying, Bian, Wei, Xu, Richard Yida, Tao, Dacheng

Diversified Hidden Markov Models for Sequential Labeling

arXiv.org Machine Learning, Apr-5-2019

Labeling of sequential data is a prevalent meta-problem for a wide range of real-world applications. While the first-order Hidden Markov Model (HMM) provides a fundamental approach to unsupervised sequential labeling, the basic model does not perform satisfactorily when applied directly to real-world problems such as part-of-speech tagging (PoS tagging) and optical character recognition (OCR). To improve performance, important extensions of the HMM have been proposed in the literature. A key feature common to these extensions is the incorporation of proper prior information. In this paper, we propose a new extension of the HMM, termed diversified Hidden Markov Models (dHMM), which places a diversity-encouraging prior over the state-transition probabilities and thus facilitates more dynamic sequential labelings. Specifically, the diversity is modeled by a continuous determinantal point process prior, which we apply to both unsupervised and supervised scenarios. Learning and inference algorithms for dHMM are derived. Empirical evaluations on benchmark datasets for unsupervised PoS tagging and supervised OCR confirm the effectiveness of dHMM, with performance competitive with the state of the art.
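A determinantal point process assigns unnormalised probability proportional to the determinant of a kernel matrix over its items, so similar items shrink the determinant. A minimal sketch of that idea applied to the rows of an HMM transition matrix follows. The Bhattacharyya kernel used here is an assumption for illustration, and `diversity_score` is a hypothetical name; this is not the paper's exact prior or its learning algorithm.

```python
import math
from typing import List


def bhattacharyya_kernel(rows: List[List[float]]) -> List[List[float]]:
    """K[i][j] = sum_k sqrt(A[i][k] * A[j][k]); the diagonal is 1 for normalised rows."""
    return [[sum(math.sqrt(p * q) for p, q in zip(ri, rj)) for rj in rows] for ri in rows]


def det(m: List[List[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # (numerically) singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= f * a[col][k]
    return result


def diversity_score(transition: List[List[float]]) -> float:
    """det(K): 1 when rows have disjoint support, 0 when any two rows coincide."""
    return det(bhattacharyya_kernel(transition))
```

Using the log of this score as a prior term would penalise transition matrices whose rows look alike, which is the intuition behind "diversity-encouraging".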

artificial intelligence, machine learning, transition matrix

doi: 10.1109/TKDE.2015.2433262

1904.0317

Country:

Asia > China (0.46)
Oceania > Australia (0.28)

Genre:

Research Report (0.50)
Instructional Material (0.46)

Industry: Education (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Fugate, Sunny (Space and Naval Warfare Systems Center Pacific) | Ferguson-Walter, Kimberly (US Department of Defense)

Artificial Intelligence and Game Theory Models for Defending Critical Networks with Cyber Deception

AI Magazine, Apr-4-2019

Traditional cyber security techniques have led to an asymmetric disadvantage for defenders. The defender must detect all possible threats at all times from all attackers and defend all systems against all possible exploitation. In contrast, an attacker needs only to find a single path to the defender’s critical information. In this article, we discuss how this asymmetry can be rebalanced by using cyber deception to change the attacker’s perception of the network environment and to lead attackers to false beliefs about which systems contain critical information or are critical to a defender’s computing infrastructure. We introduce game theory concepts and models to represent and reason over the use of cyber deception by the defender and the effect it has on attacker perception. Finally, we discuss techniques for combining artificial intelligence algorithms with game theory models to estimate hidden states of the attacker using feedback through payoffs, in order to learn how best to defend the system using cyber deception. In our opinion, adaptive cyber deception is a necessary component of future information systems and networks. The techniques we present can simultaneously decrease the risks and impacts suffered by defenders and dramatically increase the costs and risks of detection for attackers. Such techniques are likely to play a pivotal role in addressing national and international security concerns.
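To make "game theory models for cyber deception" concrete, here is a minimal textbook sketch: a 2x2 zero-sum game in which the defender chooses which of two hosts to protect with deception and the attacker chooses which host to attack. The payoffs in the usage example are hypothetical, and this generic mixed-strategy solver is not the authors' model.

```python
from fractions import Fraction
from typing import List, Tuple


def solve_2x2_zero_sum(m: List[List[Fraction]]) -> Tuple[Fraction, Fraction, Fraction]:
    """Return (p, q, value) for a 2x2 zero-sum game without a pure saddle point.

    Payoffs are to the defender (row player). p = P(defender plays row 0),
    q = P(attacker plays column 0); each mix makes the opponent indifferent.
    """
    maximin = max(min(row) for row in m)
    minimax = min(max(col) for col in zip(*m))
    if maximin == minimax:
        raise ValueError("pure-strategy saddle point; the mixed formula does not apply")
    (a, b), (c, d) = m
    denom = a - b - c + d
    p = (d - c) / denom
    q = (d - b) / denom
    value = p * a + (1 - p) * c
    return p, q, value


if __name__ == "__main__":
    # Hypothetical payoffs: rows = deceive host A / host B, columns = attack A / attack B.
    # Hitting the deceived host reveals the attacker (+1); otherwise the defender
    # loses the attacked host's value (A is worth 3, B is worth 1).
    game = [[Fraction(1), Fraction(-1)], [Fraction(-3), Fraction(1)]]
    print(solve_2x2_zero_sum(game))
```

For these made-up payoffs the defender should place deception on the more valuable host A two thirds of the time; randomising keeps the attacker from learning which host is real.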

defender, machine learning, reinforcement learning

AI Magazine

Country:

North America > United States > Massachusetts (0.28)
North America > United States > California (0.28)

Industry:

Leisure & Entertainment > Games (1.00)
Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.46)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Huang, Xin, Hong, Sungkweon, Hofmann, Andreas, Williams, Brian C.

Online Risk-Bounded Motion Planning for Autonomous Vehicles in Dynamic Environments

arXiv.org Artificial Intelligence, Apr-4-2019

A crucial challenge for efficient and robust motion planning in autonomous vehicles is understanding the intentions of surrounding agents. Ignoring the intentions of other agents in dynamic environments can lead to risky or overly conservative plans. In this work, we model the motion planning problem as a partially observable Markov decision process (POMDP) and propose an online system that combines an intent recognition algorithm and a POMDP solver to generate risk-bounded plans for the ego vehicle navigating among a number of dynamic agent vehicles. The intent recognition algorithm predicts the probabilistic hybrid motion states of each agent vehicle over a finite horizon using Bayesian filtering and a library of pre-learned maneuver motion models. We update the POMDP model with the intent recognition results in real time and solve it using a heuristic search algorithm that produces policies with upper-bound guarantees on the probability of near-collisions with other dynamic agents. We demonstrate that, compared with baseline methods, our system generates motion plans with better efficiency and safety in a number of challenging environments, including unprotected left turns at intersections and lane changes.
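Intent recognition by Bayesian filtering can be illustrated with a discrete belief over maneuver hypotheses. A minimal sketch follows, assuming a maneuver-switching table for the predict step and per-maneuver observation likelihoods for the update step; the maneuver names, probabilities, and function names are hypothetical, and the paper's hybrid continuous/discrete filter is richer than this.

```python
from typing import Dict

Belief = Dict[str, float]


def predict(belief: Belief, switch: Dict[str, Dict[str, float]]) -> Belief:
    """belief'(m2) = sum over m1 of switch[m1][m2] * belief(m1)."""
    return {m2: sum(belief[m1] * switch[m1].get(m2, 0.0) for m1 in belief) for m2 in belief}


def bayes_update(prior: Belief, likelihood: Dict[str, float]) -> Belief:
    """posterior(m) is proportional to likelihood(z | m) * prior(m), normalised over m."""
    unnorm = {m: prior[m] * likelihood.get(m, 0.0) for m in prior}
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every hypothesis")
    return {m: w / total for m, w in unnorm.items()}


if __name__ == "__main__":
    belief = {"go_straight": 0.5, "turn_left": 0.5}
    # Hypothetical: an observed deceleration is three times likelier under turn_left.
    belief = bayes_update(belief, {"go_straight": 0.2, "turn_left": 0.6})
    print(belief)
```

The resulting maneuver probabilities are the kind of quantity a risk-bounded planner can use to weight predicted trajectories of other vehicles.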

artificial intelligence, machine learning, vehicle

1904.02341

Country: North America > United States (0.68)

Genre: Research Report (0.64)

Industry:

Automobiles & Trucks (0.94)
Transportation > Ground > Road (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

#artificialintelligence, Apr-3-2019, 03:44:05 GMT

Reinforcement Learning Demystified: Markov Decision Processes (Part 1)

In the previous blog post we talked about reinforcement learning and its characteristics. We described how the agent observes the environment's output, consisting of a reward and the next state, and then acts upon it. This whole process is formalized as a Markov Decision Process, or MDP for short. This blog post is a bit mathy. Grab your coffee and a comfortable chair, and just dive in.
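As a taste of the math, here is a minimal sketch of value iteration on a tiny, made-up MDP. Each state maps actions to (probability, next state, reward) outcomes, and the loop repeatedly applies the Bellman optimality backup V(s) = max over a of sum p * (r + gamma * V(s')). The MDP encoding and the function name are illustrative assumptions, not taken from the blog series.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (probability, next_state, reward)
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def value_iteration(mdp: MDP, gamma: float = 0.9, tol: float = 1e-8) -> Dict[str, float]:
    """Apply the Bellman optimality backup in place until values change by less than tol."""
    v = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            if not actions:  # terminal state: value stays 0
                continue
            best = max(
                sum(p * (r + gamma * v[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v


if __name__ == "__main__":
    # A -> B is free, B -> C pays 1, C is terminal; discounting makes V(A) = 0.9 * V(B).
    chain = {
        "A": {"go": [(1.0, "B", 0.0)]},
        "B": {"go": [(1.0, "C", 1.0)]},
        "C": {},
    }
    print(value_iteration(chain))
```

Updating `v` in place (rather than keeping a separate copy per sweep) still converges for gamma < 1 and often does so in fewer sweeps.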

artificial intelligence, machine learning, reinforcement learning demystified

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Bıyık, Erdem, Margoliash, Jonathan, Alimo, Shahrouz Ryan, Sadigh, Dorsa

Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models

arXiv.org Artificial Intelligence, Apr-1-2019

I. INTRODUCTION Guaranteeing safety is a vital issue for many modern robotics systems, such as unmanned aerial vehicles (UAVs), autonomous cars, or domestic robots [1], [2], [3]. One approach is to attempt to specify all potential scenarios ...

... Process (MDP) using Gaussian processes. In their work, they assumed the transition model is known and that there exists a predefined safety function. Both of these assumptions can be quite restrictive when the system is going to operate in unknown environments. In our work, we plan to address both of these challenges by considering unknown transition models, and no access to a predefined safety function.

artificial intelligence, machine learning, reinforcement learning

1904.01068

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.64)

Industry:

Transportation (0.54)
Information Technology > Robotics & Automation (0.54)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)