AITopics

arXiv.org Artificial IntelligenceFeb-15-2019

Examining Adversarial Learning against Graph-based IoT Malware Detection Systems

Abusnaina, Ahmed, Khormali, Aminollah, Alasmary, Hisham, Park, Jeman, Anwar, Afsah, Meteriz, Ulku, Mohaisen, Aziz

The main goal of this study is to investigate the robustness of graph-based Deep Learning (DL) models used for Internet of Things (IoT) malware classification against Adversarial Learning (AL). We designed two approaches to craft adversarial IoT software, including Off-the-Shelf Adversarial Attack (OSAA) methods, using six different AL attack approaches, and Graph Embedding and Augmentation (GEA). The GEA approach aims to preserve the functionality and practicality of the generated adversarial sample through a careful embedding of a benign sample to a malicious one. Our evaluations demonstrate that OSAAs are able to achieve a misclassification rate (MR) of 100%. Moreover, we observed that the GEA approach is able to misclassify all IoT malware samples as benign.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1902.04416

Country: Europe > Germany > Saarland > Saarbrücken (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.63)

Silva, Andrew, Gombolay, Matthew

ProLoNets: Neural-encoding Human Experts' Domain Knowledge to Warm Start Reinforcement Learning

arXiv.org Machine LearningFeb-15-2019

Deep reinforcement learning has seen great success across a breadth of tasks such as in game playing and robotic manipulation. However, the modern practice of attempting to learn tabula rasa disregards the logical structure of many domains and the wealth of readily-available human domain experts' knowledge that could help ``warm start'' the learning process. Further, learning from demonstration techniques are not yet sufficient to infer this knowledge through sampling-based mechanisms in large state and action spaces, or require immense amounts of data. We present a new reinforcement learning architecture that can encode expert knowledge, in the form of propositional logic, directly into a neural, tree-like structure of fuzzy propositions that are amenable to gradient descent. We show that our novel architecture is able to outperform reinforcement and imitation learning techniques across an array of canonical challenge problems for artificial intelligence.

agent, initialization, prolonet, (13 more...)

1902.06007

Country: North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (1.00)

Industry:

Education (0.68)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

arXiv.org Machine LearningFeb-15-2019

Robust Reinforcement Learning in POMDPs with Incomplete and Noisy Observations

Wang, Yuhui, He, Hao, Tan, Xiaoyang

In real-world scenarios, the observation data for reinforcement learning with continuous control is commonly noisy and part of it may be dynamically missing over time, which violates the assumption of many current methods developed for this. We addressed the issue within the framework of partially observable Markov Decision Process (POMDP) using a model-based method, in which the transition model is estimated from the incomplete and noisy observations using a newly proposed surrogate loss function with local approximation, while the policy and value function is learned with the help of belief imputation. For the latter purpose, a generative model is constructed and is seamlessly incorporated into the belief updating procedure of POMDP, which enables robust execution even under a significant incompleteness and noise. The effectiveness of the proposed method is verified on a collection of benchmark tasks, showing that our approach outperforms several compared methods under various challenging scenarios.

algorithm, belief state, timestep, (11 more...)

1902.05795

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Dadashi, Robert, Taïga, Adrien Ali, Roux, Nicolas Le, Schuurmans, Dale, Bellemare, Marc G.

The Value Function Polytope in Reinforcement Learning

arXiv.org Machine LearningFeb-15-2019

We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes. Our main contribution is the characterization of the nature of its shape: a general polytope (Aigner et al., 2010). To demonstrate this result, we exhibit several properties of the structural relationship between policies and value functions including the line theorem, which shows that the value functions of policies constrained on all but one state describe a line segment. Finally, we use this novel perspective to introduce visualizations to enhance the understanding of the dynamics of reinforcement learning algorithms.

aaab83icbvbns8naej3ur1q qh69lbahxkoioh4lxjxwsb, latexit latexitsha1, value function, (11 more...)

1901.11524

Country:

North America > Canada > Alberta (0.14)
North America > United States > District of Columbia > Washington (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Santos, Thiago Freitas dos, Santos, Paulo E., Ferreira, Leonardo A., Bianchi, Reinaldo A. C., Cabalar, Pedro

Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles

arXiv.org Artificial IntelligenceFeb-15-2019

Spatial puzzles composed of rigid objects, flexible strings and holes offer interesting domains for reasoning about spatial entities that are common in the human daily-life's activities. The goal of this work is to investigate the automated solution of this kind of puzzles adapting an algorithm that combines Answer Set Programming (ASP) with Markov Decision Process (MDP), algorithm oASP(MDP), to use heuristics accelerating the learning process. ASP is applied to represent the domain as an MDP, while a Reinforcement Learning algorithm (Q-Learning) is used to find the optimal policies. In this work, the heuristics were obtained from the solution of relaxed versions of the puzzles. Experiments were performed on deterministic, non-deterministic and non-stationary versions of the puzzles. Results show that the proposed approach can accelerate the learning process, presenting an advantage when compared to the non-heuristic versions of oASP(MDP) and Q-Learning.

machine learning, puzzle, reinforcement learning, (15 more...)

1903.03411

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)

Shen, Macheng, How, Jonathan P

Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning

arXiv.org Artificial IntelligenceFeb-14-2019

We pose an active perception problem where an autonomous agent actively interacts with a second agent with potentially adversarial behaviors. Given the uncertainty in the intent of the other agent, the objective is to collect further evidence to help discriminate potential threats. The main technical challenges are the partial observability of the agent intent, the adversary modeling, and the corresponding uncertainty modeling. Note that an adversary agent may act to mislead the autonomous agent by using a deceptive strategy that is learned from past experiences. We propose an approach that combines belief space planning, generative adversary modeling, and maximum entropy reinforcement learning to obtain a stochastic belief space policy. By accounting for various adversarial behaviors in the simulation framework and minimizing the predictability of the autonomous agent's action, the resulting policy is more robust to unmodeled adversarial strategies. This improved robustness is empirically shown against an adversary that adapts to and exploits the autonomous agent's policy when compared with a standard Chance-Constraint Partially Observable Markov Decision Process robust approach.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1902.05644

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Bhatt, Aditya, Argus, Max, Amiranashvili, Artemij, Brox, Thomas

CrossNorm: Normalization for Off-Policy TD Reinforcement Learning

arXiv.org Machine LearningFeb-14-2019

Off-policy Temporal Difference (TD) learning methods, when combined with function approximators, suffer from the risk of divergence, a phenomenon known as the deadly triad. It has long been noted that some feature representations work better than others. In this paper we investigate how feature normalization can prevent divergence and improve training. Our method, which we call CrossNorm, can be regarded as a new variant of batch normalization that re-centers data for multi-modal distributions, which occur in the off-policy TD updates. We show empirically that CrossNorm improves the stability of the learning process. We apply CrossNorm to DDPG and TD3 and achieve stable training and improved performance across a range of MuJoCo benchmark tasks. Moreover, for the first time, we are able to train DDPG stably without the use of target networks.

crossnorm, normalization, target network, (12 more...)

1902.05605

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Alberta (0.14)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine LearningFeb-14-2019

Unsupervised Visuomotor Control through Distributional Planning Networks

Yu, Tianhe, Shevchuk, Gleb, Sadigh, Dorsa, Finn, Chelsea

While reinforcement learning (RL) has the potential to enable robots to autonomously acquire a wide range of skills, in practice, RL usually requires manual, per-task engineering of reward functions, especially in real world settings where aspects of the environment needed to compute progress are not directly accessible. To enable robots to autonomously learn skills, we instead consider the problem of reinforcement learning without access to rewards. We aim to learn an unsupervised embedding space under which the robot can measure progress towards a goal for itself. Our approach explicitly optimizes for a metric space under which action sequences that reach a particular state are optimal when the goal is the final state reached. This enables learning effective and control-centric representations that lead to more autonomous reinforcement learning algorithms. Our experiments on three simulated environments and two real-world manipulation problems show that our method can learn effective goal metrics from unlabeled interaction, and use the learned goal metrics for autonomous reinforcement learning.

inverse model, representation, sequence, (14 more...)

1902.05542

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Fulton, Nathan, Platzer, Andre

Verifiably Safe Off-Model Reinforcement Learning

arXiv.org Artificial IntelligenceFeb-14-2019

The desire to use reinforcement learning in safety-critical settings has inspired a recent interest in formal methods for learning algorithms. Existing formal methods for learning and optimization primarily consider the problem of constrained learning or constrained optimization. Given a single correct model and associated safety constraint, these approaches guarantee efficient learning while provably avoiding behaviors outside the safety constraint. Acting well given an accurate environmental model is an important pre-requisite for safe learning, but is ultimately insufficient for systems that operate in complex heterogeneous environments. This paper introduces verification-preserving model updates, the first approach toward obtaining formal safety guarantees for reinforcement learning in settings where multiple environmental models must be taken into account. Through a combination of design-time model updates and runtime model falsification, we provide a first approach toward obtaining formal safety proofs for autonomous systems acting in heterogeneous environments.

logic & formal reasoning, machine learning, reinforcement learning, (19 more...)

1902.05632

Country: North America > United States (0.67)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)