AITopics

1809.08926

Country:

Asia (0.46)
North America (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Hasanbeig, Mohammadhosein, Abate, Alessandro, Kroening, Daniel

Logically-Constrained Neural Fitted Q-Iteration

arXiv.org Machine LearningSep-20-2018

This paper proposes a method for efficient training of the Q-function for continuous-state Markov Decision Processes (MDP), such that the traces of the resulting policies satisfy a Linear Temporal Logic (LTL) property. The logical property is converted into a limit deterministic Buchi automaton with which a product MDP is constructed. The control policy is then synthesized by a reinforcement learning algorithm assuming that no prior knowledge is available from the MDP. The proposed method is evaluated in a numerical study to test the quality of the generated control policy and is compared against conventional methods for policy synthesis such as MDP abstraction (Voronoi quantizer) and approximate dynamic programming (fitted value iteration).

algorithm, mdp, state space, (14 more...)

1809.07823

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Arizona (0.04)

Genre: Research Report (0.82)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Davis, Trevor, Waugh, Kevin, Bowling, Michael

Solving Large Extensive-Form Games with Strategy Constraints

arXiv.org Artificial IntelligenceSep-20-2018

Extensive-form games are a common model for multiagent interactions with imperfect information. In two-player zero-sum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. In many situations, however, we would like to constrain the set of possible strategies. For example, constraints are a natural way to model limited resources, risk mitigation, safety, consistency with past observations of behavior, or other secondary objectives for an agent. In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints. In this work we introduce a generalized form of Counterfactual Regret Minimization that provably finds optimal strategies under any feasible set of convex constraints. We demonstrate the effectiveness of our algorithm for finding strategies that mitigate risk in security games, and for opponent modeling in poker games when given only partial observations of private information.

artificial intelligence, constraint, machine learning, (18 more...)

1809.07893

Country: North America (0.28)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.55)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Butz, Martin V., Bilkey, David, Humaidan, Dania, Knott, Alistair, Otte, Sebastian

Learning, Planning, and Control in a Monolithic Neural Event Inference Architecture

arXiv.org Artificial IntelligenceSep-19-2018

We introduce a dynamic artificial neural network-based (ANN) adaptive inference process, which learns temporal predictive models of dynamical systems. We term the process REPRISE, a REtrospective and PRospective Inference SchEme. REPRISE infers the unobservable contextual state that best explains its recently encountered sensorimotor experiences as well as accompanying, context-dependent temporal predictive models retrospectively. Meanwhile, it executes prospective inference, optimizing upcoming motor activities in a goal-directed manner. In a first implementation, a recurrent neural network (RNN) is trained to learn a temporal forward model, which predicts the sensorimotor contingencies of different simulated dynamic vehicles. The RNN is augmented with contextual neurons, which enable the compact encoding of distinct, but related sensorimotor dynamics. We show that REPRISE is able to concurrently learn to separate and approximate the encountered sensorimotor dynamics. Moreover, we show that REPRISE can exploit the learned model to induce goal-directed, model-predictive control, that is, approximate active inference: Given a goal state, the system imagines a motor command sequence optimizing it with the prospective objective to minimize the distance to a given goal. Meanwhile, the system evaluates the encountered sensorimotor contingencies retrospectively, adapting its neural hidden states for maintaining model coherence. The RNN activities thus continuously imagine the upcoming future and reflect on the recent past, optimizing both, hidden state and motor activities. In conclusion, the combination of temporal predictive structures with modulatory, generative encodings offers a way to develop compact event codes, which selectively activate particular types of sensorimotor event-specific dynamics.

deep learning, upstream oil & gas, vehicle, (23 more...)

1809.07412

Country:

Europe > Germany (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.15)
Oceania > New Zealand (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.46)
Leisure & Entertainment > Games (0.46)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Dornheim, Johannes, Link, Norbert, Gumbsch, Peter

Model-Free Adaptive Optimal Control of Sequential Manufacturing Processes using Reinforcement Learning

arXiv.org Artificial IntelligenceSep-18-2018

A self-learning optimal control algorithm for sequential manufacturing processes with time-discrete control actions is proposed and evaluated with simulated deep drawing processes. The necessary control model is built during consecutive process executions under optimal control via Reinforcement Learning, using the measured product quality as reward after each process execution. Prior model formation, which is required by state-of-the-art algorithms like Model Predictive Control and Approximate Dynamic Programming, is therefore obsolete. This avoids the difficulties in system identification and accurate modelling, which arise with processes subject to non-linear dynamics and stochastic influences. Also runtime complexity problems of these approaches are avoided, which arise when more complex models and larger control prediction horizons are employed. Instead of using pre-created process- and observation-models, Reinforcement Learning algorithms build functions of expected future reward during processing, which are then used for optimal process control decisions. The learning of such expectation functions is realized online by interacting with the process. The proposed algorithm also takes stochastic variations of the process conditions into consideration and is able to cope with partial observability. A method for the adaptive optimal control of partially observable fixed-horizon manufacturing processes, based on Q-learning is developed and studied. The resulting algorithm is instantiated and then evaluated by application to a time-stochastic optimal control problem in metal sheet deep drawing, where the experiments use FEM-simulated processes. The Reinforcement Learning based control shows superior results over the model-based Model Predictive Control and Approximate Dynamic Programming approaches.

deep learning, optimal control, upstream oil & gas, (18 more...)

1809.06646

Country: Europe > Germany (0.28)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Upstream (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Vasishta, Pavan, Vaufreydaz, Dominique, Spalanzani, Anne

Building Prior Knowledge: A Markov Based Pedestrian Prediction Model Using Urban Environmental Data

arXiv.org Machine LearningSep-17-2018

Autonomous Vehicles navigating in urban areas have a need to understand and predict future pedestrian behavior for safer navigation. This high level of situational awareness requires observing pedestrian behavior and extrapolating their positions to know future positions. While some work has been done in this field using Hidden Markov Models (HMMs), one of the few observed drawbacks of the method is the need for informed priors for learning behavior. In this work, an extension to the Growing Hidden Markov Model (GHMM) method is proposed to solve some of these drawbacks. This is achieved by building on existing work using potential cost maps and the principle of Natural Vision. As a consequence, the proposed model is able to predict pedestrian positions more precisely over a longer horizon compared to the state of the art. The method is tested over "legal" and "illegal" behavior of pedestrians, having trained the model with sparse observations and partial trajectories. The method, with no training data, is compared against a trained state of the art model. It is observed that the proposed method is robust even in new, previously unseen areas.

artificial intelligence, machine learning, trajectory, (15 more...)

1809.06045

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
Europe > Switzerland (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Transportation (0.69)
Government > Military (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Markovic, Romana, Frisch, Jérôme, van Treeck, Christoph

Learning short-term past as predictor of human behavior in commercial buildings

arXiv.org Machine LearningSep-17-2018

This paper addresses the question of identifying the time-window in short-term past from which the information regarding the future occupant's window opening actions and resulting window states in buildings can be predicted. The addressed sequence duration was in the range between 30 and 240 time-steps of indoor climate data, where the applied temporal discretization was one minute. For that purpose, a deep neural network is trained to predict the window states, where the input sequence duration is handled as an additional hyperparameter. Eventually, the relationship between the prediction accuracy and the time-lag of the predicted window state in future is analyzed. The results pointed out, that the optimal predictive performance was achieved for the case where 60 time-steps of the indoor climate data were used as input. Additionally, the results showed that very long sequences (120-240 time-steps) could be addressed efficiently, given the right hyperprameters. Hence, the use of the memory over previous hours of high-resolution indoor climate data did not improve the predictive performance, when compared to the case where 30/60 minutes indoor sequences were used. The analysis of the prediction accuracy in the form of F1 score for the different time-lag of future window states dropped from 0.51 to 0.27, when shifting the prediction target from 10 to 60 minutes in future.

artificial intelligence, machine learning, neural network, (19 more...)

1809.1002

Country:

Europe > Germany (0.28)
North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Energy (1.00)
Banking & Finance > Real Estate (0.50)
Construction & Engineering > HVAC (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Kurt, Mehmet Necip, Ogundijo, Oyetunji, Li, Chong, Wang, Xiaodong

Online Cyber-Attack Detection in Smart Grid: A Reinforcement Learning Approach

arXiv.org Machine LearningSep-14-2018

Early detection of cyber-attacks is crucial for a safe and reliable operation of the smart grid. In the literature, outlier detection schemes making sample-by-sample decisions and online detection schemes requiring perfect attack models have been proposed. In this paper, we formulate the online attack/anomaly detection problem as a partially observable Markov decision process (POMDP) problem and propose a universal robust online detection algorithm using the framework of model-free reinforcement learning (RL) for POMDPs. Numerical studies illustrate the effectiveness of the proposed RL-based algorithm in timely and accurate detection of cyber-attacks targeting the smart grid. A. Background and Related W ork The next generation power grid, i.e., the smart grid, relies on advanced control and communication technologies. This critical cyber infrastructure makes the smart grid vulnerable to hostile cyber-attacks [1]-[3]. Main objective of attackers is to damage/mislead the state estimation mechanism in the smart grid to cause wide-area power blackouts or to manipulate electricity market prices [4]. There are many types of cyber-attacks, among them false data injection (FDI), jamming, and denial of service (DoS) attacks are well known. FDI attacks add malicious fake data to meter measurements [5]-[8], jamming attacks corrupt meter measurements via additive noise [9], and DoS attacks block the access of system to meter measurements [8], [10], [11]. The smart grid is a complex network and any failure or anomaly in a part of the system may lead to huge damages on the overall system in a short period of time. Hence, early detection of cyber-attacks is critical for a timely and effective response. In this context, the framework of quickest change detection [12]-[15] is quite useful. In the quickest change detection problems, a change occurs in the sensing environment at an unknown time and the aim is to detect the change as soon as possible with the minimal level of false alarms based on the measurements that become available sequentially over time. After obtaining measurements at a given time, decision maker either declares a change or waits for the next time interval to have further measurements.

data mining, machine learning, reinforcement learning, (17 more...)

1809.05258

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Arnekvist, Isac, Kragic, Danica, Stork, Johannes A.

VPE: Variational Policy Embedding for Transfer Reinforcement Learning

arXiv.org Machine LearningSep-14-2018

Reinforcement Learning methods are capable of solving complex problems, but resulting policies might perform poorly in environments that are even slightly different. In robotics especially, training and deployment conditions often vary and data collection is expensive, making retraining undesirable. Simulation training allows for feasible training times, but on the other hand suffers from a reality-gap when applied in real-world settings. This raises the need of efficient adaptation of policies acting in new environments. We consider this as a problem of transferring knowledge within a family of similar Markov decision processes. For this purpose we assume that Q-functions are generated by some low-dimensional latent variable. Given such a Q-function, we can find a master policy that can adapt given different values of this latent variable. Our method learns both the generative mapping and an approximate posterior of the latent variables, enabling identification of policies for new tasks by searching only in the latent space, rather than the space of all policies. The low-dimensional space, and master policy found by our method enables policies to quickly adapt to new environments. We demonstrate the method on both a pendulum swing-up task in simulation, and for simulation-to-real transfer on a pushing task.

dimension, machine learning, reinforcement learning, (18 more...)

1809.03548

Country:

Europe > Sweden (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Artificial IntelligenceSep-14-2018

On Plans With Loops and Noise

Belle, Vaishak

In an influential paper, Levesque proposed a formal specification for analysing the correctness of program-like plans, such as conditional plans, iterative plans, and knowledge-based plans. He motivated a logical characterisation within the situation calculus that included binary sensing actions. While the characterisation does not immediately yield a practical algorithm, the specification serves as a general skeleton to explore the synthesis of program-like plans for reasonable, tractable fragments. Increasingly, classical plan structures are being applied to stochastic environments such as robotics applications. This raises the question as to what the specification for correctness should look like, since Levesque's account makes the assumption that sensing is exact and actions are deterministic. Building on a situation calculus theory for reasoning about degrees of belief and noise, we revisit the execution semantics of generalised plans. The specification is then used to analyse the correctness of example plans.

levesque, logic & formal reasoning, machine learning, (21 more...)

1809.05309

Country:

North America > United States (0.68)
Europe (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)