AITopics

1912.00759

Country:

Europe > United Kingdom (0.14)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Trabucco, Brandon, Qu, Albert, Li, Simon, Ashokavardhanan, Ganeshkumar

Inferring the Optimal Policy using Markov Chain Monte Carlo

arXiv.org Artificial Intelligence, Nov-15-2019

This paper investigates methods for estimating the optimal stochastic control policy for a Markov Decision Process with unknown transition dynamics and an unknown reward function. This form of model-free reinforcement learning comprises many real-world systems such as playing video games, simulated control tasks, and real robot locomotion. Existing methods for estimating the optimal stochastic control policy rely on high-variance estimates of the policy descent. However, these methods are not guaranteed to find the optimal stochastic policy, and the high-variance gradient estimates make convergence unstable. In order to resolve these problems, we propose a technique using Markov Chain Monte Carlo to generate samples from the posterior distribution of the parameters conditioned on being optimal. Our method provably converges to the globally optimal stochastic policy, and empirically exhibits similar variance compared to the policy gradient.

algorithm, algorithm 1, future reward, (13 more...)

1912.02714

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)

Gaudet, Brian, Linares, Richard, Furfaro, Roberto

Six Degree-of-Freedom Hovering using LIDAR Altimetry via Reinforcement Meta-Learning

arXiv.org Artificial Intelligence, Nov-15-2019

We optimize a six degrees of freedom hovering policy using reinforcement meta-learning. The policy maps flash LIDAR measurements directly to on/off spacecraft body-frame thrust commands, allowing hovering at a fixed position and attitude in the asteroid body-fixed reference frame. Importantly, the policy does not require position and velocity estimates, and can operate in environments with unknown dynamics, and without an asteroid shape model or navigation aids. Indeed, during optimization the agent is confronted with a new randomly generated asteroid for each episode, ensuring that it does not learn an asteroid's shape, texture, or environmental dynamics. This allows the deployed policy to generalize well to novel asteroid characteristics, which we demonstrate in our experiments. The hovering controller has the potential to simplify mission planning by allowing asteroid body-fixed hovering immediately upon the spacecraft's arrival at an asteroid. This in turn simplifies shape model generation and allows resource mapping via remote sensing immediately upon arrival at the target asteroid.

asteroid, optimization, spacecraft, (16 more...)

1911.08553

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Arizona > Pima County > Tucson (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Energy > Renewable (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

arXiv.org Artificial Intelligence, Nov-15-2019

Towards Personalized Dialog Policies for Conversational Skill Discovery

Fazel-Zarandi, Maryam, Biswas, Sampat, Summers, Ryan, Elmalt, Ahmed, McCraw, Andy, McPhilips, Michael, Peach, John

Many businesses and consumers are extending the capabilities of voice-based services such as Amazon Alexa, Google Home, Microsoft Cortana, and Apple Siri to create custom voice experiences (also known as skills). As the number of these experiences increases, a key problem is the discovery of skills that can be used to address a user's request. In this paper, we focus on conversational skill discovery and present a conversational agent which engages in a dialog with users to help them find the skills that fulfill their needs. To this end, we start with a rule-based agent and improve it by using reinforcement learning. In this way, we enable the agent to adapt to different user attributes and conversational styles as it interacts with users. We evaluate our approach in a real production setting by deploying the agent to interact with real users, and show the effectiveness of the conversational agent in helping users find the skills that serve their request.

agent, category, dialog, (14 more...)

1911.06747

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre: Research Report (0.66)

Industry: Information Technology > Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Hayashi, Hideaki, Shibanoki, Taro, Shima, Keisuke, Kurita, Yuichi, Tsuji, Toshio

A Recurrent Probabilistic Neural Network with Dimensionality Reduction Based on Time-series Discriminant Component Analysis

arXiv.org Machine Learning, Nov-14-2019

This paper proposes a probabilistic neural network developed on the basis of time-series discriminant component analysis (TSDCA) that can be used to classify high-dimensional time-series patterns. TSDCA involves the compression of high-dimensional time series into a lower-dimensional space using a set of orthogonal transformations and the calculation of posterior probabilities based on a continuous-density hidden Markov model with a Gaussian mixture model expressed in the reduced-dimensional space. The analysis can be incorporated into a neural network, which is named a time-series discriminant component network (TSDCN), so that parameters of dimensionality reduction and classification can be obtained simultaneously as network coefficients according to a backpropagation through time-based learning algorithm with the Lagrange multiplier method. The TSDCN is considered to enable high-accuracy classification of high-dimensional time-series patterns and to reduce the computation time taken for network training. The validity of the TSDCN is demonstrated for high-dimensional artificial data and EEG signals in the experiments conducted during the study.

classification accuracy, training time, tsdcn, (13 more...)

doi: 10.1109/TNNLS.2015.2400448

1911.06009

Country:

Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)
(8 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Ramstedt, Simon, Pal, Christopher

Real-Time Reinforcement Learning

arXiv.org Machine Learning, Nov-14-2019

Markov Decision Processes (MDPs), the mathematical framework underlying most algorithms in Reinforcement Learning (RL), are often used in a way that wrongfully assumes that the state of an agent's environment does not change during action selection. As RL systems based on MDPs begin to find application in real-world safety-critical situations, this mismatch between the assumptions underlying classical MDPs and the reality of real-time computation may lead to undesirable outcomes. In this paper, we introduce a new framework in which states and actions evolve simultaneously, and show how it is related to the classical MDP formulation. We analyze existing algorithms under the new real-time formulation and show why they are suboptimal when used in real time. We then use those insights to create a new algorithm, Real-Time Actor-Critic (RTAC), that outperforms the existing state-of-the-art continuous control algorithm, Soft Actor-Critic, in both real-time and non-real-time settings. Code and videos can be found at https://github.com/rmst/rtrl.

algorithm, rt ac, rtmdp, (13 more...)

1911.04448

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.51)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)

Neogi, Satyajit, Dauwels, Justin

Factored Latent-Dynamic Conditional Random Fields for Single and Multi-label Sequence Modeling

arXiv.org Machine Learning, Nov-12-2019

Conditional Random Fields (CRF) are frequently applied for labeling and segmenting sequence data. Morency et al. (2007) introduced hidden state variables in a labeled CRF structure in order to model the latent dynamics within class labels, thus improving the labeling performance. Such a model is known as Latent-Dynamic CRF (LDCRF). We present Factored LDCRF (FLDCRF), a structure that allows multiple latent dynamics of the class labels to interact with each other. Including such latent-dynamic interactions leads to improved labeling performance on single-label and multi-label sequence modeling tasks. We apply our FLDCRF models on two single-label (one nested cross-validation) and one multi-label sequence tagging (nested cross-validation) experiments across two different datasets: UCI gesture phase data and UCI opportunity data. FLDCRF outperforms all state-of-the-art sequence models, i.e., CRF, LDCRF, LSTM, LSTM-CRF, Factorial CRF, Coupled CRF and a multi-label LSTM model, in all our experiments. In addition, LSTM-based models display inconsistent performance across validation and test data, and make it difficult to select models on validation data during our experiments. FLDCRF offers easier model selection, consistency across validation and test performance and lucid model intuition. FLDCRF is also much faster to train compared to LSTM, even without a GPU. FLDCRF outperforms the best LSTM model by ~4% on a single-label task on UCI gesture phase data and outperforms LSTM by ~2% on average across nested cross-validation test sets on the multi-label sequence tagging experiment on UCI opportunity data. The idea of FLDCRF can be extended to joint (multi-agent interactions) and heterogeneous (discrete and continuous) state space models.

experiment, ldcrf, sequence, (14 more...)

1911.03667

Country:

Asia > Middle East > Jordan (0.04)
Asia > Singapore (0.04)
North America > United States > Massachusetts > Middlesex County > Natick (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Lovas, Attila, Rásonyi, Miklós

Markov chains in random environment with applications in queueing theory and machine learning

arXiv.org Machine Learning, Nov-11-2019

We prove the existence of limiting distributions for a large class of Markov chains on a general state space in a random environment. We assume suitable versions of the standard drift and minorization conditions. In particular, the system dynamics should be contractive on the average with respect to the Lyapunov function and large enough small sets should exist with large enough minorization constants. We also establish that a law of large numbers holds for bounded functionals of the process. Applications to queuing systems and to machine learning algorithms are presented.

assumption 2, lemma 5, max 0, (15 more...)

1911.04377

Country:

Europe > Hungary > Budapest > Budapest (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Garrett, Caelan Reed, Paxton, Chris, Lozano-Pérez, Tomás, Kaelbling, Leslie Pack, Fox, Dieter

Online Replanning in Belief Space for Partially Observable Task and Motion Problems

arXiv.org Artificial Intelligence, Nov-11-2019

To solve multi-step manipulation tasks in the real world, an autonomous robot must take actions to observe its environment and react to unexpected observations. This may require opening a drawer to observe its contents or moving an object out of the way to examine the space behind it. If the robot fails to detect an important object, it must update its belief about the world and compute a new plan of action. Additionally, a robot that acts noisily will never exactly arrive at a desired state. Still, it is important that the robot adjusts accordingly in order to keep making progress towards achieving the goal. In this work, we present an online planning and execution system for robots faced with these kinds of challenges. Our approach is able to efficiently solve partially observable problems both in simulation and in a real-world kitchen. Robots acting autonomously in human environments are faced with a variety of challenges. First, they must make both discrete decisions about what object to manipulate as well as continuous decisions about which motions to execute to achieve a desired interaction. Planning in these large hybrid spaces is the subject of integrated Task and Motion Planning (TAMP) [1], [2], [3], [4], [5], [6].

green block, probability, robot, (15 more...)

1911.04577

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
(2 more...)

arXiv.org Artificial Intelligence, Nov-11-2019

Multi-Agent Connected Autonomous Driving using Deep Reinforcement Learning

Palanisamy, Praveen

The capability to learn and adapt to changes in the driving environment is crucial for developing autonomous driving systems that are scalable beyond geo-fenced operational design domains. Deep Reinforcement Learning (RL) provides a promising and scalable framework for developing adaptive learning-based solutions. Deep RL methods usually model the problem as a (Partially Observable) Markov Decision Process in which an agent acts in a stationary environment to learn an optimal behavior policy. However, driving involves complex interaction between multiple, intelligent (artificial or human) agents in a highly non-stationary environment. In this paper, we propose the use of Partially Observable Markov Games (POSG) for formulating the connected autonomous driving problems with realistic assumptions. We provide a taxonomy of multi-agent learning environments based on the nature of tasks, the nature of agents and the nature of the environment to help in categorizing various autonomous driving problems that can be addressed under the proposed formulation. As our main contribution, we provide MACAD-Gym, a Multi-Agent Connected, Autonomous Driving agent learning platform for furthering research in this direction. Our MACAD-Gym platform provides an extensible set of Connected Autonomous Driving (CAD) simulation environments that enable the research and development of Deep RL-based integrated sensing, perception, planning and control algorithms for CAD systems with unlimited operational design domain under realistic, multi-agent settings. We also share the MACAD-Agents that were trained successfully using the MACAD-Gym platform to learn control policies for multiple vehicle agents in a partially observable, stop-sign controlled, 3-way urban intersection environment with raw (camera) sensor observations.

actor, agent, arxiv e-print, (14 more...)

1911.04175

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > District of Columbia > Washington (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)