AITopics

2007.12815

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.86)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Moerland, Thomas M., Broekens, Joost, Jonker, Catholijn M.

A Framework for Reinforcement Learning and Planning

arXiv.org Artificial Intelligence Jul-23-2020

Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are planning and reinforcement learning. Both research fields largely have their own research communities. However, if both research fields solve the same problem, then we should be able to disentangle the common factors in their solution approaches. Therefore, this paper presents a unifying framework for reinforcement learning and planning (FRAP), which identifies the underlying dimensions on which any planning or learning algorithm has to decide. At the end of the paper, we compare, in a single table, a variety of well-known planning, model-free and model-based RL algorithms along the dimensions of our framework, illustrating the validity of the framework. Altogether, FRAP provides deeper insight into the algorithmic space of planning and reinforcement learning, and also suggests new approaches to integrating both fields.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2006.15009

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)
(4 more...)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Moerland, Thomas M., Broekens, Joost, Jonker, Catholijn M.

Model-based Reinforcement Learning: A Survey

arXiv.org Artificial Intelligence Jul-23-2020

Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is a key challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects such as where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop. After these two key sections, we also discuss the potential benefits of model-based RL, like enhanced data efficiency, targeted exploration, and improved stability. Throughout the survey, we also draw connections to several related RL fields, like hierarchical RL and transfer, and other research disciplines, like behavioural psychology. Altogether, the survey presents a broad conceptual overview of planning-learning combinations for MDP optimization.
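The two steps named in the abstract (learning a dynamics model, then integrating planning with learning and acting) can be illustrated with a small tabular Dyna-Q-style loop. This is only a sketch and is not taken from the survey: the chain environment, the function names `env_step`, `q_update` and `dyna_q`, and all constants are hypothetical choices made for illustration.

```python
import random

N_STATES = 5        # chain of states 0..4; state 4 is the terminal goal (assumed toy task)
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9         # discount factor
ALPHA = 0.5         # learning rate


def env_step(s, a):
    """Hypothetical deterministic chain environment: returns (reward, next_state)."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return r, s2


def q_update(Q, s, a, r, s2):
    """One Q-learning backup; the terminal state contributes no bootstrap value."""
    target = r if s2 == N_STATES - 1 else r + GAMMA * max(Q[s2])
    Q[s][a] += ALPHA * (target - Q[s][a])


def dyna_q(real_steps=300, planning_steps=10, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    model = {}      # learned dynamics model: (state, action) -> (reward, next_state)
    s = 0
    for _ in range(real_steps):
        # Real data collection with a uniform random behaviour policy.
        a = rng.choice(ACTIONS)
        r, s2 = env_step(s, a)
        q_update(Q, s, a, r, s2)      # learn from the real transition
        model[(s, a)] = (r, s2)       # step 1: update the dynamics model
        # Step 2: planning, i.e. extra backups on transitions replayed from the model.
        for _ in range(planning_steps):
            ps, pa = rng.choice(list(model))
            pr, ps2 = model[(ps, pa)]
            q_update(Q, ps, pa, pr, ps2)
        s = 0 if s2 == N_STATES - 1 else s2   # restart the episode at the goal
    return Q


if __name__ == "__main__":
    for state, values in enumerate(dyna_q()):
        print(state, [round(v, 3) for v in values])
```

Here `planning_steps` plays the role of the planning budget per real step, and sampling uniformly from previously seen state-action pairs is one simple answer to "where to start planning".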

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2006.16712

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Jordan (0.04)
(7 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Education (1.00)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
(2 more...)

arXiv.org Artificial Intelligence Jul-23-2020

Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System

Wang, Jianhong, Zhang, Yuan, Kim, Tae-Kyun, Gu, Yunjie

Designing task-oriented dialogue systems is a challenging research topic, since such a system must not only generate utterances that fulfil user requests but also guarantee their comprehensibility. Many previous works trained end-to-end (E2E) models with supervised learning (SL); however, the bias in annotated system utterances remains a bottleneck. Reinforcement learning (RL) addresses this problem by using non-differentiable evaluation metrics (e.g., the success rate) as rewards. Nonetheless, existing work with RL has shown that the comprehensibility of generated system utterances can be corrupted while performance on fulfilling user requests improves. In our work, we (1) propose modelling the hierarchical structure between dialogue policy and natural language generator (NLG) with the option framework, called HDNO; (2) train HDNO with hierarchical reinforcement learning (HRL), and suggest alternating updates between dialogue policy and NLG during HRL, inspired by fictitious play, to preserve the comprehensibility of generated system utterances while improving the fulfilment of user requests; and (3) propose using a discriminator modelled with language models as an additional reward to further improve comprehensibility. We test HDNO on MultiWoz 2.0 and MultiWoz 2.1, two multi-domain dialogue datasets, in comparison with a word-level E2E model trained with RL, LaRL and HDSA, showing a significant improvement in total performance as evaluated with automatic metrics.

machine learning, natural language, reinforcement learning, (16 more...)

2006.06814

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(7 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
(2 more...)

Wei, Chen-Yu, Jafarnia-Jahromi, Mehdi, Luo, Haipeng, Jain, Rahul

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

arXiv.org Machine Learning Jul-23-2020

We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation. Using the optimism principle and assuming that the MDP has a linear structure, we first propose a computationally inefficient algorithm with optimal $\widetilde{O}(\sqrt{T})$ regret and another computationally efficient variant with $\widetilde{O}(T^{3/4})$ regret, where $T$ is the number of interactions. Next, taking inspiration from adversarial linear bandits, we develop yet another efficient algorithm with $\widetilde{O}(\sqrt{T})$ regret under a different set of assumptions, improving the best existing result by Hao et al. (2020) with $\widetilde{O}(T^{2/3})$ regret. Moreover, we draw a connection between this algorithm and the Natural Policy Gradient algorithm proposed by Kakade (2002), and show that our analysis improves the sample complexity bound recently given by Agarwal et al. (2020).

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2007.11849

Country:

North America > United States > California (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Workflow (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Cho, Wendy K. Tam, Liu, Yan Y.

A Parallel Evolutionary Multiple-Try Metropolis Markov Chain Monte Carlo Algorithm for Sampling Spatial Partitions

arXiv.org Artificial Intelligence Jul-22-2020

We develop an Evolutionary Markov Chain Monte Carlo (EMCMC) algorithm for sampling spatial partitions that lie within a large and complex spatial state space. Our algorithm combines the advantages of evolutionary algorithms (EAs) as optimization heuristics for state space traversal and the theoretical convergence properties of Markov Chain Monte Carlo algorithms for sampling from unknown distributions. Local optimality information that is identified via a directed search by our optimization heuristic is used to adaptively update a Markov chain in a promising direction within the framework of a Multiple-Try Metropolis Markov Chain model that incorporates a generalized Metropolis-Hastings ratio. We further expand the reach of our EMCMC algorithm by harnessing the computational power afforded by massively parallel architecture through the integration of a parallel EA framework that guides Markov chains running in parallel.
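For readers unfamiliar with the Multiple-Try Metropolis building block mentioned above, the following is a minimal sketch of the standard multiple-try scheme (Liu, Liang and Wong, 2000) with a symmetric proposal. It is not the authors' EMCMC algorithm: there is no evolutionary search, no parallelism and no spatial partitioning. The toy state space (a ring of integers), the function name `mtm_sample` and its parameters are assumptions made for illustration.

```python
import random


def mtm_sample(weights, n_steps, k=3, seed=0):
    """Multiple-Try Metropolis on the ring {0, ..., n-1} (toy state space).

    `weights` are unnormalized, strictly positive target probabilities.
    The proposal moves one step left or right with equal probability, so it
    is symmetric and the candidate weight reduces to the target weight.
    """
    rng = random.Random(seed)
    n = len(weights)

    def propose(x):
        return (x + rng.choice((-1, 1))) % n

    x = 0
    samples = []
    for _ in range(n_steps):
        # 1. Draw k candidate moves from the current state.
        cands = [propose(x) for _ in range(k)]
        w_cands = [weights[y] for y in cands]
        # 2. Select one candidate with probability proportional to its weight.
        y = rng.choices(cands, weights=w_cands)[0]
        # 3. Build the reference set: k-1 draws from the selected candidate,
        #    plus the current state itself.
        refs = [propose(y) for _ in range(k - 1)] + [x]
        w_refs = [weights[z] for z in refs]
        # 4. Generalized Metropolis-Hastings acceptance ratio.
        if rng.random() < min(1.0, sum(w_cands) / sum(w_refs)):
            x = y
        samples.append(x)
    return samples


if __name__ == "__main__":
    target = [1, 2, 3, 4, 5]
    draws = mtm_sample(target, 50000)
    for state in range(len(target)):
        print(state, round(draws.count(state) / len(draws), 3))
```

With `k=1` the reference set contains only the current state and the rule reduces to the ordinary Metropolis acceptance ratio.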

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

2007.11461

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Illinois (0.04)
North America > United States > Ohio (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Industry: Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Liao, Peng, Qi, Zhengling, Murphy, Susan

Batch Policy Learning in Average Reward Markov Decision Processes

arXiv.org Machine Learning Jul-22-2020

We study the problem of policy optimization in Markov Decision Processes over infinite time horizons (Puterman, 1994). We focus on the batch (i.e., off-line) setting, where historical data of multiple trajectories has been previously collected using some behavior policy. Our goal is to learn a new policy with guaranteed performance when implemented in the future. In this work, we develop a data-efficient method to learn the policy that optimizes the long-term average reward in a pre-specified policy class from a training set composed of multiple trajectories. Furthermore, we establish a finite-sample regret guarantee, where the regret is the difference between the average reward of the optimal policy in the class and the average reward of the policy estimated by our proposed method. This work is motivated by the development of just-in-time adaptive interventions in mobile health (mHealth) applications (Nahum-Shani et al., 2017). Our method can be used to learn a treatment policy that maps the real-time collected information about the individual's status and context to a particular treatment at each of many decision times to support health behaviors.
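The regret described in words above can be written compactly. The notation here is assumed for illustration and is not taken from the paper: $\Pi$ denotes the pre-specified policy class, $\hat{\pi}$ the policy estimated from the batch data, $R_t$ the reward at decision time $t$, and $\eta(\pi)$ the long-term average reward of policy $\pi$ (assuming the limit exists).

$$
\eta(\pi) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{T} R_t \right],
\qquad
\mathrm{Regret}(\hat{\pi}) = \sup_{\pi \in \Pi} \eta(\pi) - \eta(\hat{\pi}).
$$

A finite-sample guarantee then bounds $\mathrm{Regret}(\hat{\pi})$ in terms of the amount of batch data available.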

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2007.11771

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)

arXiv.org Artificial Intelligence Jul-21-2020

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Zhou, Dongruo, He, Jiafan, Gu, Quanquan

Designing efficient algorithms that learn and plan in sequential decision-making tasks with large state and action spaces has become the central goal of modern reinforcement learning (RL) in recent years. Because of the large number of possible states and actions, traditional tabular reinforcement learning methods (Watkins, 1989; Jaksch et al., 2010; Azar et al., 2017), which directly access each state-action pair, are computationally intractable. A common way to design reinforcement learning algorithms for large-scale state and action spaces is to use feature mappings such as linear functions or neural networks to map states and actions to a low-dimensional space and solve the decision-making problem in the feature space. Despite the empirical success of feature-mapping-based reinforcement learning methods (Singh et al., 1995; Kwok and Fox, 2004; Bertsekas, 2018), the theoretical understanding and the fundamental limits of these methods remain largely understudied. In this paper, we aim to develop provable reinforcement learning algorithms with feature mapping for discounted Markov Decision Processes (MDPs). The discounted MDP is one of the most widely used models for formulating modern reinforcement learning tasks such as Atari games (Mnih et al., 2015) and deep recommendation systems (Zheng et al., 2018).

inequality hold, machine learning, reinforcement learning, (14 more...)

2006.13165

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Henkel, Florian, Kelz, Rainer, Widmer, Gerhard

Learning to Read and Follow Music in Complete Score Sheet Images

arXiv.org Machine Learning Jul-21-2020

This paper addresses the task of score following in sheet music given as unprocessed images. While existing work either relies on OMR software to obtain a computer-readable score representation, or crucially relies on prepared sheet image excerpts, we propose the first system that directly performs score following in full-page, completely unprocessed sheet images. Based on incoming audio and a given image of the score, our system directly predicts the most likely position within the page that matches the audio, outperforming current state-of-the-art image-based score followers in terms of alignment precision. We also compare our method to an OMR-based approach and empirically show that it can be a viable alternative to such a system.

artificial intelligence, machine learning, sheet image, (18 more...)

2007.10736

Country:

Europe > Austria > Vienna (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
(15 more...)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Bishwas, Arit Kumar, Mani, Ashish, Palade, Vasile

Parts of Speech Tagging in NLP: Runtime Optimization with Quantum Formulation and ZX Calculus

arXiv.org Artificial Intelligence Jul-19-2020

Many organizations are staking their claims in this space [1][2][3][4]. Today's quantum computers are at a very early stage and are not capable of handling complex quantum artificial intelligence/machine learning (qAI/qML) tasks [5]. But we can still harness their properties to run some of our quantum AI/ML algorithms more efficiently. To this end, we can use "Noisy Intermediate Scale Quantum Systems" (NISQ) [6]. We can run the less complex quantum subroutines of a large qAI/qML algorithm on these kinds of quantum computers and use the results in the main qAI/qML problem-solving pipeline. In this way we create a classical-quantum hybrid problem-solving ecosystem in the AI/ML space.

artificial intelligence, machine learning, natural language, (17 more...)

2007.10328

Country:

North America > United States > Pennsylvania (0.05)
Asia > India (0.05)
Europe > United Kingdom > England > West Midlands > Coventry (0.04)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)