AITopics

2011.00825

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
(21 more...)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Kim, Joanne Taery, Ha, Sehoon

Observation Space Matters: Benchmark and Optimization Algorithm

arXiv.org Artificial IntelligenceNov-2-2020

Recent advances in deep reinforcement learning (deep RL) enable researchers to solve challenging control problems, from simulated environments to real-world robotic tasks. However, deep RL algorithms are known to be sensitive to the problem formulation, including observation spaces, action spaces, and reward functions. There exist numerous choices for observation spaces but they are often designed solely based on prior knowledge due to the lack of established principles. In this work, we conduct benchmark experiments to verify common design choices for observation spaces, such as Cartesian transformation, binary contact flags, a short history, or global positions. Then we propose a search algorithm to find the optimal observation spaces, which examines various candidate observation spaces and removes unnecessary observation channels with a Dropout-Permutation test. We demonstrate that our algorithm significantly improves learning speed compared to manually designed observation spaces. We also analyze the proposed algorithm by evaluating different hyperparameters.

machine learning, observation space, reinforcement learning, (16 more...)

2011.00756

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
(2 more...)

#artificialintelligenceNov-1-2020, 18:25:24 GMT

An Important Guide To Unsupervised Machine Learning

We're living in an era of digital switch-over with only one constant – evolve. And that digital transformation is being introduced by high-tech solutions. Hence, it comes as no surprise that mundane business tasks are being completely taken over by tech advancements. Machines, artificial intelligence (AI), and unsupervised learning are reshaping the way businesses vie for a place under the sun. With that being said, let's have a closer look at how unsupervised machine learning is omnipresent in all industries.

algorithm, artificial intelligence, machine learning, (13 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Boggess, Kayla, Chen, Shenghui, Feng, Lu

Towards Personalized Explanation of Robotic Planning via User Feedback

Prior studies have found that providing explanations about robots' decisions and actions help to improve system transparency, increase human users' trust of robots, and enable effective human-robot collaboration. Different users have various preferences about what should be included in explanations. However, little research has been conducted for the generation of personalized explanations. In this paper, we present a system for generating personalized explanations of robotic planning via user feedback. We consider robotic planning using Markov decision processes (MDPs) and develop an algorithm to automatically generate a personalized explanation of an optimal robotic plan (i.e., an optimal MDP policy) based on the user preference regarding four elements (i.e., objective, locality, specificity, and abstraction). In addition, we design the system to interact with users via answering users' further questions about the generated explanations. Users have the option to update their preferences to view different explanations. The system is capable of detecting and resolving any preference conflict via user interaction. Our user study results show that the generated personalized explanations improve user satisfaction, while the majority of users liked the system's capabilities of question-answering, and conflict detection and resolution.

explanation, robotic plan, robotic planning, (17 more...)

2011.00524

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective

Yang, Yaodong, Wang, Jun

Following the remarkable success of the AlphaGO series, 2019 was a booming year that witnessed significant advances in multi-agent reinforcement learning (MARL) techniques. MARL corresponds to the learning problem in a multi-agent system in which multiple agents learn simultaneously. MARL is an interdisciplinary domain with a long history that includes game theory, machine learning, stochastic control, psychology, and optimisation. Although MARL has achieved considerable empirical success in solving real-world games, there is a lack of a self-contained overview in the literature that elaborates the game theoretical foundations of modern MARL methods and summarises the recent advances. In fact, the majority of existing surveys are outdated and do not fully cover the recent developments since 2010. In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments in the research frontier. The goal of our monograph is to provide a self-contained assessment of the current state-of-the-art MARL techniques from a game theoretical perspective. We expect this work to serve as a stepping stone for both new researchers who are about to enter this fast-growing domain and existing domain experts who want to obtain a panoramic view and identify new directions based on recent advances.

deep learning, machine learning, reinforcement learning, (17 more...)

2011.00583

Country:

North America > United States > Texas (0.04)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre:

Research Report (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.92)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Health & Medicine (0.92)
Energy (0.92)
Education (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Tidd, Brendan, Hudson, Nicolas, Cosgun, Akansel, Leitner, Jurgen

Learning When to Switch: Composing Controllers to Traverse a Sequence of Terrain Artifacts

Legged robots often use separate control policies that are highly engineered for traversing difficult terrain such as stairs, gaps, and steps, where switching between policies is only possible when the robot is in a region that is common to adjacent controllers. Deep Reinforcement Learning (DRL) is a promising alternative to hand-crafted control design, though typically requires the full set of test conditions to be known before training. DRL policies can result in complex (often unrealistic) behaviours that have few or no overlapping regions between adjacent policies, making it difficult to switch behaviours. In this work we develop multiple DRL policies with Curriculum Learning (CL), each that can traverse a single respective terrain condition, while ensuring an overlap between policies. We then train a network for each destination policy that estimates the likelihood of successfully switching from any other policy. We evaluate our switching method on a previously unseen combination of terrain artifacts and show that it performs better than heuristic methods. While our method is trained on individual terrain types, it performs comparably to a Deep Q Network trained on the full set of terrain conditions. This approach allows the development of separate policies in constrained conditions with embedded prior knowledge about each behaviour, that is scalable to any number of behaviours, and prepares DRL methods for applications in the real world

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2011.0044

Country: Oceania > Australia > Queensland (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Klissarov, Martin, Precup, Doina

Reward Propagation Using Graph Convolutional Networks

Potential-based reward shaping provides an approach for designing good reward functions, with the purpose of speeding up learning. However, automatically finding potential functions for complex environments is a difficult problem (in fact, of the same difficulty as learning a value function from scratch). We propose a new framework for learning potential functions by leveraging ideas from graph representation learning. Our approach relies on Graph Convolutional Networks which we use as a key ingredient in combination with the probabilistic inference view of reinforcement learning. More precisely, we leverage Graph Convolutional Networks to perform message passing from rewarding states. The propagated messages can then be used as potential functions for reward shaping to accelerate learning. We verify empirically that our approach can achieve considerable improvements in both small and high-dimensional control problems.

gcn, machine learning, reinforcement learning, (17 more...)

2010.02474

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Pasqualini, Luca, Parton, Maurizio

Pseudo Random Number Generation through Reinforcement Learning and Recurrent Neural Networks

arXiv.org Artificial IntelligenceOct-31-2020

A Pseudo-Random Number Generator (PRNG) is any algorithm generating a sequence of numbers approximating properties of random numbers. These numbers are widely employed in mid-level cryptography and in software applications. Test suites are used to evaluate PRNGs quality by checking statistical properties of the generated sequences. These sequences are commonly represented bit by bit. This paper proposes a Reinforcement Learning (RL) approach to the task of generating PRNGs from scratch by learning a policy to solve a partially observable Markov Decision Process (MDP), where the full state is the period of the generated sequence and the observation at each time step is the last sequence of bits appended to such state. We use a Long-Short Term Memory (LSTM) architecture to model the temporal relationship between observations at different time steps, by tasking the LSTM memory with the extraction of significant features of the hidden portion of the MDP's states. We show that modeling a PRNG with a partially observable MDP and a LSTM architecture largely improves the results of the fully observable feedforward RL approach introduced in previous work.

artificial intelligence, machine learning, sequence, (18 more...)

2011.02909

Country:

Europe > Italy (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial IntelligenceOct-30-2020

An Adiabatic Theorem for Policy Tracking with TD-learning

Walton, Neil

Policy evaluation and, in particular, temporal difference (TD) learning is a key ingredient in reinforcement learning. Here the expected value of future rewards is estimated from simulations of a given policy. When a stationary policy is fixed, the simulated process is a time-homogeneous Markov chain. The convergence of the policy evaluation algorithm is analyzed using stochastic approximation techniques under asynchronous Markovian updates. There is a well-developed theory of stochastic approximation that establishes the convergence of a variety of policy evaluation schemes.

convergence, markov chain, transition matrix, (13 more...)

2010.12848

Country:

Europe > Montenegro (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.52)

Kwon, Minhae, Daptardar, Saurabh, Schrater, Paul, Pitkow, Xaq

Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics

arXiv.org Artificial IntelligenceOct-30-2020

A fundamental question in neuroscience is how the brain creates an internal model of the world to guide actions using sequences of ambiguous sensory information. This is naturally formulated as a reinforcement learning problem under partial observations, where an agent must estimate relevant latent variables in the world from its evidence, anticipate possible future states, and choose actions that optimize total expected reward. This problem can be solved by control theory, which allows us to find the optimal actions for a given system dynamics and objective function. However, animals often appear to behave suboptimally. Why? We hypothesize that animals have their own flawed internal model of the world, and choose actions with the highest expected subjective reward according to that flawed model. We describe this behavior as rational but not optimal. The problem of Inverse Rational Control (IRC) aims to identify which internal model would best explain an agent's actions. Our contribution here generalizes past work on Inverse Rational Control which solved this problem for discrete control in partially observable Markov decision processes. Here we accommodate continuous nonlinear dynamics and continuous actions, and impute sensory observations corrupted by unknown noise that is private to the animal. We first build an optimal Bayesian agent that learns an optimal policy generalized over the entire model space of dynamics and subjective rewards using deep reinforcement learning. Crucially, this allows us to compute a likelihood over models for experimentally observable action trajectories acquired from a suboptimal agent. We then find the model parameters that maximize the likelihood using gradient ascent.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2009.12576

Country:

North America > United States > Minnesota (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)