AITopics

2008.08221

Country:

Asia (0.28)
North America > United States > Wisconsin (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Transportation (1.00)
Information Technology (1.00)
Health & Medicine > Consumer Health (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial IntelligenceAug-18-2020

ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation

Xia, Fei, Li, Chengshu, Martín-Martín, Roberto, Litany, Or, Toshev, Alexander, Savarese, Silvio

Many Reinforcement Learning (RL) approaches use joint control signals (positions, velocities, torques) as action space for continuous control tasks. We propose to lift the action space to a higher level in the form of subgoals for a motion generator (a combination of motion planner and trajectory executor). We argue that, by lifting the action space and by leveraging sampling-based motion planners, we can efficiently use RL to solve complex, long-horizon tasks that could not be solved with existing RL methods in the original action space. We propose ReLMoGen -- a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals. To validate our method, we apply ReLMoGen to two types of tasks: 1) Interactive Navigation tasks, navigation problems where interactions with the environment are required to reach the destination, and 2) Mobile Manipulation tasks, manipulation tasks that require moving the robot base. These problems are challenging because they are usually long-horizon, hard to explore during training, and comprise alternating phases of navigation and interaction. Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments. In all settings, ReLMoGen outperforms state-of-the-art Reinforcement Learning and Hierarchical Reinforcement Learning baselines. ReLMoGen also shows outstanding transferability between different motion generators at test time, indicating a great potential to transfer to real robots.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2008.07792

Country:

North America > United States > Oregon (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceAug-17-2020

On the Sample Complexity of Reinforcement Learning with Policy Space Generalization

Mou, Wenlong, Wen, Zheng, Chen, Xi

We study the optimal sample complexity in large-scale Reinforcement Learning (RL) problems with policy space generalization, i.e. the agent has a prior knowledge that the optimal policy lies in a known policy space. Existing results show that without a generalization model, the sample complexity of an RL algorithm will inevitably depend on the cardinalities of state space and action space, which are intractably large in many practical problems. To avoid such undesirable dependence on the state and action space sizes, this paper proposes a new notion of eluder dimension for the policy space, which characterizes the intrinsic complexity of policy learning in an arbitrary Markov Decision Process (MDP). Using a simulator oracle, we prove a near-optimal sample complexity upper bound that only depends linearly on the eluder dimension. We further prove a similar regret bound in deterministic systems without the simulator.

dimension, machine learning, reinforcement learning, (19 more...)

2008.07353

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Chen, Xiaoyi, Chaudhari, Pratik

MIDAS: Multi-agent Interaction-aware Decision-making with Adaptive Strategies for Urban Autonomous Navigation

arXiv.org Machine LearningAug-17-2020

Autonomous navigation in crowded, complex urban environments requires interacting with other agents on the road. A common solution to this problem is to use a prediction model to guess the likely future actions of other agents. While this is reasonable, it leads to overly conservative plans because it does not explicitly model the mutual influence of the actions of interacting agents. This paper builds a reinforcement learning-based method named MIDAS where an ego-agent learns to affect the control actions of other cars in urban driving scenarios. MIDAS uses an attention-mechanism to handle an arbitrary number of other agents and includes a ''driver-type'' parameter to learn a single policy that works across different planning objectives. We build a simulation environment that enables diverse interaction experiments with a large number of agents and methods for quantitatively studying the safety, efficiency, and interaction among vehicles. MIDAS is validated using extensive experiments and we show that it (i) can work across different road geometries, (ii) results in an adaptive ego policy that can be tuned easily to satisfy performance criteria such as aggressive or cautious driving, (iii) is robust to changes in the driving policies of external agents, and (iv) is more efficient and safer than existing approaches to interaction-aware decision-making.

agent, artificial intelligence, machine learning, (17 more...)

2008.07081

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceAug-16-2020

Neural Datalog Through Time: Informed Temporal Modeling via Logical Specification

Mei, Hongyuan, Qin, Guanghui, Xu, Minjie, Eisner, Jason

Learning how to predict future events from patterns of past events is difficult when the set of possible event types is large. Training an unrestricted neural model might overfit to spurious patterns. To exploit domain-specific knowledge of how past events might affect an event's present probability, we propose using a temporal deductive database to track structured facts over time. Rules serve to prove facts from other facts and from past events. Each fact has a time-varying state---a vector computed by a neural net whose topology is determined by the fact's provenance, including its experience of past events. The possible event types at any time are given by special facts, whose probabilities are neurally modeled alongside their states. In both synthetic and real-world domains, we show that neural probabilistic models derived from concise Datalog programs improve prediction by encoding appropriate domain knowledge in their architecture.

artificial intelligence, machine learning, natural language, (20 more...)

2006.16723

Country:

North America > United States > Maryland (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(4 more...)

Wachi, Akifumi, Sui, Yanan

Safe Reinforcement Learning in Constrained Markov Decision Processes

arXiv.org Artificial IntelligenceAug-14-2020

Safe reinforcement learning has been a promising approach for optimizing the policy of an agent that operates in safety-critical applications. In this paper, we propose an algorithm, SNO-MDP, that explores and optimizes Markov decision processes under unknown safety constraints. Specifically, we take a stepwise approach for optimizing safety and cumulative reward. In our method, the agent first learns safety constraints by expanding the safe region, and then optimizes the cumulative reward in the certified safe region. We provide theoretical guarantees on both the satisfaction of the safety constraint and the near-optimality of the cumulative reward under proper regularity assumptions. In our experiments, we demonstrate the effectiveness of SNO-MDP through two experiments: one uses a synthetic data in a new, openly-available environment named GP-SAFETY-GYM, and the other simulates Mars surface exploration by using real observation data.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2008.06626

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.62)

Prabhu, Shreekanth M., Subramaniam, Natarajan

Surveillance of COVID-19 Pandemic using Hidden Markov Model

arXiv.org Machine LearningAug-14-2020

COVID-19 pandemic has brought the whole world to a stand-still over the last few months. In particular the pace at which pandemic has spread has taken everybody off-guard. The Governments across the world have responded by imposing lock-downs, stopping/restricting travel and mandating social distancing. On the positive side there is wide availability of information on active cases, recoveries and deaths collected daily across regions. However, what has been particularly challenging is to track the spread of the disease by asymptomatic carriers termed as super-spreaders. In this paper we look at applying Hidden Markov Model to get a better assessment of extent of spread. The outcome of such analysis can be useful to Governments to design the required interventions/responses in a calibrated manner. The data we have chosen to analyze pertains to Indian scenario.

artificial intelligence, hidden markov model, machine learning, (16 more...)

2008.07609

Country:

Asia > India > West Bengal (0.05)
Asia > India > Haryana (0.05)
Asia > India > Gujarat (0.05)
(23 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Machine LearningAug-14-2020

The Projected Belief Network Classfier : both Generative and Discriminative

Baggenstoss, Paul M

The projected belief network (PBN) is a layered generative network with tractable likelihood function, and is based on a feed-forward neural network (FF-NN). It can therefore share an embodiment with a discriminative classifier and can inherit the best qualities of both types of network. In this paper, a convolutional PBN is constructed that is both fully discriminative and fully generative and is tested on spectrograms of spoken commands. It is shown that the network displays excellent qualities from either the discriminative or generative viewpoint. Random data synthesis and visible data reconstruction from low-dimensional hidden variables are shown, while classifier performance approaches that of a regularized discriminative network. Combination with a conventional discriminative CNN is also demonstrated.

artificial intelligence, classifier, machine learning, (18 more...)

2008.06434

Country:

North America > United States > California (0.04)
Europe > Spain > Galicia > A Coruña Province > A Coruña (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Geng, Sinong, Nassif, Houssam, Manzanares, Carlos A., Reppen, A. Max, Sircar, Ronnie

Deep PQR: Solving Inverse Reinforcement Learning using Anchor Actions

arXiv.org Machine LearningAug-14-2020

We propose a reward function estimation framework for inverse reinforcement learning with deep energy-based policies. We name our method PQR, as it sequentially estimates the Policy, the $Q$-function, and the Reward function by deep learning. PQR does not assume that the reward solely depends on the state, instead it allows for a dependency on the choice of action. Moreover, PQR allows for stochastic state transitions. To accomplish this, we assume the existence of one anchor action whose reward is known, typically the action of doing nothing, yielding no reward. We present both estimators and algorithms for the PQR method. When the environment transition is known, we prove that the PQR reward estimator uniquely recovers the true reward. With unknown transitions, we bound the estimation error of PQR. Finally, the performance of PQR is demonstrated by synthetic and real-world datasets.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2007.07443

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
(14 more...)

Genre: Research Report (0.82)

Industry:

Transportation > Passenger (1.00)
Transportation > Air (1.00)
Consumer Products & Services > Travel (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Zhang, Zhili, Zhu, Quanyan

Deceptive Kernel Function on Observations of Discrete POMDP

arXiv.org Artificial IntelligenceAug-12-2020

This paper studies the deception applied on agent in a partially observable Markov decision process. We introduce deceptive kernel function (the kernel) applied to agent's observations in a discrete POMDP. Based on value iteration, value function approximation and POMCP three characteristic algorithms used by agent, we analyze its belief being misled by falsified observations as the kernel's outputs and anticipate its probable threat on agent's reward and potentially other performance. We validate our expectation and explore more detrimental effects of the deception by experimenting on two POMDP problems. The result shows that the kernel applied on agent's observation can affect its belief and substantially lower its resulting rewards; meantime certain implementation of the kernel could induce other abnormal behaviors by the agent.

agent, artificial intelligence, machine learning, (17 more...)

2008.05585

Country:

North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)