AITopics

2004.00857

Country:

Europe > Austria > Tyrol > Innsbruck (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation (0.68)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

arXiv.org Artificial IntelligenceApr-2-2020

Continuous Motion Planning with Temporal Logic Specifications using Deep Neural Networks

Wang, Chuanzheng, Li, Yinan, Smith, Stephen L., Liu, Jun

In this paper, we propose a model-free reinforcement learning method to synthesize control policies for motion planning problems for continuous states and actions. The robot is modelled as a labeled Markov decision process (MDP) with continuous state and action spaces. Linear temporal logics (LTL) are used to specify high-level tasks. We then train deep neural networks to approximate the value function and policy using an actor-critic reinforcement learning method. The LTL specification is converted into an annotated limit-deterministic B\"uchi automaton (LDBA) for continuously shaping the reward so that dense reward is available during training. A naive way of solving a motion planning problem with LTL specifications using reinforcement learning is to sample a trajectory and, if the trajectory satisfies the entire LTL formula then we assign a high reward for training. However, the sampling complexity needed to find such a trajectory is too high when we have a complex LTL formula for continuous state and action spaces. As a result, it is very unlikely that we get enough reward for training if all sample trajectories start from the initial state in the automata. In this paper, we propose a method that samples not only an initial state from the state space, but also an arbitrary state in the automata at the beginning of each training episode. We test our algorithm in simulation using a car-like robot and find out that our method can learn policies for different working configurations and LTL specifications successfully.

ltl specification, specification, transition, (15 more...)

2004.0261

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

París, Iago, Sánchez-Cauce, Raquel, Díez, Francisco Javier

Sum-product networks: A survey

arXiv.org Artificial IntelligenceApr-2-2020

A sum-product network (SPN) is a probabilistic model, based on a rooted acyclic directed graph, in which terminal nodes represent univariate probability distributions and non-terminal nodes represent convex combinations (weighted sums) and products of probability functions. They are closely related to probabilistic graphical models, in particular to Bayesian networks with multiple context-specific independencies. Their main advantage is the possibility of building tractable models from data, i.e., models that can perform several inference tasks in time proportional to the number of links in the graph. They are somewhat similar to neural networks and can address the same kinds of problems, such as image processing and natural language understanding. This paper offers a survey of SPNs, including their definition, the main algorithms for inference and learning from data, the main applications, a brief review of software libraries, and a comparison with related models

algorithm, node, spn, (15 more...)

2004.01167

Country:

Europe > Spain > Galicia > Madrid (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Government (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

#artificialintelligenceApr-1-2020, 08:41:01 GMT

MIT CSAIL's CommPlan AI helps robots efficiently collaborate with humans

In a new study, researchers at MIT's Computer Science and Artificial Intelligence Lab propose a framework called CommPlan, which gives robots that work alongside humans principles for "good etiquette" and leave it to the robots to make decisions that let them finish tasks efficiently. They claim it's a superior approach to handcrafted rules, because it enables the robots to perform cost-benefit analyses on their decisions rather than follow task- and context-specific policies. CommPlan weighs a combination of factors, including whether a person is busy or likely to respond given past behavior, leveraging a dedicated module -- the Agent Markov Model -- to represent that person's sequential decision-making behaviors. It consists of a model specification process and an execution-time partially observable Markov decision process (POMDP) planner, derived as the robot's decision-making model, which CommPlan uses in tandem to arrive at the robot's actions and communications policies. Using CommPlan, developers first specify five modules -- a task model, communication capability, a communication cost model, a human response model, and a human action-selectable model -- with data, domain expertise, and learning algorithms.

ai help robot efficiently collaborate, commplan, decision-making model, (3 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Machine LearningApr-1-2020

Total Variation Regularization for Compartmental Epidemic Models with Time-varying Dynamics

Zheng, Wenjie

Traditional methods to infer compartmental epidemic models with time-varying dynamics can only capture continuous changes in the dynamic. However, many changes are discontinuous due to sudden interventions, such as city lockdown and opening of field hospitals. To model the discontinuities, this study introduces the tool of total variation regularization, which regulates the temporal changes of the dynamic parameters, such as the transmission rate. To recover the ground truth dynamic, this study designs a novel yet straightforward optimization algorithm, dubbed iterative Nelder-Mead, which repeatedly applies the Nelder-Mead algorithm. Experiments on the simulated data show that the proposed approach can qualitatively reproduce the discontinuities of the underlying dynamics. To extend this research to real data as well as to help researchers worldwide to fight against COVID-19, the author releases his research platform as an open-source package.

compartment, regularization, variation regularization, (17 more...)

2004.00412

Country:

Asia > China > Hubei Province > Wuhan (0.05)
Europe > Italy (0.04)
Europe > Germany (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine LearningApr-1-2020

SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models

Luo, Yucen, Beatson, Alex, Norouzi, Mohammad, Zhu, Jun, Duvenaud, David, Adams, Ryan P., Chen, Ricky T. Q.

Standard variational lower bounds used to train latent variable models produce biased estimates of most quantities of interest. We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series. If parameterized by an encoder-decoder architecture, the parameters of the encoder can be optimized to minimize its variance of this estimator. We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost. This estimator also allows use of latent variable models for tasks where unbiased estimators, rather than marginal likelihood lower bounds, are preferred, such as minimizing reverse KL divergences and estimating score functions.

estimator, international conference, sumo, (15 more...)

2004.00353

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Los Angeles County > Santa Monica (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine LearningApr-1-2020

Statistically Model Checking PCTL Specifications on Markov Decision Processes via Reinforcement Learning

Wang, Yu, Roohi, Nima, West, Matthew, Viswanathan, Mahesh, Dullerud, Geir E.

Probabilistic Computation Tree Logic (PCTL) is frequently used to formally specify control objectives such as probabilistic reachability and safety. In this work, we focus on model checking PCTL specifications statistically on Markov Decision Processes (MDPs) by sampling, e.g., checking whether there exists a feasible policy such that the probability of reaching certain goal states is greater than a threshold. We use reinforcement learning to search for such a feasible policy for PCTL specifications, and then develop a statistical model checking (SMC) method with provable guarantees on its error. Specifically, we first use upper-confidence-bound (UCB) based Q-learning to design an SMC algorithm for bounded-time PCTL specifications, and then extend this algorithm to unbounded-time specifications by identifying a proper truncation time by checking the PCTL specification and its negation at the same time. Finally, we evaluate the proposed method on case studies.

algorithm, probability, specification, (12 more...)

2004.00273

Country:

North America > United States > Illinois (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Republic of Türkiye > Aksaray Province > Aksaray (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)

Su, Jianyu, Adams, Stephen, Beling, Peter A.

Counterfactual Multi-Agent Reinforcement Learning with Graph Convolution Communication

arXiv.org Artificial IntelligenceApr-1-2020

We consider a fully cooperative multi-agent system where agents cooperate to maximize a system's utility in a partial-observable environment. We propose that multi-agent systems must have the ability to (1) communicate and understand the inter-plays between agents and (2) correctly distribute rewards based on an individual agent's contribution. In contrast, most work in this setting considers only one of the above abilities. In this study, we develop an architecture that allows for communication among agents and tailors the system's reward for each individual agent. Our architecture represents agent communication through graph convolution and applies an existing credit assignment structure, counterfactual multi-agent policy gradient (COMA), to assist agents to learn communication by back-propagation. The flexibility of the graph structure enables our method to be applicable to a variety of multi-agent systems, e.g. dynamic systems that consist of varying numbers of agents and static systems with a fixed number of agents. We evaluate our method on a range of tasks, demonstrating the advantage of marrying communication with credit assignment. In the experiments, our proposed method yields better performance than the state-of-art methods, including COMA. Moreover, we show that the communication strategies offers us insights and interpretability of the system's cooperative policies.

agent, communication, information, (15 more...)

2004.0047

Country: North America > United States > Virginia > Albemarle County > Charlottesville (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Abrantes, João P., Abrantes, Arnaldo J., Oliehoek, Frans A.

Mimicking Evolution with Reinforcement Learning

arXiv.org Artificial IntelligenceMar-31-2020

Evolution gave rise to human and animal intelligence here on Earth. We argue that the path to developing artificial human-like-intelligence will pass through mimicking the evolutionary process in a nature-like simulation. In Nature, there are two processes driving the development of the brain: evolution and learning. Evolution acts slowly, across generations, and amongst other things, it defines what agents learn by changing their internal reward function. Learning acts fast, across one's lifetime, and it quickly updates agents' policy to maximise pleasure and minimise pain. The reward function is slowly aligned with the fitness function by evolution, however, as agents evolve the environment and its fitness function also change, increasing the misalignment between reward and fitness. It is extremely computationally expensive to replicate these two processes in simulation. This work proposes Evolution via Evolutionary Reward (EvER) that allows learning to single-handedly drive the search for policies with increasingly evolutionary fitness by ensuring the alignment of the reward function with the fitness function. In this search, EvER makes use of the whole state-action trajectories that agents go through their lifetime. In contrast, current evolutionary algorithms discard this information and consequently limit their potential efficiency at tackling sequential decision problems. We test our algorithm in two simple bio-inspired environments and show its superiority at generating more capable agents at surviving and reproducing their genes when compared with a state-of-the-art evolutionary algorithm.

agent, algorithm, reward function, (11 more...)

2004.00048

Country:

Europe > United Kingdom > England > Essex > Colchester (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.47)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine LearningMar-30-2020

Inference with Aggregate Data: An Optimal Transport Approach

Singh, Rahul, Haasler, Isabel, Zhang, Qinsheng, Karlsson, Johan, Chen, Yongxin

We consider inference problems over probabilistic graphical models with aggregate data. In particular, we propose a new efficient belief propagation type algorithm over tree-structured graphs with polynomial computational complexity as well as a global convergence guarantee. This is in contrast to previous methods that either exhibit prohibitive complexity as the population grows or do not guarantee convergence. Our method is based on optimal transport, or more specifically, multi-marginal optimal transport theory. In particular, the inference problem with aggregate observations we consider in this paper can be seen as a structured multi-marginal optimal transport problem, where the cost function decomposes according to the underlying graph. Consequently, the celebrated Sinkhorn algorithm for multi-marginal optimal transport can be leveraged, together with the standard belief propagation algorithm to establish an efficient inference scheme. We demonstrate the performance of our algorithm on applications such as inferring population flow from aggregate observations.

algorithm, graphical model, inference, (14 more...)

2003.13933

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)