AITopics | Undirected Networks

Undirected Networks

Improper Learning for Non-Stochastic Control

Simchowitz, Max, Singh, Karan, Hazan, Elad

arXiv.org Machine Learning, Jan-24-2020

We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states, known as non-stochastic control. We introduce a controller parametrization based on the denoised observations, and prove that applying online gradient descent to this parametrization yields a new controller which attains sublinear regret against a large class of closed-loop policies. In the fully adversarial setting, our controller attains an optimal regret bound of $\sqrt{T}$ when the system is known and, when combined with an initial stage of least-squares estimation, $T^{2/3}$ when the system is unknown; both yield the first sublinear regret for the partially observed setting. Our bounds are the first in the non-stochastic control setting that compete with \emph{all} stabilizing linear dynamical controllers, not just state feedback. Moreover, in the presence of semi-adversarial noise containing both stochastic and adversarial components, our controller attains the optimal regret bounds of $\mathrm{poly}(\log T)$ when the system is known, and $\sqrt{T}$ when unknown. To our knowledge, this gives the first end-to-end $\sqrt{T}$ regret for online Linear Quadratic Gaussian control, and applies in a more general setting with adversarial losses and semi-adversarial noise.
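As a rough illustration of the idea in the abstract (online gradient descent over a controller that acts on denoised, or "nature's", observations), the sketch below treats a scalar, known, fully observed system, which is far simpler than the partially observed setting the paper studies. The system coefficients `a` and `b`, the quadratic loss, the memory lengths `m` and `h`, the step size `eta` and the function name `ogd_nature_y_control` are all assumptions made for illustration; this is not the authors' algorithm or code.

```python
import random


def ogd_nature_y_control(a, b, w, m=3, h=10, eta=1e-3):
    """Scalar sketch of online gradient descent over a nature's-y controller.

    Assumed system: x_{t+1} = a*x_t + b*u_t + w_t with x_0 = 0, observed y_t = x_t.
    Assumed controller: u_t = sum_j M[j] * ynat[t - j] (linear in past nature's y).
    Returns the observed outputs, the controls played, and the nature's y sequence.
    """
    ys, us, ynat = [], [], []
    M = [0.0] * m
    x = 0.0

    def window(t):
        # v_t[j] = ynat[t - j], zero-padded before time 0
        return [ynat[t - j] if t - j >= 0 else 0.0 for j in range(m)]

    def dot(p, q):
        return sum(pi * qi for pi, qi in zip(p, q))

    for t in range(len(w)):
        y = x
        # Nature's y: subtract the known effect of every past control,
        # using the scalar Markov parameters G_i = a**(i - 1) * b.
        yn = y - sum(a ** (i - 1) * b * us[t - i] for i in range(1, t + 1))
        ynat.append(yn)
        v = window(t)
        u = dot(M, v)  # play the current controller
        ys.append(y)
        us.append(u)

        # Counterfactual output had M been used for the last h steps (truncated):
        # y_t(M) = yn + M . z_t  with  z_t = sum_i G_i * v_{t-i}
        z = [0.0] * m
        for i in range(1, min(h, t) + 1):
            g = a ** (i - 1) * b
            z = [zj + g * vj for zj, vj in zip(z, window(t - i))]
        y_cf = yn + dot(M, z)

        # Gradient of the quadratic loss y_t(M)**2 + u_t(M)**2 with respect to M
        grad = [2.0 * y_cf * zj + 2.0 * u * vj for zj, vj in zip(z, v)]
        M = [Mj - eta * gj for Mj, gj in zip(M, grad)]

        x = a * x + b * u + w[t]
    return ys, us, ynat


if __name__ == "__main__":
    rng = random.Random(0)
    w = [rng.gauss(0.0, 0.1) for _ in range(200)]
    ys, us, _ = ogd_nature_y_control(0.9, 1.0, w)
    print("total cost:", sum(y * y + u * u for y, u in zip(ys, us)))
```

Because nature's y does not depend on the controls, the counterfactual loss in this sketch is a convex quadratic in `M`, which is what makes plain online gradient descent applicable.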

controller, nat 1, sequence, (16 more...)

arXiv.org Machine Learning

2001.09254

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

PCGRL: Procedural Content Generation via Reinforcement Learning

Khalifa, Ahmed, Bontrager, Philip, Earle, Sam, Togelius, Julian

arXiv.org Artificial Intelligence, Jan-24-2020

We investigate how reinforcement learning can be used to train level-designing agents. This represents a new approach to procedural content generation in games, where level design is framed as a game, and the content generator itself is learned. By seeing the design problem as a sequential task, we can use reinforcement learning to learn how to take the next action so that the expected final level quality is maximized. This approach can be used when few or no examples exist to train from, and the trained generator is very fast. We investigate three different ways of transforming two-dimensional level design problems into Markov decision processes and apply these to three game environments.
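A minimal sketch of framing level design as a Markov decision process, assuming a binary tile grid, a hypothetical `quality` metric (closeness of the wall fraction to 30%) and a reward equal to the change in quality produced by each edit. The class `LevelEditEnv` and its action encoding are illustrative only and are not the representations used in the paper; a trained reinforcement-learning policy would replace the random edits in the usage block.

```python
import random

EMPTY, WALL = 0, 1


def quality(grid):
    """Hypothetical quality metric: closeness of the wall fraction to 30%."""
    cells = [tile for row in grid for tile in row]
    return -abs(sum(cells) / len(cells) - 0.3)


class LevelEditEnv:
    """Level design framed as an MDP.

    State: the current tile grid. Action: (row, col, tile), i.e. write one tile.
    Reward: the change in level quality caused by the edit.
    """

    def __init__(self, height, width, rng):
        self.grid = [[rng.choice((EMPTY, WALL)) for _ in range(width)]
                     for _ in range(height)]

    def actions(self):
        return [(r, c, tile)
                for r in range(len(self.grid))
                for c in range(len(self.grid[0]))
                for tile in (EMPTY, WALL)]

    def step(self, action):
        r, c, tile = action
        before = quality(self.grid)
        self.grid[r][c] = tile
        return quality(self.grid) - before


if __name__ == "__main__":
    rng = random.Random(0)
    env = LevelEditEnv(6, 8, rng)
    start = quality(env.grid)
    total = sum(env.step(rng.choice(env.actions())) for _ in range(20))
    print("initial:", start, "final:", quality(env.grid), "return:", total)
```

Defining the reward as a quality difference means the episode return telescopes to the final quality minus the initial quality, so maximizing return is the same as maximizing final level quality.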

agent, content generation, representation, (15 more...)

arXiv.org Artificial Intelligence

2001.09212

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Exploration Based Language Learning for Text-Based Games

Madotto, Andrea, Namazifar, Mahdi, Huizinga, Joost, Molino, Piero, Ecoffet, Adrien, Zheng, Huaixiu, Papangelis, Alexandros, Yu, Dian, Khatri, Chandra, Tur, Gokhan

arXiv.org Artificial Intelligence, Jan-23-2020

This work presents an exploration and imitation-learning-based agent capable of state-of-the-art performance in playing text-based computer games. Text-based computer games describe their world to the player through natural language and expect the player to interact with the game using text. These games are of interest as they can be seen as a testbed for language understanding, problem-solving, and language generation by artificial agents. Moreover, they provide a learning environment in which these skills can be acquired through interactions with an environment rather than using fixed corpora. One aspect that makes these games particularly challenging for learning agents is the combinatorially large action space. Existing methods for solving text-based games are limited to games that are either very simple or have an action space restricted to a predetermined set of admissible actions. In this work, we propose to use the exploration approach of Go-Explore for solving text-based games. More specifically, in an initial exploration phase, we first extract trajectories with high rewards, after which we train a policy to solve the game by imitating these trajectories. Our experiments show that this approach outperforms existing solutions in solving text-based games, and it is more sample efficient in terms of the number of interactions with the environment. Moreover, we show that the learned policy can generalize better than existing solutions to unseen games without using any restriction on the action space.
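The exploration phase can be pictured with a toy archive-based loop in the spirit of Go-Explore: keep an archive of visited cells with the best trajectory found to each, return to a cell by replaying its trajectory (valid here only because the toy environment is deterministic), then explore randomly from it. The `Corridor` environment, the use of the raw position as the cell, uniform cell selection and the tie-breaking rule are assumptions for illustration, not the paper's setup; the subsequent imitation-learning phase is not shown.

```python
import random


class Corridor:
    """Deterministic toy environment: positions 0..n-1, reward 1 at the far end."""

    def __init__(self, n):
        self.n = n
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); the ends clamp the position
        self.pos = min(max(self.pos + action, 0), self.n - 1)
        reward = 1.0 if self.pos == self.n - 1 else 0.0
        return self.pos, reward


def go_explore(env, iterations, explore_steps, rng):
    """Archive-based exploration: cell -> (action trajectory, cumulative score)."""
    archive = {env.reset(): ([], 0.0)}
    for _ in range(iterations):
        # Select a known cell (uniformly here; the original method uses heuristics)
        cell = rng.choice(list(archive))
        traj, score = archive[cell]
        # Return to the cell by replaying its trajectory (requires determinism)
        env.reset()
        for action in traj:
            env.step(action)
        traj = list(traj)
        # Explore from the cell with random actions
        for _ in range(explore_steps):
            action = rng.choice((-1, 1))
            obs, reward = env.step(action)
            traj.append(action)
            score += reward
            known = archive.get(obs)
            # Keep the better trajectory: higher score first, then shorter
            if known is None or (score, -len(traj)) > (known[1], -len(known[0])):
                archive[obs] = (list(traj), score)
    return archive


if __name__ == "__main__":
    archive = go_explore(Corridor(10), iterations=300, explore_steps=5,
                         rng=random.Random(0))
    best_traj, best_score = max(archive.values(), key=lambda entry: entry[1])
    print(len(archive), best_score, len(best_traj))
```

The highest-scoring archive entry is the kind of high-reward trajectory that a second, imitation-learning phase would then train a policy to reproduce.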

admissible action, text-based game, trajectory, (14 more...)

arXiv.org Artificial Intelligence

2001.08868

Genre: Research Report > New Finding (0.68)

Industry:

Leisure & Entertainment > Games > Computer Games (0.54)
Education > Curriculum > Subject-Specific Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Graph Constrained Reinforcement Learning for Natural Language Action Spaces

Ammanabrolu, Prithviraj, Hausknecht, Matthew

arXiv.org Artificial Intelligence, Jan-23-2020

Interactive Fiction games are text-based simulations in which an agent interacts with the world purely through natural language. They are ideal environments for studying how to extend reinforcement learning agents to meet the challenges of natural language understanding, partial observability, and action generation in combinatorially-large text-based action spaces. We present KG-A2C, an agent that builds a dynamic knowledge graph while exploring and generates actions using a template-based action space. We contend that the dual uses of the knowledge graph to reason about game state and to constrain natural language generation are the keys to scalable exploration of combinatorially large natural language actions. Results across a wide variety of IF games show that KG-A2C outperforms current IF agents despite the exponential increase in action space size.
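To make the idea of constraining a template-based action space with a knowledge graph concrete, here is a small sketch that fills `OBJ` slots in action templates only with objects the graph links to the player. The triple format, the `have`/`in` relations and the filtering rule in `candidate_objects` are hypothetical simplifications; the agent in the paper chooses among templates and objects with a learned policy rather than enumerating them this way.

```python
from itertools import product


def build_graph(triples):
    """Store (subject, relation, object) triples as subject -> {(relation, object)}."""
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, set()).add((rel, obj))
    return graph


def candidate_objects(graph, player="you"):
    """Hypothetical constraint: objects the player has or that share its location."""
    objects, location = set(), None
    for rel, obj in graph.get(player, ()):
        if rel == "have":
            objects.add(obj)
        elif rel == "in":
            location = obj
    if location is not None:
        for subj, edges in graph.items():
            if subj != player and ("in", location) in edges:
                objects.add(subj)
    return sorted(objects)


def fill_templates(templates, objects):
    """Fill every OBJ slot with each combination of candidate objects."""
    actions = []
    for template in templates:
        for combo in product(objects, repeat=template.count("OBJ")):
            action = template
            for obj in combo:
                action = action.replace("OBJ", obj, 1)
            actions.append(action)
    return actions


if __name__ == "__main__":
    kg = build_graph([("you", "in", "kitchen"), ("you", "have", "knife"),
                      ("lamp", "in", "kitchen"), ("key", "in", "garden")])
    print(fill_templates(["take OBJ", "put OBJ in OBJ", "look"], candidate_objects(kg)))
```

Restricting slots to graph-connected objects keeps the candidate set small: with two objects, a two-slot template yields 4 actions instead of one per pair of words in the game's full vocabulary.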

obj, obj 0, template prob, (16 more...)

arXiv.org Artificial Intelligence

2001.08837

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Socially intelligent task and motion planning for human-robot interaction

Frank, Andrea, Riek, Laurel

arXiv.org Artificial Intelligence, Jan-23-2020

Because humans are social beings, much of their behavior is predicated on social context: the ambient social state that includes cultural norms, social signals, individual preferences, etc. In this paper, we propose a socially-aware task and motion planning algorithm that considers social context to generate appropriate and effective plans in human social environments (HSEs). The key strength of our proposed approach is that it explicitly models how potential actions not only affect objective cost, but also transform the social context in which the robot plans and acts. We investigate strategies to limit the complexity of our algorithm, so that our planner will remain tractable for mobile platforms in complex HSEs like hospitals and factories. The planner will also consider the relative importance and urgency of its tasks, which it uses to determine when it is and is not appropriate to violate social expectations to achieve its objective. This social awareness will allow robots to understand a fundamental rule of society: just because something makes your job easier does not make it the right thing to do! To our knowledge, the proposed work is the first task and motion planning approach that supports socially intelligent robot policy for HSEs. Through this ongoing work, robots will be able to understand, respect, and leverage social context to accomplish tasks both acceptably and effectively in HSEs.
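One hypothetical way to express the trade-off the abstract describes (when task urgency justifies violating a social expectation) is a plan cost in which social penalties are discounted by urgency. The `Plan` fields, the linear weighting in `choose_plan` and the example plans are assumptions chosen for illustration; the abstract does not specify how the planner scores social context.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    task_cost: float    # objective cost, e.g. expected travel time
    social_cost: float  # penalty for violating a social expectation (0 if none)


def choose_plan(plans, urgency):
    """Return the plan minimizing task cost plus urgency-discounted social cost.

    urgency = 0 counts social costs fully; urgency = 1 ignores them.
    """
    if not 0.0 <= urgency <= 1.0:
        raise ValueError("urgency must lie in [0, 1]")
    return min(plans, key=lambda p: p.task_cost + (1.0 - urgency) * p.social_cost)


if __name__ == "__main__":
    options = [Plan("pass between people talking", 2.0, 10.0),
               Plan("detour around them", 5.0, 0.0)]
    for level in (0.1, 0.9):
        print(level, choose_plan(options, level).name)
```

With these numbers, a low-urgency task takes the polite detour (cost 5 versus 11), while a high-urgency task accepts the norm violation (cost about 3 versus 5).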

interaction, social context, task and motion planning, (17 more...)

arXiv.org Artificial Intelligence

2001.08398

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Health Care Providers & Services (0.49)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Regret Bounds for Reinforcement Learning via Markov Chain Concentration

Ortner, Ronald (Montanuniversitaet Leoben)

Journal of Artificial Intelligence Research, Jan-23-2020

We give a simple optimistic algorithm for which it is easy to derive regret bounds of $O(\sqrt{t_{\mathrm{mix}} SAT})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\mathrm{mix}}$. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter.
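For reference, the usual definition of regret in the non-episodic (average-reward) setting and the growth rate implied by the quoted bound can be written as follows. The abstract does not spell out its regret definition, so the first line is the standard one from this literature, assumed here; $r_t$ denotes the reward collected at step $t$ and $\rho^{*}$ the optimal long-run average reward.

```latex
% Regret after T steps in an average-reward (non-episodic) MDP
\mathrm{Regret}(T) = T\rho^{*} - \sum_{t=1}^{T} r_t

% Bound quoted in the abstract (S states, A actions, mixing time t_mix)
\mathrm{Regret}(T) = O\!\left(\sqrt{t_{\mathrm{mix}}\, S A T}\right)
\quad\Longrightarrow\quad
\frac{\mathrm{Regret}(T)}{T} = O\!\left(\sqrt{\frac{t_{\mathrm{mix}}\, S A}{T}}\right) \to 0
```

For example, quadrupling $T$ only doubles the bound, since $\sqrt{4T} = 2\sqrt{T}$, so the average per-step regret shrinks as the learner runs longer.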

artificial intelligence, data mining, machine learning, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11316

AI Access Foundation

11316

Country:

Europe > Austria > Styria > Leoben (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.75)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Improving Interaction Quality Estimation with BiLSTMs and the Impact on Dialogue Policy Learning

Ultes, Stefan

arXiv.org Artificial Intelligence, Jan-21-2020

Learning suitable and well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most work based on reinforcement learning employs an objective measure like task success for modelling the reward signal, we use a reward based on user satisfaction estimation. We propose a novel estimator and show that it outperforms all previous estimators while learning temporal dependencies implicitly. Furthermore, we apply this novel user satisfaction estimation model live in simulated experiments where the model is trained on one domain and applied in many other domains which cover a similar task. We show that applying this model results in higher estimated satisfaction, similar task success rates and higher robustness to noise.
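The contrast between a task-success reward and a satisfaction-based reward can be sketched as a reward function that replaces the 0/1 success flag with a normalized satisfaction estimate. The 1 to 5 rating scale, the per-turn penalty, the `scale` constant and both function names are assumptions; the satisfaction estimate would come from a trained model (a BiLSTM in the paper) that is not shown here.

```python
def satisfaction_reward(num_turns, estimated_quality, turn_penalty=1.0,
                        scale=20.0, min_q=1.0, max_q=5.0):
    """Hypothetical end-of-dialogue reward built from a satisfaction estimate.

    The bonus is proportional to the estimated satisfaction rescaled to [0, 1],
    instead of being paid only when the task succeeded.
    """
    if not min_q <= estimated_quality <= max_q:
        raise ValueError("estimated_quality is outside the rating scale")
    normalized = (estimated_quality - min_q) / (max_q - min_q)
    return -turn_penalty * num_turns + scale * normalized


def task_success_reward(num_turns, success, turn_penalty=1.0, scale=20.0):
    """Objective baseline: same turn penalty, binary success bonus."""
    return -turn_penalty * num_turns + (scale if success else 0.0)


if __name__ == "__main__":
    # A 10-turn dialogue that succeeded but left the user only moderately satisfied
    print(task_success_reward(10, True), satisfaction_reward(10, 3.0))
```

The satisfaction-based reward can distinguish a successful but frustrating dialogue from a smooth one, which a binary success signal cannot.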

computational linguistic, dialogue, estimator, (13 more...)

arXiv.org Artificial Intelligence

2001.07615

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Emergence of Pragmatics from Referential Game between Theory of Mind Agents

Yuan, Luyao, Fu, Zipeng, Shen, Jingyue, Xu, Lu, Shen, Junhong, Zhu, Song-Chun

arXiv.org Artificial Intelligence, Jan-21-2020

Pragmatics studies how context can contribute to language meanings [1]. In human communication, language is never interpreted out of context, and sentences can usually convey more information than their literal meanings [2]. However, this mechanism is missing in most multi-agent systems [3, 4, 5, 6], restricting the communication efficiency and the capability of human-agent interaction. In this paper, we propose an algorithm with which agents can spontaneously learn the ability to "read between the lines" without any explicit hand-designed rules. We integrate the theory of mind (ToM) [7, 8] in a cooperative multi-agent pedagogical situation and propose an adaptive reinforcement learning (RL) algorithm to develop a communication protocol. ToM is a profound cognitive science concept, claiming that people regularly reason about others' mental states, including beliefs, goals, and intentions, to obtain performance advantage in competition, cooperation or coalition. With this ability, agents consider language not only as messages but also as rational acts reflecting others' hidden states. Our experiments demonstrate the advantage of pragmatic protocols over non-pragmatic protocols. We also show empirically that the teaching complexity under the pragmatic protocol approximates the recursive teaching dimension (RTD).
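Pragmatic "reading between the lines" is often illustrated with the Rational Speech Acts (RSA) model, in which a listener reasons about which utterance a rational speaker would have chosen. The sketch below implements that standard recursion (literal listener, pragmatic speaker, pragmatic listener) on a two-referent example; it is a swapped-in textbook model used for intuition, not the adaptive reinforcement-learning algorithm proposed in the paper, and the referents, features and uniform prior are illustrative assumptions.

```python
# Illustrative referents: A wears glasses; B wears glasses and a hat.
FEATURES = {"A": {"glasses"}, "B": {"glasses", "hat"}}


def meaning(utterance, referent):
    """Literal semantics: an utterance is true of a referent with that feature."""
    return utterance in FEATURES[referent]


def normalize(dist):
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()} if total > 0 else dict(dist)


def literal_listener(utterance, referents, meaning):
    # Uniform prior restricted to referents the utterance is literally true of
    return normalize({r: 1.0 if meaning(utterance, r) else 0.0 for r in referents})


def pragmatic_speaker(referent, utterances, referents, meaning, alpha=1.0):
    # Prefer utterances that make a literal listener pick the intended referent
    return normalize({u: literal_listener(u, referents, meaning)[referent] ** alpha
                      for u in utterances})


def pragmatic_listener(utterance, utterances, referents, meaning, alpha=1.0):
    # Bayesian inversion of the speaker, again with a uniform prior over referents
    return normalize({r: pragmatic_speaker(r, utterances, referents, meaning, alpha)[utterance]
                      for r in referents})


if __name__ == "__main__":
    print(literal_listener("glasses", ["A", "B"], meaning))
    print(pragmatic_listener("glasses", ["glasses", "hat"], ["A", "B"], meaning))
```

Hearing "glasses", the literal listener splits probability evenly between A and B, while the pragmatic listener favors A (0.75 versus 0.25), because a speaker referring to B would more likely have said "hat".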

agent, communication, student, (13 more...)

arXiv.org Artificial Intelligence

2001.07752

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Stochastic Finite State Control of POMDPs with LTL Specifications

Ahmadi, Mohamadreza, Sharan, Rangoli, Burdick, Joel W.

arXiv.org Artificial Intelligence, Jan-21-2020

Partially observable Markov decision processes (POMDPs) provide a modeling framework for autonomous decision making under uncertainty and imperfect sensing, e.g., robot manipulation and self-driving cars. However, optimal control of POMDPs is notoriously intractable. This paper considers the quantitative problem of synthesizing sub-optimal stochastic finite state controllers (sFSCs) for POMDPs such that the probability of satisfying a set of high-level specifications in terms of linear temporal logic (LTL) formulae is maximized. We begin by casting the latter problem into an optimization and use relaxations based on the Poisson equation and McCormick envelopes. Then, we propose a stochastic bounded policy iteration algorithm that keeps the growth in sFSC size controlled and is an anytime algorithm: the performance of the controller improves with successive iterations, but the user can stop it based on time or memory considerations. We illustrate the proposed method with a robot navigation case study.
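The objects involved can be sketched as follows: a stochastic finite state controller is a pair of distributions, one choosing actions from the current internal state (I-state) and one updating the I-state after each observation, and closing the loop with a POMDP yields a Markov chain over (state, I-state) pairs. The array layout, the update order (act, transition, observe, update I-state) and the restriction to a simple reachability objective ("eventually reach a goal state", one of the simplest LTL properties) are assumptions for illustration; general LTL specifications are usually handled through a product with an automaton, which is omitted, and this is not the paper's optimization or policy iteration procedure.

```python
def product_chain(T, O, psi, eta):
    """Markov chain over (state, I-state) pairs induced by an sFSC on a POMDP.

    T[s][a][s2]   : probability of moving from s to s2 under action a
    O[s2][o]      : probability of observing o in state s2
    psi[g][a]     : probability that the controller picks action a in I-state g
    eta[g][o][g2] : probability of moving from I-state g to g2 after observing o
    """
    n_states, n_actions = len(T), len(psi[0])
    n_obs, n_istates = len(O[0]), len(psi)
    chain = {}
    for s in range(n_states):
        for g in range(n_istates):
            row = {}
            for a in range(n_actions):
                for s2 in range(n_states):
                    for o in range(n_obs):
                        for g2 in range(n_istates):
                            p = psi[g][a] * T[s][a][s2] * O[s2][o] * eta[g][o][g2]
                            if p > 0.0:
                                row[(s2, g2)] = row.get((s2, g2), 0.0) + p
            chain[(s, g)] = row
    return chain


def reach_probability(chain, goal_states, iterations=500):
    """Probability of eventually reaching a goal state, by fixed-point iteration."""
    value = {x: 1.0 if x[0] in goal_states else 0.0 for x in chain}
    for _ in range(iterations):
        value = {x: 1.0 if x[0] in goal_states
                 else sum(p * value[y] for y, p in chain[x].items())
                 for x in chain}
    return value


if __name__ == "__main__":
    # Two states (0 = start, 1 = absorbing goal), two actions, one I-state,
    # perfect observations. Action 0 reaches the goal w.p. 0.5; action 1 stays.
    T = [[[0.5, 0.5], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
    O = [[1.0, 0.0], [0.0, 1.0]]
    psi = [[0.5, 0.5]]
    eta = [[[1.0], [1.0]]]
    chain = product_chain(T, O, psi, eta)
    print(reach_probability(chain, goal_states={1})[(0, 0)])
```

Synthesis then amounts to choosing `psi` and `eta` (and the number of I-states) to maximize such a satisfaction probability, which is the optimization the abstract describes relaxing.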

i-state, markov chain, probability, (15 more...)

arXiv.org Artificial Intelligence

2001.07679

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report (0.50)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Implementations in Machine Ethics: A Survey

Tolmeijer, Suzanne, Kneer, Markus, Sarasua, Cristina, Christen, Markus, Bernstein, Abraham

arXiv.org Artificial Intelligence, Jan-21-2020

Increasingly complex and autonomous systems require machine ethics to maximize the benefits and minimize the risks to society arising from the new technology. It is challenging to decide which type of ethical theory to employ and how to implement it effectively. This survey provides a threefold contribution. Firstly, it introduces a taxonomy to analyze the field of machine ethics from an ethical, implementational, and technical perspective. Secondly, an exhaustive selection and description of relevant works is presented. Thirdly, applying the new taxonomy to the selected works, dominant research patterns and lessons for the field are identified, and future directions for research are suggested.

ethics, implementation, machine ethics, (15 more...)

arXiv.org Artificial Intelligence

2001.07573

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.29)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > San Mateo County > Menlo Park (0.14)
(24 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.47)

Industry:

Law (1.00)
Health & Medicine (1.00)
Government > Military (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
