Vrancx, Peter
Policy Optimization Through Approximated Importance Sampling
Tomczak, Marcin B., Kim, Dongho, Vrancx, Peter, Kim, Kee-Eung
Recent policy optimization approaches (Schulman et al., 2015a, 2017) have achieved substantial empirical successes by constructing new proxy optimization objectives. These proxy objectives allow stable and low-variance policy learning, but require small policy updates to ensure that the proxy objective remains an accurate approximation of the target policy value. In this paper we derive an alternative objective that obtains the value of the target policy by applying importance sampling. This objective can be directly estimated from samples, as it takes an expectation over trajectories generated by the current policy. However, the basic importance-sampled objective is not suitable for policy optimization, as it incurs unacceptable variance. We therefore introduce an approximation that allows us to directly trade off the bias of the approximation with the variance in policy updates. We show that our approximation unifies the proxy optimization approaches with the importance sampling objective and allows us to interpolate between them. We then provide a theoretical analysis of the method that directly quantifies the error term due to the approximation. Finally, we obtain a practical algorithm by optimizing the introduced objective with proximal policy optimization techniques (Schulman et al., 2017). We empirically demonstrate that the resulting algorithm yields superior performance on continuous control benchmarks.
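For orientation, a minimal sketch of the two standard objects the abstract contrasts; the notation below is illustrative and not taken from the paper, which introduces its own approximation between the two.

```latex
% Surrogate (proxy) objective: accurate only for small policy updates.
L_{\pi}(\pi') \;=\; J(\pi) \;+\; \mathbb{E}_{s \sim d^{\pi},\, a \sim \pi'}\!\left[ A^{\pi}(s,a) \right]

% Importance-sampled value of the target policy \pi', estimated from trajectories of the
% current policy \pi; the product of likelihood ratios is what makes the variance grow
% with the horizon.
J(\pi') \;=\; \mathbb{E}_{\tau \sim \pi}\!\left[ \left( \prod_{t=0}^{T} \frac{\pi'(a_t \mid s_t)}{\pi(a_t \mid s_t)} \right) R(\tau) \right]
```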
Compatible features for Monotonic Policy Improvement
Tomczak, Marcin B., de Cote, Enrique Munoz, Macua, Sergio Valcarcel, Vrancx, Peter
Recent policy optimization approaches have achieved substantial empirical success by constructing surrogate optimization objectives. The Approximate Policy Iteration objective (Schulman et al., 2015a; Kakade and Langford, 2002) has become a standard optimization target for reinforcement learning problems. Using this objective in practice requires an estimator of the advantage function. Policy optimization methods such as those proposed in Schulman et al. (2015b) estimate the advantages using a parametric critic. In this work we establish conditions under which the parametric approximation of the critic does not introduce bias into the updates of the surrogate objective. These results hold for a general class of parametric policies, including deep neural networks. We obtain a result analogous to the compatible features derived for the original Policy Gradient Theorem (Sutton et al., 1999). As a result, we also identify a previously unknown bias that current state-of-the-art policy optimization algorithms (Schulman et al., 2015a, 2017) have introduced by not employing these compatible features.
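For context, the compatible-features condition from the original Policy Gradient Theorem that the abstract refers to; the paper's analogous condition for the surrogate objective is not reproduced here.

```latex
% Compatible function approximation (Sutton et al., 1999): a critic of the form
f_w(s,a) \;=\; w^{\top} \nabla_{\theta} \log \pi_{\theta}(a \mid s)
% with w chosen to minimise the mean-squared error to the true action-value function
% leaves the policy gradient unbiased:
\nabla_{\theta} J(\pi_{\theta}) \;=\; \mathbb{E}_{s \sim d^{\pi_{\theta}},\, a \sim \pi_{\theta}}\!\left[ \nabla_{\theta} \log \pi_{\theta}(a \mid s)\, f_w(s,a) \right]
```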
Disentangled Skill Embeddings for Reinforcement Learning
Petangoda, Janith C., Pascual-Diaz, Sergio, Adam, Vincent, Vrancx, Peter, Grau-Moya, Jordi
We propose a novel framework for multi-task reinforcement learning (MTRL). Using a variational inference formulation, we learn policies that generalize across both changing dynamics and goals. The resulting policies are parametrized by shared parameters that allow for transfer between different dynamics and goal conditions, and by task-specific latent-space embeddings that allow for specialization to particular tasks. We show how the latent spaces enable generalization to unseen dynamics and goal conditions. Additionally, policies equipped with such embeddings serve as a space of skills (or options) for hierarchical reinforcement learning. Since we can change task dynamics and goals independently, we name our framework Disentangled Skill Embeddings (DSE).
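A minimal sketch of the policy parametrization the abstract describes, assuming a PyTorch setup; the class and argument names are illustrative and not the paper's code.

```python
import torch
import torch.nn as nn

class DisentangledPolicy(nn.Module):
    """Policy conditioned on separate dynamics and goal embeddings."""
    def __init__(self, state_dim, action_dim, latent_dim):
        super().__init__()
        # Shared parameters: reused across all tasks.
        self.net = nn.Sequential(
            nn.Linear(state_dim + 2 * latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, state, z_dynamics, z_goal):
        # The two task-specific embeddings can be varied independently, which is what
        # allows recombining a known goal with unseen dynamics (and vice versa).
        return self.net(torch.cat([state, z_dynamics, z_goal], dim=-1))
```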
Reports on the 2018 AAAI Spring Symposium Series
Amato, Christopher (Northeastern University) | Ammar, Haitham Bou (PROWLER.io) | Churchill, Elizabeth (Google) | Karpas, Erez (Technion - Israel Institute of Technology) | Kido, Takashi (Stanford University) | Kuniavsky, Mike (Parc) | Lawless, W. F. (Paine College) | Rossi, Francesca (IBM T. J. Watson Research Center and University of Padova) | Oliehoek, Frans A. (TU Delft) | Russell, Stephen (US Army Research Laboratory) | Takadama, Keiki (University of Electro-Communications) | Srivastava, Siddharth (Arizona State University) | Tuyls, Karl (Google DeepMind) | Allen, Philip Van (Art Center College of Design) | Venable, K. Brent (Tulane University and IHMC) | Vrancx, Peter (PROWLER.io) | Zhang, Shiqi (Cleveland State University)
The Association for the Advancement of Artificial Intelligence, in cooperation with Stanford University's Department of Computer Science, presented the 2018 Spring Symposium Series, held Monday through Wednesday, March 26–28, 2018, on the campus of Stanford University. The seven symposia held were AI and Society: Ethics, Safety and Trustworthiness in Intelligent Agents; Artificial Intelligence for the Internet of Everything; Beyond Machine Intelligence: Understanding Cognitive Bias and Humanity for Well-Being AI; Data Efficient Reinforcement Learning; The Design of the User Experience for Artificial Intelligence (the UX of AI); Integrated Representation, Reasoning, and Learning in Robotics; Learning, Inference, and Control of Multi-Agent Systems. This report, compiled from organizers of the symposia, summarizes the research of five of the symposia that took place.
Model-Based Stabilisation of Deep Reinforcement Learning
Leibfried, Felix, Tutunov, Rasul, Vrancx, Peter, Bou-Ammar, Haitham
Though successful in high-dimensional domains, deep reinforcement learning exhibits high sample complexity and suffers from stability issues as reported by researchers and practitioners in the field. These problems hinder the application of such algorithms in real-world and safety-critical scenarios. In this paper, we take steps towards stable and efficient reinforcement learning by following a model-based approach that is known to reduce agent-environment interactions. Namely, our method augments deep Q-networks (DQNs) with model predictions for transitions, rewards, and termination flags. Having the model at hand, we then conduct a rigorous theoretical study of our algorithm and show, for the first time, convergence to a stationary point. En route, we provide a counter-example showing that 'vanilla' DQNs can diverge, confirming practitioners' and researchers' experiences. Our proof is novel in its own right and can be extended to other forms of deep reinforcement learning. In particular, we believe exploiting the relation between reinforcement learning (with deep function approximators) and online learning can serve as a recipe for future proofs in the domain. Finally, we validate our theoretical results in 20 games from the Atari benchmark. Our results show that following the proposed model-based learning approach not only ensures convergence but also leads to a reduction in sample complexity and superior performance.
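A minimal sketch of the kind of model-augmented Q-network the abstract describes (a Q-head plus prediction heads for next observation, reward, and termination), assuming PyTorch; the architecture and names are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ModelAugmentedDQN(nn.Module):
    """Q-network whose shared trunk also predicts transitions, rewards, and termination flags."""
    def __init__(self, obs_dim, num_actions, hidden=256):
        super().__init__()
        self.obs_dim, self.num_actions = obs_dim, num_actions
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.q_head = nn.Linear(hidden, num_actions)               # Q(s, a) for each action
        self.next_head = nn.Linear(hidden, num_actions * obs_dim)  # predicted next observation per action
        self.reward_head = nn.Linear(hidden, num_actions)          # predicted reward per action
        self.done_head = nn.Linear(hidden, num_actions)            # termination probability per action

    def forward(self, obs):
        h = self.trunk(obs)
        q = self.q_head(h)
        next_obs = self.next_head(h).view(-1, self.num_actions, self.obs_dim)
        reward = self.reward_head(h)
        done = torch.sigmoid(self.done_head(h))
        # Model losses on (next_obs, reward, done) can be trained jointly with the
        # usual Q-learning loss on q.
        return q, next_obs, reward, done
```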
Reinforcement Learning in POMDPs With Memoryless Options and Option-Observation Initiation Sets
Steckelmacher, Denis (Vrije Universiteit Brussels) | Roijers, Diederik M. (Vrije Universiteit Brussels) | Harutyunyan, Anna (Vrije Universiteit Brussels) | Vrancx, Peter (PROWLER.io) | Plisnier, Hélène (Vrije Universiteit Brussels) | Nowé, Ann (Vrije Universiteit Brussels)
Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the initiation set of options conditional on the previously-executed option, and show that options with such Option-Observation Initiation Sets (OOIs) are at least as expressive as Finite State Controllers (FSCs), a state-of-the-art approach for learning in POMDPs. OOIs are easy to design based on an intuitive description of the task, lead to explainable policies and keep the top-level and option policies memoryless. Our experiments show that OOIs allow agents to learn optimal policies in challenging POMDPs, while being much more sample-efficient than a recurrent neural network over options.
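A minimal sketch of option selection under Option-Observation Initiation Sets as the abstract describes them; the option names and the uniform top-level policy are illustrative assumptions.

```python
import random

# initiation[o_prev] lists the options that may be initiated right after option o_prev;
# None stands for the start of an episode. These sets encode the OOIs.
initiation = {
    None: ["goto_A", "goto_B"],
    "goto_A": ["grasp", "goto_B"],
    "goto_B": ["goto_A", "grasp"],
    "grasp": ["goto_A", "goto_B"],
}

def select_option(observation, previous_option, top_level_policy):
    """Choose the next option among those admitted by the OOI of the previous option."""
    candidates = initiation[previous_option]
    # The top-level policy only sees the current observation and the candidate set,
    # so both it and the option policies stay memoryless; the small amount of memory
    # lives entirely in the OOI constraint.
    return top_level_policy(observation, candidates)

# Placeholder top-level policy: uniform over admissible options.
first_option = select_option(observation=None, previous_option=None,
                             top_level_policy=lambda obs, cands: random.choice(cands))
```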
Learning With Options That Terminate Off-Policy
Harutyunyan, Anna (Vrije Universiteit Brussel) | Vrancx, Peter (PROWLER.io) | Bacon, Pierre-Luc (McGill University) | Precup, Doina (McGill University) | Nowé, Ann (Vrije Universiteit Brussel)
A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides the option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option set for the task is not ideal, and cannot express the primitive optimal policy well, shorter options offer more flexibility and can yield a better solution. Thus, the termination condition puts learning efficiency at odds with solution quality. We propose to resolve this dilemma by decoupling the behavior and target terminations, just like it is done with policies in off-policy learning. To this end, we give a new algorithm, Q(beta), that learns the solution with respect to any termination condition, regardless of how the options actually terminate. We derive Q(beta) by casting learning with options into a common framework with well-studied multi-step off-policy learning. We validate our algorithm empirically, and show that it holds up to its motivating claims.
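For reference, the textbook option-value backup whose termination function the abstract proposes to decouple from behavior; this is the standard form, not the paper's full update.

```latex
% On arriving in state s' while executing option o, the backup target mixes continuation
% and termination through the termination function \beta:
U(s', o) \;=\; \bigl(1 - \beta(s')\bigr)\, Q(s', o) \;+\; \beta(s') \max_{o'} Q(s', o')
% Q(beta) evaluates such a target for an arbitrary "target" termination condition,
% independently of the termination condition the behavior options actually follow.
```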
Expressing Arbitrary Reward Functions as Potential-Based Advice
Harutyunyan, Anna (Vrije Universiteit Brussel) | Devlin, Sam (University of York) | Vrancx, Peter (Vrije Universiteit Brussel) | Nowe, Ann (Vrije Universiteit Brussel)
Effectively incorporating external advice is an important problem in reinforcement learning, especially as it moves into the real world. Potential-based reward shaping is a way to provide the agent with a specific form of additional reward, with the guarantee of policy invariance. In this work we give a novel way to incorporate an arbitrary reward function with the same guarantee, by implicitly translating it into the specific form of dynamic advice potentials, which are maintained as an auxiliary value function learnt at the same time. We show that advice provided in this way captures the input reward function in expectation, and demonstrate its efficacy empirically.
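As background, the classical potential-based shaping term and its dynamic variant, which the abstract builds on; the construction of the learnt potential itself is described in the paper and not reproduced here.

```latex
% Static potential-based shaping (policy-invariant for any potential \Phi over states):
F(s, s') \;=\; \gamma\, \Phi(s') - \Phi(s)
% Dynamic advice potentials let \Phi change over time, e.g. when it is an auxiliary
% value function learnt alongside the main task:
F(s_t, s_{t+1}) \;=\; \gamma\, \Phi_{t+1}(s_{t+1}) - \Phi_t(s_t)
```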