van Seijen
Review for NeurIPS paper: Munchausen Reinforcement Learning
Additional Feedback: After Authors' Response: I still find the paper's analysis regarding action gaps a bit weak, and the authors' response didn't help much in that regard. I think their action-gap analysis needs to be reconsidered under the new findings of (van Seijen et al., 2019); increasing the action gap is not important on its own, rather it is the homogeneity of the action gaps across states that is important. While I still stand by my verdict of accepting this paper, in light of the other reviews, I think the paper's writing should be toned down a bit regarding its theoretical novelty and its claims about empirical results (e.g. being the first non-dist-RL agent to beat a dist-RL one). Q1: To the best of my knowledge, IQN in Dopamine also uses Double Q-learning. Is this also the case for your M-IQN agent?
van Seijen
This paper introduces a novel approach for abstraction selection in reinforcement learning problems modelled as factored Markov decision processes (MDPs), in which a state is described via a set of state components. In abstraction selection, an agent must choose an abstraction from a set of candidate abstractions, each built from a different combination of state components.
Supplementary material for Uncorrected least-squares temporal difference with lambda-return
November 15, 2019

Abstract

Here, we provide supplementary material for Takayuki Osogami, "Uncorrected least-squares temporal difference with lambda-return," which appears in Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI-20) [Osogami, 2019].

A Proofs

In this section, we prove Theorem 1, Lemma 1, Theorem 2, Lemma 2, and Proposition 1. Note that equations (1)-(19) refer to those in Osogami [2019].

A.1 Proof of Theorem 1

From (7)-(8), we have the following chain of equalities:

\[
\begin{aligned}
A^{\mathrm{Unc}}_T &= \sum_{t=0}^{T} \phi_t \Big(\phi_t - (1-\lambda)\gamma \sum_{m=1}^{T-t} (\lambda\gamma)^{m-1} \phi_{t+m}\Big)^{\top} && (20) \\
&= \sum_{t=0}^{T-1} \phi_t \Big(\phi_t - (1-\lambda)\gamma \sum_{m=1}^{T-t} (\lambda\gamma)^{m-1} \phi_{t+m}\Big)^{\top} + \phi_T \phi_T^{\top} && (21) \\
&= \sum_{t=0}^{T-1} \phi_t \Big(\phi_t - (1-\lambda)\gamma \sum_{m=1}^{T-t-1} (\lambda\gamma)^{m-1} \phi_{t+m} - (1-\lambda)\gamma (\lambda\gamma)^{T-t-1} \phi_T\Big)^{\top} + \phi_T \phi_T^{\top} && (22) \\
&= A^{\mathrm{Unc}}_{T-1} - (1-\lambda)\gamma \sum_{t=0}^{T-1} (\lambda\gamma)^{T-t-1} \phi_t \phi_T^{\top} + \phi_T \phi_T^{\top} && (23) \\
&= A^{\mathrm{Unc}}_{T-1} + \Big(\sum_{t=0}^{T} (\lambda\gamma)^{T-t} \phi_t\Big) \phi_T^{\top} - \gamma \Big(\sum_{t=0}^{T-1} (\lambda\gamma)^{T-t-1} \phi_t\Big) \phi_T^{\top} && (24) \\
&= A^{\mathrm{Unc}}_{T-1} + (z_T - \gamma z_{T-1})\, \phi_T^{\top},
\end{aligned}
\]

where $z_T = \sum_{t=0}^{T} (\lambda\gamma)^{T-t} \phi_t$. Here, (21) separates out the $t=T$ term, whose inner sum is empty; (22) separates out the $m=T-t$ term of the inner sum; (23) identifies the remaining double sum as $A^{\mathrm{Unc}}_{T-1}$; and (24) regroups the leftover terms by writing $(1-\lambda)\gamma = \gamma - \lambda\gamma$ and absorbing $\phi_T \phi_T^{\top}$ into the first sum. The recursive computation of the eligibility trace, $z_T = \lambda\gamma\, z_{T-1} + \phi_T$, can be verified in a straightforward manner.
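The recursion proved above, $A^{\mathrm{Unc}}_T = A^{\mathrm{Unc}}_{T-1} + (z_T - \gamma z_{T-1})\phi_T^{\top}$ with trace $z_T = \lambda\gamma z_{T-1} + \phi_T$, is easy to check numerically. The following NumPy sketch (ours, not the paper's code) evaluates $A^{\mathrm{Unc}}_T$ directly from the definition in (20) and via the trace recursion, and confirms the two agree:

```python
import numpy as np

# Numerical check of Theorem 1 (illustrative sketch, not the paper's code).
rng = np.random.default_rng(0)
gamma, lam, d, T = 0.9, 0.7, 3, 6
phi = rng.standard_normal((T + 1, d))  # feature vectors phi_0, ..., phi_T

def A_direct(T):
    # Direct evaluation of equation (20).
    A = np.zeros((d, d))
    for t in range(T + 1):
        target = phi[t].copy()
        for m in range(1, T - t + 1):
            target -= (1 - lam) * gamma * (lam * gamma) ** (m - 1) * phi[t + m]
        A += np.outer(phi[t], target)
    return A

# Recursive evaluation: A_T = A_{T-1} + (z_T - gamma * z_{T-1}) phi_T^T.
A_rec = np.outer(phi[0], phi[0])   # A_0 = phi_0 phi_0^T
z_prev = phi[0].copy()             # z_0 = phi_0
for t in range(1, T + 1):
    z = lam * gamma * z_prev + phi[t]              # trace recursion
    A_rec += np.outer(z - gamma * z_prev, phi[t])  # rank-one update
    z_prev = z

print(np.allclose(A_direct(T), A_rec))
```

The rank-one recursive form is what makes the trace-based computation cheap: each step costs $O(d^2)$ instead of re-summing the full history.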
Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation
Corneil, Dane, Gerstner, Wulfram, Brea, Johanni
Modern reinforcement learning algorithms reach super-human performance in many board and video games, but they are sample inefficient, i.e. they typically require significantly more playing experience than humans to reach an equal performance level. To improve sample efficiency, an agent may build a model of the environment and use planning methods to update its policy. In this article we introduce VaST (Variational State Tabulation), which maps an environment with a high-dimensional state space (e.g. the space of visual inputs) to an abstract tabular environment. Prioritized sweeping with small backups, a highly efficient planning method, can then be used to update state-action values. We show how VaST can rapidly learn to maximize reward in tasks like 3D navigation and efficiently adapt to sudden changes in rewards or transition probabilities.
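As a rough illustration of the planning step the abstract refers to, here is a minimal tabular prioritized-sweeping sketch. This is the classic deterministic-model form, not VaST's "small backup" variant, and all names are ours: after each value change, the predecessors of the affected state are re-queued by how much their values could change.

```python
import heapq
from collections import defaultdict

def prioritized_sweeping(transitions, gamma=0.95, theta=1e-4, max_updates=1000):
    """transitions: (s, a, r, s_next) tuples defining a deterministic model."""
    model, preds = {}, defaultdict(set)
    actions, Q = defaultdict(set), defaultdict(float)
    for s, a, r, s2 in transitions:
        model[(s, a)] = (r, s2)
        preds[s2].add((s, a))   # reverse model: who leads into s2
        actions[s].add(a)

    def backup(s, a):
        r, s2 = model[(s, a)]
        return r + gamma * max((Q[(s2, b)] for b in actions[s2]), default=0.0)

    pq = []  # max-heap on priority, via negated values
    for (s, a) in model:
        p = abs(backup(s, a) - Q[(s, a)])
        if p > theta:
            heapq.heappush(pq, (-p, (s, a)))

    for _ in range(max_updates):
        if not pq:
            break
        _, (s, a) = heapq.heappop(pq)
        Q[(s, a)] = backup(s, a)
        for ps, pa in preds[s]:  # re-prioritize predecessors of s
            p = abs(backup(ps, pa) - Q[(ps, pa)])
            if p > theta:
                heapq.heappush(pq, (-p, (ps, pa)))
    return Q

# Tiny 3-state chain: 0 -r-> 1 -r-> 2 -stay-> 2, reward 1 on the 1 -> 2 step.
Q = prioritized_sweeping([(0, "r", 0.0, 1), (1, "r", 1.0, 2), (2, "stay", 0.0, 2)])
```

On the chain above, the reward propagates backwards in two backups: Q(1, r) becomes 1.0, then its predecessor Q(0, r) becomes gamma * 1.0 = 0.95.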
A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning
Yang, Long, Shi, Minhao, Zheng, Qian, Meng, Wenjia, Pan, Gang
Recently, a new multi-step temporal-difference learning algorithm, called $Q(\sigma)$, unified $n$-step Tree-Backup (when $\sigma=0$) and $n$-step Sarsa (when $\sigma=1$) by introducing a sampling parameter $\sigma$. However, like other multi-step temporal-difference learning algorithms, $Q(\sigma)$ requires substantial memory and computation time. The eligibility trace is an important mechanism for transforming off-line updates into efficient on-line ones that consume less memory and computation time. In this paper, we develop the original $Q(\sigma)$ further, combine it with eligibility traces, and propose a new algorithm, called $Q(\sigma,\lambda)$, in which $\lambda$ is the trace-decay parameter. This idea unifies Sarsa$(\lambda)$ (when $\sigma=1$) and $Q^{\pi}(\lambda)$ (when $\sigma=0$). Furthermore, we give an upper error bound for the $Q(\sigma,\lambda)$ policy-evaluation algorithm. We prove that the $Q(\sigma,\lambda)$ control algorithm converges to the optimal value function exponentially fast. We also empirically compare it with conventional temporal-difference learning methods. Results show that, with an intermediate value of $\sigma$, $Q(\sigma,\lambda)$ creates a mixture of the existing algorithms that can learn the optimal value significantly faster than either extreme ($\sigma=0$ or $\sigma=1$).
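The interpolation at the heart of $Q(\sigma)$ can be illustrated with its one-step backup target (a sketch under our own naming, not the authors' code): $\sigma$ blends a sampled bootstrap, as in Sarsa, with an expected bootstrap under the policy, as in Tree-Backup.

```python
import numpy as np

def q_sigma_target(r, q_next, pi_next, a_next, sigma, gamma=1.0):
    """One-step Q(sigma) backup target: sigma=1 recovers a Sarsa-style
    sampled bootstrap on the taken action; sigma=0 recovers a
    Tree-Backup-style expected bootstrap under the policy."""
    sample = q_next[a_next]                        # sampled bootstrap
    expectation = float(np.dot(pi_next, q_next))   # expected bootstrap
    return r + gamma * (sigma * sample + (1 - sigma) * expectation)

# With q(s', .) = [1, 3], a uniform policy, and next action a' = 1:
q_next, pi_next = np.array([1.0, 3.0]), np.array([0.5, 0.5])
t_sarsa = q_sigma_target(0.0, q_next, pi_next, 1, sigma=1.0)  # 3.0
t_tree = q_sigma_target(0.0, q_next, pi_next, 1, sigma=0.0)   # 2.0
```

The multi-step and eligibility-trace versions the abstract describes chain this interpolation across time steps with the decay parameter $\lambda$.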
Multi-Advisor Reinforcement Learning
Laroche, Romain, Fatemi, Mehdi, Romoff, Joshua, van Seijen, Harm
We consider tackling a single-agent RL problem by distributing it to $n$ learners. These learners, called advisors, each endeavour to solve the problem from a different focus. Their advice, taking the form of action values, is then communicated to an aggregator, which is in control of the system. We show that the local planning method for the advisors is critical and that none of the ones found in the literature is flawless: the egocentric planning overestimates values of states where the other advisors disagree, and the agnostic planning is inefficient around danger zones. We introduce a novel approach called empathic and discuss its theoretical aspects. We empirically examine and validate our theoretical findings on a fruit collection task.
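The aggregation step can be pictured with a small sketch (ours, and hedged: summing advisors' action values and acting greedily is one natural aggregator; how each advisor computes its values — egocentric, agnostic, or empathic planning — is the paper's actual subject of study).

```python
import numpy as np

def aggregate_and_act(advisor_q_values):
    """advisor_q_values: list of per-advisor action-value vectors for the
    current state. The aggregator sums the advice and acts greedily.
    (Illustrative sketch, not the paper's implementation.)"""
    total = np.sum(advisor_q_values, axis=0)
    return int(np.argmax(total)), total

# Two advisors over three actions: each cares about a different sub-goal.
action, total = aggregate_and_act([np.array([1.0, 0.0, 0.0]),
                                   np.array([0.0, 0.5, 2.0])])
```

Here the second advisor's strong preference for the last action dominates the sum, so the aggregator selects action 2.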
It can't write this story yet, but Microsoft has trained AI to win Ms. Pac-Man
In the latest sign of artificial intelligence (AI)'s eventual dominance of the workplace, a Canadian deep learning startup-turned-division of Microsoft Corp. has successfully created an AI-based system that achieved the maximum possible score on Ms. Pac-Man.

That might not sound like the most complicated task in the world – especially since the edition in question was the Atari 2600 version and not the arcade original – but as Microsoft senior writer Allison Linn explains in a recent blog post, the challenge facing researchers at Montreal-based Maluuba was more daunting than you might think.

"A lot of companies working on AI use games to build intelligent algorithms because there's a lot of human-like intelligence capabilities that you need to beat the games," Maluuba program manager Rahul Mehrotra explains in the story, noting that the variety of situations you can encounter while playing the games makes them a good testing ground. In other words, the techniques used to develop the AI-driven Ms. Pac-Man master (or is that mistress?) could prove useful well beyond gaming.

Like many of its ilk, Ms. Pac-Man was intentionally designed to be easy to learn yet nearly impossible to master so that players would keep dropping in quarters, with co-creator Steve Golson noting that Ms. Pac-Man in particular was programmed to be more random than the original Pac-Man, so it would be harder for players to finish.
AI computer gets first ever perfect score on Ms. Pac-Man
While it might sound like an elusive dream for most, the perfect score for arcade classic Ms. Pac-Man has been achieved – albeit by a computer. Researchers have created an artificial intelligence-based system that learned how to get the maximum score of 999,990 on the addictive 1980s video game. And the innovative method used could help to make advances in other areas of AI research, such as natural language processing.

The technique, which the team has named 'Hybrid Reward Architecture', used 150 agents, which worked in parallel with one another. For example, some agents were rewarded for successfully finding one specific pellet, while others were tasked with staying out of the way of ghosts.