Pan, Hsiao-Ru
Skill or Luck? Return Decomposition via Advantage Functions
Pan, Hsiao-Ru, Schölkopf, Bernhard
Learning from off-policy data is essential for sample-efficient reinforcement learning. In the present work, we build on the insight that the advantage function can be understood as the causal effect of an action on the return, and show that this allows us to decompose the return of a trajectory into parts caused by the agent's actions (skill) and parts outside of the agent's control (luck). Furthermore, this decomposition enables us to naturally extend Direct Advantage Estimation (DAE) to off-policy settings (Off-policy DAE). The resulting method can learn from off-policy trajectories without relying on importance sampling techniques or truncating off-policy actions. We draw connections between Off-policy DAE and previous methods to demonstrate how it can speed up learning and when the proposed off-policy corrections are important. Finally, we use the MinAtar environments to illustrate how ignoring off-policy corrections can lead to suboptimal policy optimization performance.
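As a rough illustration of the decomposition described in this abstract (not the paper's exact derivation), the standard identity A^π(s,a) = Q^π(s,a) - V^π(s) lets the discounted return of a trajectory be split into a skill term and a luck term; the symbols V^π, Q^π, A^π, γ and the labels below follow conventional RL notation and are assumptions, not taken from the paper:

\[
G = \sum_{t \ge 0} \gamma^t r_t
  = V^\pi(s_0)
  + \underbrace{\sum_{t \ge 0} \gamma^t A^\pi(s_t, a_t)}_{\text{effect of the agent's actions (skill)}}
  + \underbrace{\sum_{t \ge 0} \gamma^t \bigl( r_t + \gamma V^\pi(s_{t+1}) - Q^\pi(s_t, a_t) \bigr)}_{\text{environment stochasticity (luck)}},
\]

where each term in the second sum has zero mean conditioned on \((s_t, a_t)\), so it captures randomness outside the agent's control.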
Homomorphism Autoencoder -- Learning Group Structured Representations from Observed Transitions
Keurti, Hamza, Pan, Hsiao-Ru, Besserve, Michel, Grewe, Benjamin F., Schölkopf, Bernhard
How agents can learn internal models that veridically represent interactions with the real world is a largely open question. As machine learning is moving towards representations containing not just observational but also interventional knowledge, we study this problem using tools from representation learning and group theory. We propose methods enabling an agent acting upon the world to learn internal representations of sensory information that are consistent with actions that modify it. We use an autoencoder equipped with a group representation acting on its latent space, trained using an equivariance-derived loss in order to enforce a suitable homomorphism property on the group representation. In contrast to existing work, our approach does not require prior knowledge of the group and does not restrict the set of actions the agent can perform.
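A minimal sketch of the kind of training signal this abstract describes, assuming a PyTorch-style encoder/decoder and a learned map rho from group elements (here, action vectors) to matrices acting on the latent space; the names enc, dec, rho and the two loss terms are illustrative assumptions, not the authors' implementation:

import torch
import torch.nn as nn

latent_dim, obs_dim, action_dim = 8, 64, 2

# Encoder, decoder, and a network producing a latent_dim x latent_dim
# matrix representation rho(g) for each group element / action g.
enc = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, obs_dim))
rho = nn.Sequential(nn.Linear(action_dim, 128), nn.ReLU(),
                    nn.Linear(128, latent_dim * latent_dim))

def hae_style_loss(o_t, g_t, o_next):
    z_t = enc(o_t)                                        # encode current observation
    R = rho(g_t).view(-1, latent_dim, latent_dim)         # matrix acting on the latent
    z_pred = torch.bmm(R, z_t.unsqueeze(-1)).squeeze(-1)  # apply the group action in latent space
    # (i) decoding the transformed latent should predict the next observation,
    # (ii) the transformed latent should match the encoding of the next observation.
    recon = ((dec(z_pred) - o_next) ** 2).mean()
    equiv = ((z_pred - enc(o_next)) ** 2).mean()
    return recon + equiv

# Usage on a random batch of observed transitions (o_t, g_t, o_next).
o_t, g_t, o_next = torch.randn(32, obs_dim), torch.randn(32, action_dim), torch.randn(32, obs_dim)
loss = hae_style_loss(o_t, g_t, o_next)
loss.backward()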
Direct Advantage Estimation
Pan, Hsiao-Ru, Gürtler, Nico, Neitz, Alexander, Schölkopf, Bernhard
The predominant approach in reinforcement learning is to assign credit to actions based on the expected return. However, we show that the return may depend on the policy in a way which could lead to excessive variance in value estimation and slow down learning. Instead, we show that the advantage function can be interpreted as a causal effect and shares similar properties with causal representations. Based on this insight, we propose Direct Advantage Estimation (DAE), a novel method that can model the advantage function and estimate it directly from on-policy data while simultaneously minimizing the variance of the return without requiring the (action-)value function. We also relate our method to Temporal Difference methods by showing how value functions can be seamlessly integrated into DAE. The proposed method is easy to implement and can be readily adopted by modern actor-critic methods. We evaluate DAE empirically on three discrete control domains and show that it can outperform generalized advantage estimation (GAE), a strong baseline for advantage estimation, on a majority of the environments when applied to policy optimization.
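A sketch of the kind of constrained regression this abstract describes, based only on its wording and standard notation (the exact objective in the paper may differ): fit \(\hat A\) and \(\hat V\) by regressing the return on the sum of advantages along a trajectory, while constraining \(\hat A\) to be centered under the policy,

\[
\min_{\hat A, \hat V} \; \mathbb{E}_\pi\!\left[\left( \sum_{t \ge 0} \gamma^t \bigl( r_t - \hat A(s_t, a_t) \bigr) - \hat V(s_0) \right)^2 \right]
\quad \text{subject to} \quad \sum_a \pi(a \mid s)\, \hat A(s, a) = 0 \;\; \forall s.
\]

The true advantage of \(\pi\) satisfies the zero-mean constraint by definition, and the squared objective simultaneously reduces the variance of the advantage-corrected return, matching the abstract's claim that the advantage is estimated directly from on-policy data without an (action-)value function.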