Interpretable policy


Information Templates: A New Paradigm for Intelligent Active Feature Acquisition

Huang, Hung-Tien, Dinh, Dzung, Oliva, Junier B.

arXiv.org Artificial Intelligence

Active feature acquisition (AFA) is an instance-adaptive paradigm in which, at test time, a policy sequentially chooses which features to acquire (at a cost) before predicting. Existing approaches either train reinforcement learning (RL) policies, which must contend with a difficult MDP, or rely on greedy policies that cannot account for the joint informativeness of features or that require knowledge of the underlying data distribution. To overcome this, we propose Template-based AFA (TAFA), a non-greedy framework that learns a small library of feature templates--sets of features that are jointly informative--and uses this library to guide subsequent feature acquisitions. By identifying feature templates, the proposed framework not only significantly reduces the action space considered by the policy but also alleviates the need to estimate the underlying data distribution. Extensive experiments on synthetic and real-world datasets show that TAFA outperforms state-of-the-art baselines while incurring lower overall acquisition cost and computation.
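
As a rough illustration of the template idea, the sketch below greedily selects whole templates (index sets) under a budget, scoring each by a cost-normalized informativeness heuristic. The templates, costs, and the placeholder scoring function are invented for illustration; this is not the TAFA algorithm from the paper, which would use a learned template library and a trained predictor.

```python
# A minimal, hypothetical sketch of template-guided feature acquisition.
# Template contents, costs, and the scoring rule are illustrative inventions,
# not the TAFA algorithm from the paper.
import numpy as np

rng = np.random.default_rng(0)

n_features = 8
templates = [frozenset({0, 1}), frozenset({2, 3, 4}), frozenset({5}), frozenset({6, 7})]
cost = {t: len(t) * 1.0 for t in templates}           # toy per-feature cost

def template_score(observed, t, x):
    """Stand-in informativeness score for acquiring template t given the
    currently observed index set. A real system would use a trained
    predictor's expected confidence gain here."""
    new = t - observed
    return rng.random() * len(new)                    # placeholder heuristic

def acquire(x, budget=5.0):
    observed, spent = set(), 0.0
    while True:
        candidates = [t for t in templates
                      if (t - observed) and spent + cost[t] <= budget]
        if not candidates:
            break
        best = max(candidates,
                   key=lambda t: template_score(observed, t, x) / cost[t])
        observed |= best
        spent += cost[best]
    return observed, spent

x = rng.normal(size=n_features)                       # one test instance
obs, spent = acquire(x)
print(f"acquired features {sorted(obs)} at cost {spent:.1f}")
```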


"So, Tell Me About Your Policy...": Distillation of interpretable policies from Deep Reinforcement Learning agents

Dispoto, Giovanni, Bonetti, Paolo, Restelli, Marcello

arXiv.org Artificial Intelligence

Recent advances in Reinforcement Learning (RL) have benefited greatly from the inclusion of deep neural networks, driving a surge of novel approaches in the field of Deep Reinforcement Learning (DRL). These techniques can tackle complex games such as Atari and Go, as well as real-world applications including financial trading. Nevertheless, a significant challenge arises from the lack of interpretability, particularly when attempting to understand the underlying patterns learned, the relative importance of the state features, and how they are combined to generate the policy's output. For this reason, in mission-critical and real-world settings, it is often preferable to deploy a simpler, more interpretable algorithm, albeit at some cost in performance. In this paper, we propose a novel algorithm, supported by theoretical guarantees, that can extract an interpretable policy (e.g., a linear policy) without disregarding the peculiarities of the expert's behavior. This is achieved by considering the advantage function, which encodes why an action is superior to the alternatives. In contrast to previous works, our approach enables the training of an interpretable policy from previously collected experience. The proposed algorithm is empirically evaluated on classic control environments and on a financial trading scenario, demonstrating its ability to extract meaningful information from complex expert policies.
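
A common way to realize this kind of distillation is advantage-weighted imitation: transitions whose actions the expert's advantage function rates highly receive more weight when fitting the interpretable policy. The sketch below shows that idea with a linear (logistic) policy on synthetic offline data; the exponential weighting scheme and the stand-in advantages are assumptions, not the paper's exact objective.

```python
# A minimal sketch of advantage-weighted distillation into a linear policy.
# The exponential weighting and synthetic data are illustrative; the paper's
# actual objective and guarantees are not reproduced here.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, n_actions = 5000, 4, 3

states = rng.normal(size=(n, d))
actions = rng.integers(0, n_actions, size=n)          # previously collected actions
# Stand-in advantages A(s, a); a real pipeline would estimate these
# from the expert's critic on the offline dataset.
advantages = (states @ rng.normal(size=d)) * (actions - 1)

# Upweight transitions whose actions the expert judged advantageous.
weights = np.exp(np.clip(advantages, -5, 5))
linear_policy = LogisticRegression(max_iter=1000)
linear_policy.fit(states, actions, sample_weight=weights)

print("per-action feature weights:\n", linear_policy.coef_.round(2))
```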


Evaluating Interpretable Reinforcement Learning by Distilling Policies into Programs

Kohler, Hector, Delfosse, Quentin, Radji, Waris, Akrour, Riad, Preux, Philippe

arXiv.org Artificial Intelligence

There exist applications of reinforcement learning, such as medicine, where policies need to be "interpretable" by humans. User studies have shown that some policy classes may be more interpretable than others, but conducting human studies of policy interpretability is costly. Furthermore, there is no clear definition of policy interpretability, i.e., no agreed-upon metrics, so claims depend on the chosen definition. We tackle the problem of empirically evaluating policy interpretability without humans. Despite this lack of a clear definition, researchers agree on the notion of "simulatability": policy interpretability should relate to how well humans understand policy actions given states. To advance research in interpretable reinforcement learning, we contribute a new methodology for evaluating policy interpretability, based on proxies for simulatability, which we use to conduct a large-scale empirical evaluation. We use imitation learning to compute baseline policies by distilling expert neural networks into small programs. We then show that evaluating the baselines' interpretability with our methodology leads to conclusions similar to those of user studies. We show that increasing interpretability does not necessarily reduce performance and can sometimes increase it. We also show that no policy class best trades off interpretability and performance across all tasks, making it necessary for researchers to have methodologies for comparing policy interpretability.
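
The pipeline the abstract describes, distilling a neural expert into a small program by imitation and then measuring simulatability through proxies, can be sketched as follows. Here a depth-limited tree stands in for a "small program", and node count and depth are illustrative proxies, not necessarily those used in the paper.

```python
# A minimal sketch: distill an "expert" into a small tree by imitation and
# report simple simulatability proxies (program size, simulation depth).
# The proxies here are illustrative stand-ins for the paper's methodology.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 6))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)          # toy task

expert = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300).fit(X, y)

# Imitation learning: relabel states with the expert's actions, then fit
# a depth-limited tree standing in for a small program.
X_states = rng.normal(size=(4000, 6))
program = DecisionTreeClassifier(max_depth=3).fit(X_states, expert.predict(X_states))

n_nodes = program.tree_.node_count                    # proxy: program size
depth = program.get_depth()                           # proxy: steps to simulate a decision
fidelity = (program.predict(X) == expert.predict(X)).mean()
print(f"nodes={n_nodes}, depth={depth}, fidelity={fidelity:.2%}")
```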


From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation

Li, Peilang, Siddique, Umer, Cao, Yongcan

arXiv.org Artificial Intelligence

Deep reinforcement learning (RL) has shown remarkable success in complex domains; however, the inherent black-box nature of deep neural network policies raises significant challenges in understanding and trusting the decision-making process. While existing explainable RL methods provide local insights, they fail to deliver a global understanding of the model, particularly in high-stakes applications. To overcome this limitation, we propose a novel model-agnostic approach that bridges the gap between explainability and interpretability by leveraging Shapley values to transform complex deep RL policies into transparent representations. The proposed approach offers two key contributions: an application of Shapley values to policy interpretation that goes beyond local explanations, and a general framework applicable to both off-policy and on-policy algorithms. We evaluate our approach with three existing deep RL algorithms and validate its performance in two classic control environments. The results demonstrate that our approach not only preserves the original model's performance but also generates more stable interpretable policies.
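
To make the Shapley-value idea concrete, the sketch below uses permutation sampling to estimate each state feature's contribution to a toy policy's action score, then averages absolute contributions over states for a global view. The toy policy, the mean-state baseline, and the aggregation are illustrative assumptions, not the paper's pipeline.

```python
# A minimal sketch of permutation-sampling Shapley values for a policy's
# action preference, aggregated over states for a global view. The toy
# policy and baseline scheme are illustrative, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
d = 4
W = rng.normal(size=d)

def policy_score(x):
    """Toy black-box policy score (e.g., logit of the greedy action)."""
    return float(np.tanh(x @ W))

def shapley(x, baseline, n_perm=200):
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        z = baseline.copy()
        prev = policy_score(z)
        for j in order:
            z[j] = x[j]                               # reveal feature j
            cur = policy_score(z)
            phi[j] += cur - prev                      # marginal contribution
            prev = cur
    return phi / n_perm

states = rng.normal(size=(50, d))
baseline = states.mean(axis=0)
global_importance = np.mean([np.abs(shapley(s, baseline)) for s in states], axis=0)
print("global feature importance:", global_importance.round(3))
```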


Towards a Research Community in Interpretable Reinforcement Learning: the InterpPol Workshop

Kohler, Hector, Delfosse, Quentin, Festor, Paul, Preux, Philippe

arXiv.org Artificial Intelligence

Embracing the pursuit of intrinsically explainable reinforcement learning raises crucial questions: What distinguishes explainability from interpretability? Should explainable and interpretable agents be developed outside of domains where transparency is imperative? What advantages do interpretable policies offer over neural networks? How can we rigorously define and measure interpretability in policies without user studies? Which reinforcement learning paradigms are best suited to developing interpretable agents? Can Markov Decision Processes integrate interpretable state representations? In addition to motivating an Interpretable RL community centered on these questions, we propose the first venue dedicated to Interpretable RL: the InterpPol Workshop.


Boolean Decision Rules for Reinforcement Learning Policy Summarisation

McCarthy, James, Nair, Rahul, Daly, Elizabeth, Marinescu, Radu, Dusparic, Ivana

arXiv.org Artificial Intelligence

Explainability of Reinforcement Learning (RL) policies remains a challenging research problem, particularly when considering RL in a safety context. Understanding the decisions and intentions of an RL policy offers avenues to incorporate safety into the policy by limiting undesirable actions. We propose the use of a Boolean Decision Rules model to create a post-hoc rule-based summary of an agent's policy. We evaluate our approach using a DQN agent trained on an implementation of a lava gridworld and show that, given a hand-crafted feature representation of this gridworld, simple generalised rules can be created, giving a post-hoc explainable summary of the agent's policy. We discuss how the rules generated by this model could be imposed as constraints on the agent's policy to introduce safety, and how simple rule summaries of a policy may help in debugging RL agents.
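
A minimal stand-in for this kind of post-hoc rule summary: sample (state, action) pairs from a stub gridworld policy over hand-crafted features and fit a shallow tree whose printed branches read as rules. The paper uses a Boolean Decision Rules model rather than the tree used here, and a trained DQN rather than the stub policy.

```python
# A minimal sketch of post-hoc rule summarisation: fit a shallow tree to
# (state, action) pairs from a toy "lava gridworld" policy and print its
# rules. A stand-in for the Boolean Decision Rules model in the paper.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Hand-crafted features (x, y, lava_ahead) for a 5x5 grid, in the spirit of
# the paper; the policy itself is an illustrative stub, not a trained DQN.
states = np.column_stack([rng.integers(0, 5, 2000),
                          rng.integers(0, 5, 2000),
                          rng.integers(0, 2, 2000)])
# Stub policy: move right (action 1) unless lava is ahead, then move up (0).
actions = np.where(states[:, 2] == 1, 0, 1)

summary = DecisionTreeClassifier(max_depth=2).fit(states, actions)
print(export_text(summary, feature_names=["x", "y", "lava_ahead"]))
```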


Distilling Heterogeneity: From Explanations of Heterogeneous Treatment Effect Models to Interpretable Policies

Wu, Han, Tan, Sarah, Li, Weiwei, Garrard, Mia, Obeng, Adam, Dimmery, Drew, Singh, Shaun, Wang, Hanson, Jiang, Daniel, Bakshy, Eytan

arXiv.org Machine Learning

Internet companies are increasingly using machine learning models to create personalized policies that assign each individual the treatment predicted to be best for that individual. These policies are frequently derived from black-box heterogeneous treatment effect (HTE) models that predict individual-level treatment effects. In this paper, we focus on (1) learning explanations for HTE models and (2) learning interpretable policies that prescribe treatment assignments. We also propose guidance trees, an approach to ensembling multiple interpretable policies without losing interpretability. These rule-based interpretable policies are easy to deploy and avoid the need to maintain an HTE model in a production environment.
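
One simple way to obtain such an interpretable policy is to label each individual with the treatment the HTE model predicts to be best and distill those labels into a shallow decision tree. The sketch below does this with synthetic data and a generic boosted regressor standing in for the black-box HTE model; the paper's guidance-trees ensembling step is not implemented.

```python
# A minimal sketch: turn a black-box HTE model's predictions into an
# interpretable treatment-assignment tree. The HTE model and data are
# synthetic stand-ins; the paper's guidance trees are not implemented.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
true_effect = np.where(X[:, 0] > 0, 1.0, -0.5)        # heterogeneous effect
tau_hat_model = GradientBoostingRegressor().fit(
    X, true_effect + rng.normal(scale=0.3, size=5000))

# Interpretable policy: treat iff the predicted effect is positive,
# distilled into a depth-2 tree that is easy to deploy and audit.
treat = (tau_hat_model.predict(X) > 0).astype(int)
policy = DecisionTreeClassifier(max_depth=2).fit(X, treat)
print(export_text(policy, feature_names=["x0", "x1", "x2"]))
```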


Explainable Autonomous Robots: A Survey and Perspective

Sakai, Tatsuya, Nagai, Takayuki

arXiv.org Artificial Intelligence

It is commonly claimed that AI will replace most manual labor in the future; however, is this really the case? AI technologies achieve higher image-recognition accuracy than humans in some limited contexts and have consistently outperformed humans in classical games such as Go and chess. Nonetheless, we believe that even advanced future developments based on current technology will not lead to robots replacing humans. AI systems' fundamental inability to communicate naturally and effectively with humans is among the most significant reasons they cannot replace human labor. One might believe that such communication could be achieved through the development of natural language processing (NLP) technology [4]; however, NLP technologies are systems for estimating the content of human statements and their meanings; they do not constitute communication. That is, humans do not feel that robots using such systems truly understand and respond to them appropriately. Therefore, if effective communication is not achieved, robots will continue to function only as tools to assist humans. Advances that improve the accuracy or effectiveness of specific tasks do not make robots equivalent to human beings. Under this scenario, how can we enable robots to communicate with humans?


Differentiable Logic Machines

Zimmer, Matthieu, Feng, Xuening, Glanois, Claire, Jiang, Zhaohui, Zhang, Jianyi, Weng, Paul, Jianye, Hao, Dong, Li, Wulong, Liu

arXiv.org Artificial Intelligence

The integration of reasoning, learning, and decision-making is key to building more general AI systems. As a step in this direction, we propose a novel neural-logic architecture that can solve both inductive logic programming (ILP) and deep reinforcement learning (RL) problems. Our architecture defines a restricted but expressive continuous space of first-order logic programs by assigning weights to predicates instead of rules; it is therefore fully differentiable and can be trained efficiently with gradient descent. In addition, for the deep RL setting with actor-critic algorithms, we propose a novel, efficient critic architecture. Compared to state-of-the-art methods on both ILP and RL problems, our approach achieves excellent performance while providing a fully interpretable solution and scaling much better, especially during the testing phase.
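
The core "weights on predicates" idea can be illustrated with a single differentiable soft-logic layer: each rule softly selects its body predicates via softmax weights and combines them with a product t-norm, so gradients flow back to the selection weights. This is a toy rendering of the idea, not the paper's full architecture.

```python
# A minimal sketch of one differentiable soft-logic layer: a rule's body
# selects input predicates via softmax weights and combines them with a
# product t-norm (soft AND). Illustrative only, not the full architecture.
import torch

torch.manual_seed(0)
n_predicates, n_rules = 5, 2

# Truth values of ground predicates for one example, in [0, 1].
p = torch.rand(n_predicates)

# Learnable selection weights: each rule softly picks two body predicates.
logits = torch.randn(n_rules, 2, n_predicates, requires_grad=True)
sel = torch.softmax(logits, dim=-1)                   # (rules, 2 slots, preds)

body = sel @ p                                        # soft predicate lookup, (rules, 2)
head = body.prod(dim=-1)                              # product t-norm = soft AND

# Differentiable end to end: gradients flow back to the selection weights.
head.sum().backward()
print("rule truth values:", head.detach().numpy().round(3))
print("grad norm:", logits.grad.norm().item())
```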


Reinforcement Learning from a Mixture of Interpretable Experts

Akrour, Riad, Tateo, Davide, Peters, Jan

arXiv.org Machine Learning

Reinforcement learning (RL) has demonstrated its ability to solve high-dimensional tasks by leveraging non-linear function approximators. These successes, however, are mostly achieved by 'black-box' policies in simulated domains. When deploying RL in the real world, several concerns may be raised about the use of a 'black-box' policy. In an effort to make the policies learned by RL more transparent, we propose in this paper a policy iteration scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, human-readable structure based on a mixture of interpretable experts. We show that our algorithm can learn compelling policies on continuous-action deep RL benchmarks, matching the performance of neural network policies while returning policies that are more amenable to human inspection than neural-network or linear-in-feature policies.
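
The structure described, a gate that routes each state to one of several simple experts, can be sketched in a few lines: a hard argmax gate chooses a linear expert, so every action is attributable to a single readable weight matrix. The gating rule and experts below are illustrative, not the paper's learned policy-iteration scheme.

```python
# A minimal sketch of a mixture-of-interpretable-experts policy: a hard
# gate routes each state to one linear expert, so every action is traceable
# to a single readable expert. Gating rule and experts are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, a_dim = 4, 3, 2

experts = [rng.normal(size=(a_dim, d)) for _ in range(n_experts)]  # linear experts
gate_W = rng.normal(size=(n_experts, d))                           # gating scores

def act(s):
    k = int(np.argmax(gate_W @ s))        # hard assignment: one expert per state
    return k, experts[k] @ s              # action is linear in the state

s = rng.normal(size=d)
k, a = act(s)
print(f"state routed to expert {k}; action = W_{k} @ s = {a.round(3)}")
```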