

Verifiable Reinforcement Learning via Policy Extraction

Neural Information Processing Systems

While deep reinforcement learning has successfully solved many challenging control tasks, its real-world applicability has been limited by the inability to ensure the safety of learned policies. We propose an approach to verifiable reinforcement learning by training decision tree policies, which can represent complex policies (since they are nonparametric), yet can be efficiently verified using existing techniques (since they are highly structured). The challenge is that decision tree policies are difficult to train. We propose VIPER, an algorithm that combines ideas from model compression and imitation learning to learn decision tree policies guided by a DNN policy (called the oracle) and its Q-function, and show that it substantially outperforms two baselines. We use VIPER to (i) learn a provably robust decision tree policy for a variant of Atari Pong with a symbolic state space, (ii) learn a decision tree policy for a toy game based on Pong that provably never loses, and (iii) learn a provably stable decision tree policy for cart-pole. In each case, the decision tree policy achieves performance equal to that of the original DNN policy.
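The extraction loop VIPER describes (a DAgger-style imitation of a DNN oracle; the paper's Q-value-weighted resampling is omitted here) can be sketched with scikit-learn. The `oracle_policy` below and the random "rollout" states are hypothetical stand-ins for the trained DNN and an actual environment:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def oracle_policy(states):
    # Hypothetical stand-in for the DNN oracle: push left/right
    # depending on the sign of one state feature (index 2).
    return (states[:, 2] > 0).astype(int)

def extract_tree_policy(n_iters=5, n_rollout=200, state_dim=4):
    """DAgger-style loop: aggregate oracle-labeled states, refit the tree."""
    dataset_x, dataset_y = [], []
    tree = None
    for _ in range(n_iters):
        # Stand-in for rollouts of the current tree policy: here we just
        # sample states; a real implementation would step an environment.
        states = rng.normal(size=(n_rollout, state_dim))
        dataset_x.append(states)
        dataset_y.append(oracle_policy(states))  # relabel with the oracle
        x = np.concatenate(dataset_x)
        y = np.concatenate(dataset_y)
        tree = DecisionTreeClassifier(max_depth=3).fit(x, y)
    return tree

tree = extract_tree_policy()
probe = rng.normal(size=(100, 4))
agreement = (tree.predict(probe) == oracle_policy(probe)).mean()
```

The resulting shallow tree is what makes verification tractable: its decision boundaries are axis-aligned threshold tests, so safety properties reduce to checks over finitely many leaf regions.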


Verifiable Reinforcement Learning via Policy Extraction

Osbert Bastani, Yewen Pu, Armando Solar-Lezama

Neural Information Processing Systems

Trajectories taken by π, π_left : s ↦ left, and π_right : s ↦ right are shown as dashed edges, red edges, and green edges, respectively. Let Π = {π_left : s ↦ left, π_right : s ↦ right}, and let g(π) = E_{s∼d(π)}[g(s, π)] be the 0-1 loss.



b6f8dc086b2d60c5856e4ff517060392-Supplemental.pdf

Neural Information Processing Systems

In EXPAND, we augment each human evaluated state to 5 states. To verify 5 is sufficient, we also experimented with the numbers of augmentations required in each state to get the best performance. AGIL [50] was designed to utilize saliency maps collected via human gaze. The network architectures are shown in Figure 1. Hence, we view the output of the attention network as the prediction of whether a pixel should be included in a human-annotated bounding box.


Attention Trajectories as a Diagnostic Axis for Deep Reinforcement Learning

Beylier, Charlotte, Selder, Hannah, Fleig, Arthur, Hofmann, Simon M., Scherf, Nico

arXiv.org Artificial Intelligence

While deep reinforcement learning agents demonstrate high performance across domains, their internal decision processes remain difficult to interpret when evaluated only through performance metrics. In particular, it is poorly understood which input features agents rely on, how these dependencies evolve during training, and how they relate to behavior. We introduce a scientific methodology for analyzing the learning process through quantitative analysis of saliency. This approach aggregates saliency information at the object and modality level into hierarchical attention profiles, quantifying how agents allocate attention over time, thereby forming attention trajectories throughout training. Applied to Atari benchmarks, custom Pong environments, and muscle-actuated biomechanical user simulations in visuomotor interactive tasks, this methodology uncovers algorithm-specific attention biases, reveals unintended reward-driven strategies, and diagnoses overfitting to redundant sensory channels. These patterns correspond to measurable behavioral differences, demonstrating empirical links between attention profiles, learning dynamics, and agent behavior. To assess robustness of the attention profiles, we validate our findings across multiple saliency methods and environments. The results establish attention trajectories as a promising diagnostic axis for tracing how feature reliance develops during training and for identifying biases and vulnerabilities invisible to performance metrics alone.
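The core aggregation step the abstract describes, collapsing a per-pixel saliency map into an object-level attention profile, can be sketched as follows; the object masks and their names are hypothetical, and a real pipeline would compute one profile per training checkpoint to form the trajectory:

```python
import numpy as np

def attention_profile(saliency, object_masks):
    """Aggregate a per-pixel saliency map into per-object attention shares.

    saliency: (H, W) nonnegative array; object_masks: dict name -> (H, W) bool.
    Returns dict name -> fraction of total saliency falling inside each mask.
    """
    total = saliency.sum()
    return {name: float(saliency[mask].sum() / total)
            for name, mask in object_masks.items()}

# Toy example: 4x4 saliency map with two hypothetical "objects".
sal = np.zeros((4, 4)); sal[0, 0] = 3.0; sal[3, 3] = 1.0
masks = {"ball": np.zeros((4, 4), bool), "paddle": np.zeros((4, 4), bool)}
masks["ball"][0, 0] = True; masks["paddle"][3, 3] = True
profile = attention_profile(sal, masks)  # ball -> 0.75, paddle -> 0.25
```

Stacking such profiles over checkpoints yields a time series per object, which is the kind of attention trajectory the paper analyzes.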



Learning Game-Playing Agents with Generative Code Optimization

Kuang, Zhiyi, Rong, Ryan, Yuan, YuCheng, Nie, Allen

arXiv.org Artificial Intelligence

We present a generative optimization approach for learning game-playing agents, where policies are represented as Python programs and refined using large language models (LLMs). Our method treats decision-making policies as self-evolving code, with the current observation as input and an in-game action as output, enabling agents to self-improve through execution traces and natural language feedback with minimal human intervention. Applied to Atari games, our game-playing Python program achieves performance competitive with deep reinforcement learning (RL) baselines while using significantly less training time and far fewer environment interactions. This work highlights the promise of programmatic policy representations for building efficient, adaptable agents capable of complex, long-horizon reasoning.
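A policy-as-code agent of the kind described might look like the toy sketch below; the observation layout, action names, and trace format are all hypothetical, and the actual LLM refinement step is only indicated in a comment:

```python
def policy(observation):
    """Toy programmatic Pong-style policy: move the paddle toward the ball.
    observation: dict with 'ball_y' and 'paddle_y' (a hypothetical layout)."""
    if observation["ball_y"] > observation["paddle_y"]:
        return "DOWN"
    if observation["ball_y"] < observation["paddle_y"]:
        return "UP"
    return "NOOP"

def collect_trace(policy_fn, episode):
    """Record (observation, action) pairs. In the setup the abstract
    describes, such traces plus scores and natural language feedback
    would be serialized into an LLM prompt requesting an improved program."""
    return [(obs, policy_fn(obs)) for obs in episode]

episode = [{"ball_y": 5, "paddle_y": 2}, {"ball_y": 1, "paddle_y": 2}]
trace = collect_trace(policy, episode)
```

Because the policy is ordinary source code, each LLM revision can be executed immediately and scored, making the outer loop a generate-evaluate-revise cycle rather than gradient descent.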


From Pong to Wii Sports: the surprising legacy of tennis in gaming history

The Guardian

With Wimbledon under way, I am going to grasp the opportunity to make a perhaps contentious claim: tennis is the most important sport in the history of video games. Sure, nowadays the big sellers are EA Sports FC, Madden and NBA 2K, but tennis has been foundational to the industry. It was a simple bat-and-ball game, created in 1958 by scientist William Higinbotham at the Brookhaven National Laboratory in Upton, New York, that is widely considered the first video game created purely for entertainment. Tennis for Two ran on an oscilloscope and was designed as a minor diversion for visitors attending the lab's annual open day, but when people started playing, a queue developed that eventually extended out of the front door and around the side of the building. It was the first indication that computer games might turn out to be popular.


Advancing Generalization Across a Variety of Abstract Visual Reasoning Tasks

Małkiński, Mikołaj, Mańdziuk, Jacek

arXiv.org Artificial Intelligence

The abstract visual reasoning (AVR) domain presents a diverse suite of analogy-based tasks devoted to studying model generalization. Recent years have brought dynamic progress in the field, particularly in i.i.d. scenarios, in which models are trained and evaluated on the same data distributions. Nevertheless, o.o.d. setups that assess model generalization to new test distributions remain challenging even for the most recent models. To advance generalization in AVR tasks, we present the Pathways of Normalized Group Convolution model (PoNG), a novel neural architecture that features group convolution, normalization, and a parallel design. We consider a wide set of AVR benchmarks, including Raven's Progressive Matrices and visual analogy problems with both synthetic and real-world images. The experiments demonstrate strong generalization capabilities of the proposed model, which in several settings outperforms the existing literature methods.
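PoNG's internals are not given in the abstract, but the group-convolution building block it names can be illustrated with a minimal NumPy sketch (valid padding, stride 1; loops kept explicit for clarity, not efficiency):

```python
import numpy as np

def group_conv2d(x, w, groups):
    """Minimal grouped 2-D convolution.
    x: (C_in, H, W); w: (C_out, C_in // groups, kH, kW).
    Each group of output channels sees only its own slice of input channels."""
    c_in, h, wd = x.shape
    c_out, cg, kh, kw = w.shape
    assert c_in % groups == 0 and c_out % groups == 0 and cg == c_in // groups
    out = np.zeros((c_out, h - kh + 1, wd - kw + 1))
    per_group = c_out // groups
    for g in range(groups):
        xs = x[g * cg:(g + 1) * cg]  # this group's input channels
        for oc in range(g * per_group, (g + 1) * per_group):
            for i in range(out.shape[1]):
                for j in range(out.shape[2]):
                    out[oc, i, j] = np.sum(xs[:, i:i + kh, j:j + kw] * w[oc])
    return out

# With 2 groups, 4 input channels split 2+2 across the 2 output channels.
x = np.ones((4, 3, 3))
w = np.ones((2, 2, 2, 2))
out = group_conv2d(x, w, groups=2)
```

Splitting channels into groups like this cuts parameters and computation by the group factor relative to a full convolution, which is one common motivation for using it in generalization-focused architectures.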