pong
Attention Trajectories as a Diagnostic Axis for Deep Reinforcement Learning
Beylier, Charlotte, Selder, Hannah, Fleig, Arthur, Hofmann, Simon M., Scherf, Nico
While deep reinforcement learning agents demonstrate high performance across domains, their internal decision processes remain difficult to interpret when evaluated only through performance metrics. In particular, it is poorly understood which input features agents rely on, how these dependencies evolve during training, and how they relate to behavior. We introduce a scientific methodology for analyzing the learning process through quantitative analysis of saliency. This approach aggregates saliency information at the object and modality level into hierarchical attention profiles, quantifying how agents allocate attention over time, thereby forming attention trajectories throughout training. Applied to Atari benchmarks, custom Pong environments, and muscle-actuated biomechanical user simulations in visuomotor interactive tasks, this methodology uncovers algorithm-specific attention biases, reveals unintended reward-driven strategies, and diagnoses overfitting to redundant sensory channels. These patterns correspond to measurable behavioral differences, demonstrating empirical links between attention profiles, learning dynamics, and agent behavior. To assess robustness of the attention profiles, we validate our findings across multiple saliency methods and environments. The results establish attention trajectories as a promising diagnostic axis for tracing how feature reliance develops during training and for identifying biases and vulnerabilities invisible to performance metrics alone.
- Europe > Germany > Saxony > Leipzig (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- (2 more...)
- Health & Medicine (1.00)
- Leisure & Entertainment > Games (0.94)
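The object-level aggregation described in the abstract above can be sketched as follows: pixel saliency is pooled over object masks into a normalized attention profile, and a sequence of such profiles across training checkpoints forms an attention trajectory. All names and shapes here are assumptions for illustration, not the paper's interface.

```python
# Toy sketch (illustrative, not the paper's code): pool pixel saliency
# over object masks into a normalized attention profile.
import numpy as np

def attention_profile(saliency, masks):
    """Fraction of total saliency falling on each named object."""
    total = saliency.sum()
    return {name: float(saliency[mask].sum() / total)
            for name, mask in masks.items()}

# 4x4 saliency map for one Pong-like frame
saliency = np.array([[0., 1., 0., 0.],
                     [0., 1., 0., 0.],
                     [0., 0., 3., 0.],
                     [0., 0., 0., 0.]])

masks = {"paddle": np.zeros((4, 4), dtype=bool),
         "ball":   np.zeros((4, 4), dtype=bool)}
masks["paddle"][:, 1] = True       # paddle occupies column 1
masks["ball"][2, 2] = True         # ball is a single pixel

profile = attention_profile(saliency, masks)
# an attention trajectory is then [profile_at_checkpoint_0, profile_at_checkpoint_1, ...]
```

Computing this profile at each training checkpoint and plotting the per-object shares over time yields the attention trajectories the paper analyzes.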
Verifiable Reinforcement Learning via Policy Extraction
While deep reinforcement learning has successfully solved many challenging control tasks, its real-world applicability has been limited by the inability to ensure the safety of learned policies. We propose an approach to verifiable reinforcement learning by training decision tree policies, which can represent complex policies (since they are nonparametric), yet can be efficiently verified using existing techniques (since they are highly structured). The challenge is that decision tree policies are difficult to train. We propose VIPER, an algorithm that combines ideas from model compression and imitation learning to learn decision tree policies guided by a DNN policy (called the oracle) and its Q-function, and show that it substantially outperforms two baselines. We use VIPER to (i) learn a provably robust decision tree policy for a variant of Atari Pong with a symbolic state space, (ii) learn a decision tree policy for a toy game based on Pong that provably never loses, and (iii) learn a provably stable decision tree policy for cart-pole. In each case, the decision tree policy achieves performance equal to that of the original DNN policy.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.76)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
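The distillation idea in the abstract above can be sketched as a DAgger-style loop: the student policy visits states, the DNN oracle labels them, and samples are weighted by the Q-value gap so that states where a wrong action is costly count more. Everything below (the 1-D environment, the stand-in oracle, the one-split stump in place of a full decision tree learner) is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of a VIPER-style distillation loop. The oracle, the toy
# environment, and the decision stump are all illustrative stand-ins.
import numpy as np

def oracle_policy(s):                  # stand-in for the trained DNN policy
    return int(s[0] > 0)

def oracle_q(s):                       # stand-in Q-values, one per action
    return np.array([-s[0], s[0]])

def rollout(policy, rng, n_episodes=5, n_steps=20):
    """Collect states from a toy 1-D environment under `policy`."""
    states = []
    for _ in range(n_episodes):
        s = np.array([rng.uniform(-1.0, 1.0)])
        for _ in range(n_steps):
            states.append(s.copy())
            drift = 0.1 if policy(s) == 1 else -0.1
            s = s + drift + 0.05 * rng.normal()
    return states

def fit_stump(X, y, w):
    """Weighted one-feature stump: predict 1 iff x > threshold."""
    xs = np.sort(X[:, 0])
    best_err, best_t = np.inf, 0.0
    for t in 0.5 * (xs[1:] + xs[:-1]):             # candidate split points
        err = np.sum(w * ((X[:, 0] > t).astype(int) != y))
        if err < best_err:
            best_err, best_t = err, t
    return best_t

def viper_distill(n_iters=4, seed=0):
    rng, data = np.random.default_rng(seed), []
    student = oracle_policy                        # first rollout follows the oracle
    for _ in range(n_iters):
        for s in rollout(student, rng):            # student visits, oracle labels
            q = oracle_q(s)
            data.append((s, oracle_policy(s), q.max() - q.min()))  # Q-gap weight
        X, y, w = (np.array(c) for c in zip(*data))
        t = fit_stump(X, y, w)
        student = lambda s, t=t: int(s[0] > t)     # distilled tree policy
    return student, t

tree_policy, threshold = viper_distill()
```

The verifiability payoff is that the final policy is a single explicit threshold (in general, a shallow tree), which existing verification tools can analyze directly, unlike the oracle DNN.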
Learning Game-Playing Agents with Generative Code Optimization
Kuang, Zhiyi, Rong, Ryan, Yuan, YuCheng, Nie, Allen
We present a generative optimization approach for learning game-playing agents, where policies are represented as Python programs and refined using large language models (LLMs). Our method treats decision-making policies as self-evolving code, with the current observation as input and an in-game action as output, enabling agents to self-improve through execution traces and natural language feedback with minimal human intervention. Applied to Atari games, our game-playing Python program achieves performance competitive with deep reinforcement learning (RL) baselines while using significantly less training time and far fewer environment interactions. This work highlights the promise of programmatic policy representations for building efficient, adaptable agents capable of complex, long-horizon reasoning.
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > Canada (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Portugal > Braga > Braga (0.04)
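The self-evolving-code loop described in the abstract above can be sketched as: compile policy source to a callable, evaluate it, feed the score back to an LLM that proposes a revision, and keep the best program. The toy game and the mocked LLM call below are assumptions for illustration; a real system would send the source and feedback to an actual model.

```python
# Hedged skeleton of a generative-optimization loop for code policies.
# `llm_revise` is a hard-coded stand-in for an LLM call.

def compile_policy(source):
    """Turn policy source code into a callable act(obs) -> action."""
    namespace = {}
    exec(source, namespace)
    return namespace["act"]

def evaluate(policy, episodes=20):
    """Toy 'game': reward 1 when the action matches the sign of obs."""
    score, obs = 0, -5
    for _ in range(episodes):
        action = policy(obs)
        score += 1 if action == (1 if obs > 0 else 0) else 0
        obs = -obs + 1                 # deterministic toy dynamics
    return score / episodes

def llm_revise(source, feedback):
    """Stand-in for an LLM call: a fixed rewrite rule for the demo.
    A real system would prompt a model with `source` and `feedback`."""
    return source.replace("return 1", "return 1 if obs > 0 else 0")

def optimize(source, iters=3):
    best_src, best_score = source, evaluate(compile_policy(source))
    for _ in range(iters):
        feedback = f"average reward {best_score:.2f}"
        candidate = llm_revise(best_src, feedback)
        score = evaluate(compile_policy(candidate))
        if score > best_score:         # keep a revision only if it improves
            best_src, best_score = candidate, score
    return best_src, best_score

seed_policy = "def act(obs):\n    return 1\n"
src, score = optimize(seed_policy)
```

In the paper's setting, the execution traces themselves (not just the scalar score) would be part of the feedback, giving the LLM concrete evidence of where the program fails.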
From Pong to Wii Sports: the surprising legacy of tennis in gaming history
With Wimbledon under way, I am going to grasp the opportunity to make a perhaps contentious claim: tennis is the most important sport in the history of video games. Sure, nowadays the big sellers are EA Sports FC, Madden and NBA 2K, but tennis has been foundational to the industry. It was a simple bat-and-ball game, created in 1958 by scientist William Higinbotham at the Brookhaven National Laboratory in Upton, New York, that is widely considered the first ever video game created purely for entertainment. Tennis for Two ran on an oscilloscope and was designed as a minor diversion for visitors attending the lab's annual open day, but when people started playing, a queue developed that eventually extended out of the front door and around the side of the building. It was the first indication that computer games might turn out to be popular.
- Europe > United Kingdom > England > Greater London > London > Wimbledon (0.61)
- North America > United States > New York (0.25)
- Asia > Japan (0.05)
- Leisure & Entertainment > Sports > Tennis (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
Advancing Generalization Across a Variety of Abstract Visual Reasoning Tasks
Małkiński, Mikołaj, Mańdziuk, Jacek
The abstract visual reasoning (AVR) domain presents a diverse suite of analogy-based tasks devoted to studying model generalization. Recent years have brought dynamic progress in the field, particularly in i.i.d. scenarios, in which models are trained and evaluated on the same data distributions. Nevertheless, o.o.d. setups that assess model generalization to new test distributions remain challenging even for the most recent models. To advance generalization in AVR tasks, we present the Pathways of Normalized Group Convolution model (PoNG), a novel neural architecture that features group convolution, normalization, and a parallel design. We consider a wide set of AVR benchmarks, including Raven's Progressive Matrices and visual analogy problems with both synthetic and real-world images. The experiments demonstrate strong generalization capabilities of the proposed model, which in several settings outperforms the existing literature methods.
- Europe > Poland > Masovia Province > Warsaw (0.04)
- Europe > Poland > Lesser Poland Province > Kraków (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
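The building blocks named in the abstract above can be illustrated generically: a grouped convolution mixes only the channels within each group, each pathway is normalized, and parallel pathways are concatenated. This is a demonstration of group convolution in general, not the PoNG architecture itself; shapes and the 1x1 kernel are assumptions for brevity.

```python
# Generic sketch of grouped 1x1 convolution, per-channel normalization,
# and a two-pathway parallel design (not the paper's architecture).
import numpy as np

def group_conv1x1(x, weights):
    """x: (C_in, H, W); weights: (groups, C_out_g, C_in_g) with
    groups * C_in_g == C_in. Each group mixes only its own channels."""
    groups, _, c_in_g = weights.shape
    outs = []
    for g in range(groups):
        xg = x[g * c_in_g:(g + 1) * c_in_g]          # this group's channels
        outs.append(np.tensordot(weights[g], xg, axes=1))
    return np.concatenate(outs, axis=0)

def normalize(x, eps=1e-5):
    """Normalize each channel to zero mean, unit variance."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    return (x - mean) / (std + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))                       # 4 channels, 8x8 feature map

# two parallel pathways with different group counts, concatenated
w_a = rng.normal(size=(2, 2, 2))                     # groups=2
w_b = rng.normal(size=(1, 4, 4))                     # groups=1 (ordinary conv)
out = np.concatenate([normalize(group_conv1x1(x, w_a)),
                      normalize(group_conv1x1(x, w_b))], axis=0)
```

Varying the group count per pathway trades off parameter sharing against cross-channel mixing, which is one plausible reading of how such a parallel design aids generalization.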
A lifeless hydrogel blob can play Pong
Inspired by recent advancements in brain organoid systems, researchers have designed a simple hydrogel-electrode array that not only can "play" Pong, but improve its gameplay over time. Debuted by Atari in 1972, Pong is one of the most rudimentary but influential video games of all time. Although it just features two player paddles and a pixelated "ball" ricocheting between them, it still serves as a helpful benchmark for training not just artificial intelligence and neural networks, but also organoid intelligence, or OI. Grown from stem cells into rudimentary "brains," these OI systems may one day provide promising alternatives to more traditional hardware. But both AI and OI are extremely complex, costly industries--what if much simpler arrays could achieve similar results?
The real-life Flubber? Glob of jelly can play Pong thanks to a basic kind of memory, bizarre study reveals
In the 1997 Robin Williams flick Flubber, an absent-minded professor creates a sentient ball of goo with incredible capabilities. Now, more than 25 years later, scientists have made a surprising discovery that could bring Flubber into the real world. Researchers from the University of Reading have created a non-living 'hydrogel brain' which is capable of playing the video game Pong. Using a plate of electrodes hooked up to the classic game, the water-based jelly even managed to get 10 per cent better as it practised. While it might not be quite as bouncy as Robin Williams' invention, the researchers believe this breakthrough could change the future of artificial intelligence.