breakout
A Algorithms
We directly adopt the official default setting for Atari games. B.2 Minecraft Environment Settings Table 1 outlines how we set up and initialize the environment for each harvest task. Our method is tested in two different biomes: plains and sunflower plains. Both the plains and sunflower plains offer a wider field of view. In Minecraft, the action space is an 8-dimensional multi-discrete space.
Attention Trajectories as a Diagnostic Axis for Deep Reinforcement Learning
Beylier, Charlotte, Selder, Hannah, Fleig, Arthur, Hofmann, Simon M., Scherf, Nico
While deep reinforcement learning agents demonstrate high performance across domains, their internal decision processes remain difficult to interp ret when evaluated only through performance metrics. In particular, it is poorly understoo d which input features agents rely on, how these dependencies evolve during training, and how t hey relate to behavior. We introduce a scientific methodology for analyzing the learni ng process through quantitative analysis of saliency. This approach aggregates saliency in formation at the object and modality level into hierarchical attention profiles, quantifyin g how agents allocate attention over time, thereby forming attention trajectories throughout t raining. Applied to Atari benchmarks, custom Pong environments, and muscle-actuated biom echanical user simulations in visuomotor interactive tasks, this methodology uncovers a lgorithm-specific attention biases, reveals unintended reward-driven strategies, and diagnos es overfitting to redundant sensory channels. These patterns correspond to measurable behavio ral differences, demonstrating empirical links between attention profiles, learning dynam ics, and agent behavior. To assess robustness of the attention profiles, we validate our finding s across multiple saliency methods and environments. The results establish attention traj ectories as a promising diagnostic axis for tracing how feature reliance develops during train ing and for identifying biases and vulnerabilities invisible to performance metrics alone.
- Europe > Germany > Saxony > Leipzig (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- (2 more...)
- Health & Medicine (1.00)
- Leisure & Entertainment > Games (0.94)
Predicting Talent Breakout Rate using Twitter and TV data
Batsaikhan, Bilguun, Fukuda, Hiroyuki
Early detection of rising talents is of paramount importance in the field of advertising. In this paper, we define a concept of talent breakout and propose a method to detect Japanese talents before their rise to stardom. The main focus of the study is to determine the effectiveness of combining Twitter and TV data on predicting time-dependent changes in social data. Although traditional time-series models are known to be robust in many applications, the success of neural network models in various fields (e.g.\ Natural Language Processing, Computer Vision, Reinforcement Learning) continues to spark an interest in the time-series community to apply new techniques in practice. Therefore, in order to find the best modeling approach, we have experimented with traditional, neural network and ensemble learning methods. We observe that ensemble learning methods outperform traditional and neural network models based on standard regression metrics. However, by utilizing the concept of talent breakout, we are able to assess the true forecasting ability of the models, where neural networks outperform traditional and ensemble learning methods in terms of precision and recall.
- Asia > Japan (0.05)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
A Algorithms
We directly adopt the official default setting for Atari games. B.2 Minecraft Environment Settings Table 1 outlines how we set up and initialize the environment for each harvest task. Our method is tested in two different biomes: plains and sunflower plains. Both the plains and sunflower plains offer a wider field of view. In Minecraft, the action space is an 8-dimensional multi-discrete space.
STORI: A Benchmark and Taxonomy for Stochastic Environments
Barsainyan, Aryan Amit, Lim, Jing Yu, Liu, Dianbo
Reinforcement learning (RL) techniques have achieved impressive performance on simulated benchmarks such as Atari100k, yet recent advances remain largely confined to simulation and show limited transfer to real-world domains. A central obstacle is environmental stochasticity, as real systems involve noisy observations, unpredictable dynamics, and non-stationary conditions that undermine the stability of current methods. Existing benchmarks rarely capture these uncertainties and favor simplified settings where algorithms can be tuned to succeed. The absence of a well-defined taxonomy of stochasticity further complicates evaluation, as robustness to one type of stochastic perturbation, such as sticky actions, does not guarantee robustness to other forms of uncertainty. To address this critical gap, we introduce STORI (STOchastic-ataRI), a benchmark that systematically incorporates diverse stochastic effects and enables rigorous evaluation of RL techniques under different forms of uncertainty. We propose a comprehensive five-type taxonomy of environmental stochasticity and demonstrate systematic vulnerabilities in state-of-the-art model-based RL algorithms through targeted evaluation of DreamerV3 and STORM. Our findings reveal that world models dramatically underestimate environmental variance, struggle with action corruption, and exhibit unreliable dynamics under partial observability. We release the code and benchmark publicly at https://github.com/ARY2260/stori, providing a unified framework for developing more robust RL systems.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Learning Game-Playing Agents with Generative Code Optimization
Kuang, Zhiyi, Rong, Ryan, Yuan, YuCheng, Nie, Allen
We present a generative optimization approach for learning game-playing agents, where policies are represented as Python programs and refined using large language models (LLMs). Our method treats decision-making policies as self-evolving code, with current observation as input and an in-game action as output, enabling agents to self-improve through execution traces and natural language feedback with minimal human intervention. Applied to Atari games, our game-playing Python program achieves performance competitive with deep reinforcement learning (RL) baselines while using significantly less training time and much fewer environment interactions. This work highlights the promise of programmatic policy representations for building efficient, adaptable agents capable of complex, long-horizon reasoning.
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > Canada (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Portugal > Braga > Braga (0.04)
Combining Pre-Trained Models for Enhanced Feature Representation in Reinforcement Learning
Piccoli, Elia, Li, Malio, Carfì, Giacomo, Lomonaco, Vincenzo, Bacciu, Davide
The recent focus and release of pre-trained models have been a key components to several advancements in many fields (e.g. Natural Language Processing and Computer Vision), as a matter of fact, pre-trained models learn disparate latent embeddings sharing insightful representations. On the other hand, Reinforcement Learning (RL) focuses on maximizing the cumulative reward obtained via agent's interaction with the environment. RL agents do not have any prior knowledge about the world, and they either learn from scratch an end-to-end mapping between the observation and action spaces or, in more recent works, are paired with monolithic and computationally expensive Foundational Models. How to effectively combine and leverage the hidden information of different pre-trained models simultaneously in RL is still an open and understudied question. In this work, we propose Weight Sharing Attention (WSA), a new architecture to combine embeddings of multiple pre-trained models to shape an enriched state representation, balancing the tradeoff between efficiency and performance. We run an extensive comparison between several combination modes showing that WSA obtains comparable performance on multiple Atari games compared to end-to-end models. Furthermore, we study the generalization capabilities of this approach and analyze how scaling the number of models influences agents' performance during and after training.
- North America > Canada > Alberta (0.14)
- Europe > Austria > Vienna (0.14)
- North America > Canada > British Columbia > Vancouver (0.04)
- (10 more...)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Education (1.00)
SwitchMT: An Adaptive Context Switching Methodology for Scalable Multi-Task Learning in Intelligent Autonomous Agents
Devkota, Avaneesh, Putra, Rachmad Vidya Wicaksana, Shafique, Muhammad
The ability to train intelligent autonomous agents (such as mobile robots) on multiple tasks is crucial for adapting to dynamic real-world environments. However, state-of-the-art reinforcement learning (RL) methods only excel in single-task settings, and still struggle to generalize across multiple tasks due to task interference. Moreover, real-world environments also demand the agents to have data stream processing capabilities. Toward this, a state-of-the-art work employs Spiking Neural Networks (SNNs) to improve multi-task learning by exploiting temporal information in data stream, while enabling lowpower/energy event-based operations. However, it relies on fixed context/task-switching intervals during its training, hence limiting the scalability and effectiveness of multi-task learning. To address these limitations, we propose SwitchMT, a novel adaptive task-switching methodology for RL-based multi-task learning in autonomous agents. Specifically, SwitchMT employs the following key ideas: (1) a Deep Spiking Q-Network with active dendrites and dueling structure, that utilizes task-specific context signals to create specialized sub-networks; and (2) an adaptive task-switching policy that leverages both rewards and internal dynamics of the network parameters. Experimental results demonstrate that SwitchMT achieves superior performance in multi-task learning compared to state-of-the-art methods. It achieves competitive scores in multiple Atari games (i.e., Pong: -8.8, Breakout: 5.6, and Enduro: 355.2) compared to the state-of-the-art, showing its better generalized learning capability. These results highlight the effectiveness of our SwitchMT methodology in addressing task interference while enabling multi-task learning automation through adaptive task switching, thereby paving the way for more efficient generalist agents with scalable multi-task learning capabilities.