AITopics | pqn

Collaborating Authors

pqn

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Scalable On-Policy Reinforcement Learning via Adaptive Batch Scaling

Park, Jongchan

arXiv.org Machine LearningMay-22-2026

Conventional wisdom holds that large-batch training is fundamentally incompatible with Reinforcement Learning (RL) - beyond a modest threshold, increasing batch sizes typically yields diminishing returns or performance degradation due to the inherent non-stationarity of the data distribution. We challenge this view by observing that non-stationarity is not a fixed property of RL, but evolves throughout training: early stages exhibit rapid behavioral shifts that demand small batches for plasticity, whereas late stages approach a quasi-stationary regime where large batches enable precise convergence. Motivated by this observation, we propose Adaptive Batch Scaling (ABS), that dynamically adjusts the effective batch size according to the stability of the learning policy. Central to ABS is Behavioral Divergence, a novel metric that quantifies policy non-stationarity by measuring action-level shifts between consecutive updates, which we use to scale batch size inversely to policy volatility. Integrated with the Parallelised Q-Network (PQN) algorithm and evaluated on the ALE benchmark, ABS seamlessly reconciles early-stage plasticity with late-stage stable convergence. Strikingly, contrary to conventional wisdom, our results reveal that the combination of larger networks and larger batch sizes achieves the best performance - a scaling behavior previously thought to be unattainable in RL, now unlocked through adaptive batch control.

artificial intelligence, machine learning, natural language, (11 more...)

arXiv.org Machine Learning

2605.21557

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Hadamax Encoding: Elevating Performance in Model-Free Atari

Kooi, Jacob E., Yang, Zhao, François-Lavet, Vincent

arXiv.org Artificial IntelligenceMay-26-2025

Neural network architectures have a large impact in machine learning. In reinforcement learning, network architectures have remained notably simple, as changes often lead to small gains in performance. This work introduces a novel encoder architecture for pixel-based model-free reinforcement learning. The Hadamax (\textbf{Hada}mard \textbf{max}-pooling) encoder achieves state-of-the-art performance by max-pooling Hadamard products between GELU-activated parallel hidden layers. Based on the recent PQN algorithm, the Hadamax encoder achieves state-of-the-art model-free performance in the Atari-57 benchmark. Specifically, without applying any algorithmic hyperparameter modifications, Hadamax-PQN achieves an 80\% performance gain over vanilla PQN and significantly surpasses Rainbow-DQN. For reproducibility, the full code is available on \href{https://github.com/Jacobkooi/Hadamax}{GitHub}.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2505.15345

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Gaussian Weight Sampling for Scalable, Efficient and Stable Pseudo-Quantization Training

Ahn, Myeonghwan, Yoo, Sungjoo

arXiv.org Artificial IntelligenceMay-19-2025

Ever-growing scale of large language models (LLMs) is pushing for improved efficiency, favoring fully quantized training (FQT) over BF16. While FQT accelerates training, it faces consistency challenges and requires searching over an exponential number of cases, each needing over 200B tokens to ensure stability. Pseudo-quantization training (PQT) addresses the issues of FQT, although it is not well-studied. We explore the practical implications of PQT in detail and propose a noise distribution $R$ that is floating-point (FP)-friendly, with ideal properties including stochastic precision annealing. As a result, the proposed method serves as an effective theoretical foundation for low-precision FP parameters through PQT, utilizing efficient fake quantization via an addition and subsequent FP casting. We demonstrate that Gaussian weight sampling is (1) scalable: supports low-precision FP parameters down to FP6 and high-precision noise up to 9-bit with BF16 operator. The proposed method is (2) efficient: incurring computational overhead as low as 1.40\% on the A100 GPU in terms of Llama2 training tokens per second, and requiring 2 bytes per parameter in GPU memory. We demonstrate that PQT with Gaussian weight sampling is (3) stable: closely following or even surpassing performance of the BF16 baseline while pre-training GPT2 and Llama2 models with up to 1B parameters and 300B tokens.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2505.1117

Genre: Research Report (0.52)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Simplifying Deep Temporal Difference Learning

Gallici, Matteo, Fellows, Mattie, Ellis, Benjamin, Pou, Bartomeu, Masmitja, Ivan, Foerster, Jakob Nicolaus, Martin, Mario

arXiv.org Artificial IntelligenceJul-5-2024

Q-learning played a foundational role in the field reinforcement learning (RL). However, TD algorithms with off-policy data, such as Q-learning, or nonlinear function approximation like deep neural networks require several additional tricks to stabilise training, primarily a replay buffer and target networks. Unfortunately, the delayed updating of frozen network parameters in the target network harms the sample efficiency and, similarly, the replay buffer introduces memory and implementation overheads. In this paper, we investigate whether it is possible to accelerate and simplify TD training while maintaining its stability. Our key theoretical result demonstrates for the first time that regularisation techniques such as LayerNorm can yield provably convergent TD algorithms without the need for a target network, even with off-policy data. Empirically, we find that online, parallelised sampling enabled by vectorised environments stabilises training without the need of a replay buffer. Motivated by these findings, we propose PQN, our simplified deep online Q-Learning algorithm. Surprisingly, this simple algorithm is competitive with more complex methods like: Rainbow in Atari, R2D2 in Hanabi, QMix in Smax, PPO-RNN in Craftax, and can be up to 50x faster than traditional DQN without sacrificing sample efficiency. In an era where PPO has become the go-to RL algorithm, PQN reestablishes Q-learning as a viable alternative. We make our code available at: https://github.com/mttga/purejaxql.

algorithm, international conference, pqn, (14 more...)

arXiv.org Artificial Intelligence

2407.04811

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Leisure & Entertainment > Games (0.93)
Leisure & Entertainment > Sports (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Pointer Networks with Q-Learning for OP Combinatorial Optimization

Barro, Alessandro

arXiv.org Artificial IntelligenceNov-5-2023

The Orienteering Problem (OP) presents a unique challenge in combinatorial optimization, emphasized by its widespread use in logistics, delivery, and transportation planning. Given the NP-hard nature of OP, obtaining optimal solutions is inherently complex. While Pointer Networks (Ptr-Nets) have exhibited prowess in various combinatorial tasks, their performance in the context of OP leaves room for improvement. Recognizing the potency of Q-learning, especially when paired with deep neural structures, this research unveils the Pointer Q-Network (PQN). This innovative method combines Ptr-Nets and Q-learning, effectively addressing the specific challenges presented by OP. We deeply explore the architecture and efficiency of PQN, showcasing its superior capability in managing OP situations.

pqn, ptr-net, q-value, (14 more...)

arXiv.org Artificial Intelligence

2311.02629

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Particle algorithms for maximum likelihood training of latent variable models

Kuntz, Juan, Lim, Jen Ning, Johansen, Adam M.

arXiv.org Artificial IntelligenceFeb-18-2023

(Neal and Hinton, 1998) recast maximum likelihood estimation of any given latent variable model as the minimization of a free energy functional $F$, and the EM algorithm as coordinate descent applied to $F$. Here, we explore alternative ways to optimize the functional. In particular, we identify various gradient flows associated with $F$ and show that their limits coincide with $F$'s stationary points. By discretizing the flows, we obtain practical particle-based algorithms for maximum likelihood estimation in broad classes of latent variable models. The novel algorithms scale to high-dimensional settings and perform well in numerical experiments.

artificial intelligence, bayesian inference, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2204.12965

Country:

Europe (0.67)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.92)
Energy > Oil & Gas > Upstream (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback