AITopics

2509.06759

Country: Oceania > Australia (0.29)

Genre:

Overview (0.66)
Research Report (0.40)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Deep Reinforcement Learning for Ranking Utility Tuning in the Ad Recommender System at Pinterest

Yang, Xiao, Ayed, Mehdi Ben, Zhao, Longyu, Zhou, Fan, Shen, Yuchen, Engle, Abe, Zhuang, Jinfeng, Leng, Ling, Xu, Jiajing, Rosenberg, Charles, Deshikachar, Prathibha

The ranking utility function in an ad recommender system, which linearly combines predictions of various business goals, plays a central role in balancing values across the platform, advertisers, and users. Traditional manual tuning, while offering simplicity and interpretability, often yields suboptimal results due to its unprincipled tuning objectives, the vast amount of parameter combinations, and its lack of personalization and adaptability to seasonality. In this work, we propose a general Deep Reinforcement Learning framework for Personalized Utility Tuning (DRL-PUT) to address the challenges of multi-objective optimization within ad recommender systems. Our key contributions include: 1) Formulating the problem as a reinforcement learning task: given the state of an ad request, we predict the optimal hyperparameters to maximize a pre-defined reward. 2) Developing an approach to directly learn an optimal policy model using online serving logs, avoiding the need to estimate a value function, which is inherently challenging due to the high variance and unbalanced distribution of immediate rewards. We evaluated DRL-PUT through an online A/B experiment in Pinterest's ad recommender system. Compared to the baseline manual utility tuning approach, DRL-PUT improved the click-through rate by 9.7% and the long click-through rate by 7.7% on the treated segment. We conducted a detailed ablation study on the impact of different reward definitions and analyzed the personalization aspect of the learned policy model.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2509.05292

Country:

North America > United States > California > San Francisco County > San Francisco (0.16)
North America > United States > New York > New York County > New York City (0.15)

Genre: Research Report (0.64)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Gardner, Jason, Dutta, Ayan, Roy, Swapnoneel, Kreidl, O. Patrick, Boloni, Ladislau

Greener Deep Reinforcement Learning: Analysis of Energy and Carbon Efficiency Across Atari Benchmarks

The growing computational demands of deep reinforcement learning (DRL) have raised concerns about the environmental and economic costs of training large-scale models. While algorithmic efficiency in terms of learning performance has been extensively studied, the energy requirements, greenhouse gas emissions, and monetary costs of DRL algorithms remain largely unexplored. In this work, we present a systematic benchmarking study of the energy consumption of seven state-of-the-art DRL algorithms, namely DQN, TRPO, A2C, ARS, PPO, RecurrentPPO, and QR-DQN, implemented using Stable Baselines. Each algorithm was trained for one million steps each on ten Atari 2600 games, and power consumption was measured in real-time to estimate total energy usage, CO2-Equivalent emissions, and electricity cost based on the U.S. national average electricity price. Our results reveal substantial variation in energy efficiency and training cost across algorithms, with some achieving comparable performance while consuming up to 24% less energy (ARS vs. DQN), emitting nearly 68% less CO2, and incurring almost 68% lower monetary cost (QR-DQN vs. RecurrentPPO) than less efficient counterparts. We further analyze the trade-offs between learning performance, training time, energy use, and financial cost, highlighting cases where algorithmic choices can mitigate environmental and economic impact without sacrificing learning performance. This study provides actionable insights for developing energy-aware and cost-efficient DRL practices and establishes a foundation for incorporating sustainability considerations into future algorithmic design and evaluation.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2509.05273

Country: North America > United States > Florida (0.46)

Genre: Research Report > New Finding (0.48)

Industry:

Energy > Power Industry (1.00)
Leisure & Entertainment > Games > Computer Games (0.69)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Thomas, Aaron Mark, Chen, Yu-Cheng, Valencia, Hubert Okadome, Jose, Sharu Theresa, Wu, Ronin

QCA-MolGAN: Quantum Circuit Associative Molecular GAN with Multi-Agent Reinforcement Learning

Navigating the vast chemical space of molecular structures to design novel drug molecules with desired target properties remains a central challenge in drug discovery. Recent advances in generative models offer promising solutions. This work presents a novel quantum circuit Born machine (QCBM)-enabled Generative Adversarial Network (GAN), called QCA-MolGAN, for generating drug-like molecules. The QCBM serves as a learnable prior distribution, which is associatively trained to define a latent space aligning with high-level features captured by the GANs discriminator. Additionally, we integrate a novel multi-agent reinforcement learning network to guide molecular generation with desired targeted properties, optimising key metrics such as quantitative estimate of drug-likeness (QED), octanol-water partition coefficient (LogP) and synthetic accessibility (SA) scores in conjunction with one another. Experimental results demonstrate that our approach enhances the property alignment of generated molecules with the multi-agent reinforcement learning agents effectively balancing chemical properties.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2509.05051

Country:

Europe (1.00)
Asia > Japan (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Caissutti, Cristiano, Gerbier, Estelle, Khorrambakht, Ehsan, Marinelli, Paolo, Munafo', Andrea, Caiti, Andrea

Shared Autonomy through LLMs and Reinforcement Learning for Applications to Ship Hull Inspections

--Shared autonomy is a promising paradigm in robotic systems, particularly within the maritime domain, where complex, high-risk, and uncertain environments necessitate effective human-robot collaboration. This paper investigates the interaction of three complementary approaches to advance shared autonomy in heterogeneous marine robotic fleets: (i) the integration of Large Language Models (LLMs) to facilitate intuitive high-level task specification and support hull inspection missions, (ii) the implementation of human-in-the-loop interaction frameworks in multi-agent settings to enable adaptive and intent-aware coordination, and (iii) the development of a modular Mission Manager based on Behavior Trees to provide interpretable and flexible mission control. Preliminary results from simulation and real-world lake-like environments demonstrate the potential of this multi-layered architecture to reduce operator cognitive load, enhance transparency, and improve adaptive behaviour alignment with human intent. Ongoing work focuses on fully integrating these components, refining coordination mechanisms, and validating the system in operational port scenarios. This study contributes to establishing a modular and scalable foundation for trustworthy, human-collaborative autonomy in safety-critical maritime robotics applications.

large language model, machine learning, reinforcement learning, (17 more...)

2509.05042

Country: Europe > Italy (0.20)

Genre: Research Report (1.00)

Industry:

Government (0.70)
Transportation > Marine (0.65)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Topology-Aware Graph Reinforcement Learning for Dynamic Routing in Cloud Networks

Wang, Yuxi, Liu, Heyao, Yao, Guanzi, Long, Nyutian, Kang, Yue

artificial intelligence, machine learning, topology-aware graph reinforcement learning, (2 more...)

This paper proposes a topology-aware graph reinforcement learning approach to address the routing policy optimization problem in cloud server environments. The method builds a unified framework for state representation and structural evolution by integrating a Structure-Aware State Encoding (SASE) module and a Policy-Adaptive Graph Update (PAGU) mechanism. It aims to tackle the challenges of decision instability and insufficient structural awareness under dynamic topologies. The SASE module models node states through multi-layer graph convolution and structural positional embeddings, capturing high-order dependencies in the communication topology and enhancing the expressiveness of state representations. The PAGU module adjusts the graph structure based on policy behavior shifts and reward feedback, enabling adaptive structural updates in dynamic environments. Experiments are conducted on the real-world GEANT topology dataset, where the model is systematically evaluated against several representative baselines in terms of throughput, latency control, and link balance. Additional experiments, including hyperparameter sensitivity, graph sparsity perturbation, and node feature dimensionality variation, further explore the impact of structure modeling and graph updates on model stability and decision quality. Results show that the proposed method outperforms existing graph reinforcement learning models across multiple performance metrics, achieving efficient and robust routing in dynamic and complex cloud networks.

2509.04973

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Jang, Wonseo, Kim, Dongjae

An Arbitration Control for an Ensemble of Diversified DQN variants in Continual Reinforcement Learning

Deep reinforcement learning (RL) models, despite their efficiency in learning an optimal policy in static environments, easily loses previously learned knowledge (i.e., catastrophic forgetting). It leads RL models to poor performance in continual reinforcement learning (CRL) scenarios. To address this, we present an arbitration control mechanism over an ensemble of RL agents. It is motivated by and closely aligned with how humans make decisions in a CRL context using an arbitration control of multiple RL agents in parallel as observed in the prefrontal cortex. We integrated two key ideas into our model: (1) an ensemble of RLs (i.e., DQN variants) explicitly trained to have diverse value functions and (2) an arbitration control that prioritizes agents with higher reliability (i.e., less error) in recent trials. We propose a framework for CRL, an A rbitration C ontrol for an E nsemble of D iversified DQN variants ( ACED-DQN). We demonstrate significant performance improvements in both static and continual environments, supported by empirical evidence showing the effectiveness of arbitration control over diversified DQNs during training. In this work, we introduced a framework that enables RL agents to continuously learn, with inspiration from the human brain.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2509.04815

Genre: Research Report (1.00)

Industry:

Law > Alternative Dispute Resolution (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Zhang, Zhihao, Peng, Chengyang, Yurtsever, Ekim, Redmill, Keith A.

Bootstrapping Reinforcement Learning with Sub-optimal Policies for Autonomous Driving

Automated vehicle control using reinforcement learning (RL) has attracted significant attention due to its potential to learn driving policies through environment interaction. However, RL agents often face training challenges in sample efficiency and effective exploration, making it difficult to discover an optimal driving strategy. To address these issues, we propose guiding the RL driving agent with a demonstration policy that need not be a highly optimized or expert-level controller. Specifically, we integrate a rule-based lane change controller with the Soft Actor Critic (SAC) algorithm to enhance exploration and learning efficiency. Our approach demonstrates improved driving performance and can be extended to other driving scenarios that can similarly benefit from demonstration-based guidance.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2509.04712

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology > Robotics & Automation (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)

Yoo, Minjong, Kim, Woo Kyung, Woo, Honguk

In-Context Policy Adaptation via Cross-Domain Skill Diffusion

In this work, we present an in-context policy adaptation (ICPAD) framework designed for long-horizon multi-task environments, exploring diffusion-based skill learning techniques in cross-domain settings. The framework enables rapid adaptation of skill-based reinforcement learning policies to diverse target domains, especially under stringent constraints on no model updates and only limited target domain data. Specifically, the framework employs a cross-domain skill diffusion scheme, where domain-agnostic prototype skills and a domain-grounded skill adapter are learned jointly and effectively from an offline dataset through cross-domain consistent diffusion processes. The prototype skills act as primitives for common behavior representations of long-horizon policies, serving as a lingua franca to bridge different domains. Furthermore, to enhance the in-context adaptation performance, we develop a dynamic domain prompting scheme that guides the diffusion-based skill adapter toward better alignment with the target domain. Through experiments with robotic manipulation in Metaworld and autonomous driving in CARLA, we show that our $\oursol$ framework achieves superior policy adaptation performance under limited target domain data conditions for various cross-domain configurations including differences in environment dynamics, agent embodiment, and task horizon.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2509.04535

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

DeGuV: Depth-Guided Visual Reinforcement Learning for Generalization and Interpretability in Manipulation

Pham, Tien, Chi, Xinyun, Nguyen, Khang, Huber, Manfred, Cangelosi, Angelo

Reinforcement learning (RL) agents can learn to solve complex tasks from visual inputs, but generalizing these learned skills to new environments remains a major challenge in RL application, especially robotics. While data augmentation can improve generalization, it often compromises sample efficiency and training stability. This paper introduces DeGuV, an RL framework that enhances both generalization and sample efficiency. In specific, we leverage a learnable masker network that produces a mask from the depth input, preserving only critical visual information while discarding irrelevant pixels. Through this, we ensure that our RL agents focus on essential features, improving robustness under data augmentation. In addition, we incorporate contrastive learning and stabilize Q-value estimation under augmentation to further enhance sample efficiency and training stability. We evaluate our proposed method on the RL-ViGen benchmark using the Franka Emika robot and demonstrate its effectiveness in zero-shot sim-to-real transfer. Our results show that DeGuV outperforms state-of-the-art methods in both generalization and sample efficiency while also improving interpretability by highlighting the most relevant regions in the visual input

artificial intelligence, machine learning, reinforcement learning, (11 more...)

2509.0497

Genre: Research Report > New Finding (0.68)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)