AITopics | q-network

Reinforced Few-Shot Acquisition Function Learning for Bayesian Optimization

Neural Information Processing SystemsApr-25-2026, 14:23:46 GMT

Bayesian optimization (BO) conventionally relies on handcrafted acquisition functions (AFs) to sequentially determine the sample points. However, it has been widely observed in practice that the best-performing AF in terms of regret can vary significantly under different types of black-box functions. It has remained a challenge to design one AF that can attain the best performance over a wide variety of black-box functions. This paper aims to attack this challenge through the perspective of reinforced few-shot AF learning (FSAF). Specifically, we first connect the notion of AFs with Q-functions and view a deep Q-network (DQN) as a surrogate differentiable AF. While it serves as a natural idea to combine DQN and an existing few-shot learning method, we identify that such a direct combination does not perform well due to severe overfitting, which is particularly critical in BO due to the need of a versatile sampling policy. To address this, we present a Bayesian variant of DQN with the following three features: (i) It learns a distribution of Q-networks as AFs based on the Kullback-Leibler regularization framework. This inherently provides the uncertainty required in sampling for BO and mitigates overfitting.

machine learning, optimization, reinforcement learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.67)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
(2 more...)

Add feedback

n B(s B(puttput

Neural Information Processing SystemsFeb-10-2026, 07:04:29 GMT

Once estimates, viagradient error: = r (Q (s, a) target) 2. Ourfocus Q-networks 2 Where Do Learned Fourier Features Help?

artificial intelligence, fourierfeature, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)

Add feedback

Reinforced Few-Shot Acquisition Function Learning for Bayesian Optimization

Neural Information Processing SystemsFeb-8-2026, 08:45:35 GMT

Bayesian optimization (BO) conventionally relies on handcrafted acquisition functions (AFs) to sequentially determine the sample points. However, it has been widely observed in practice that the best-performing AF in terms of regret can vary significantly under different types of black-box functions. It has remained a challenge to design one AF that can attain the best performance over a wide variety of black-box functions.

machine learning, optimization, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > Taiwan (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Genre: Research Report (0.67)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
(2 more...)

Add feedback

3d3d286a8d153a4a58156d0e02d8570c-Supplemental.pdf

Neural Information Processing SystemsFeb-8-2026, 07:44:23 GMT

dataset, edac, variance, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Add feedback

Reinforced Few-Shot Acquisition Function Learning for Bayesian Optimization

Neural Information Processing SystemsDec-24-2025, 00:51:35 GMT

Bayesian optimization (BO) conventionally relies on handcrafted acquisition functions (AFs) to sequentially determine the sample points. However, it has been widely observed in practice that the best-performing AF in terms of regret can vary significantly under different types of black-box functions. It has remained a challenge to design one AF that can attain the best performance over a wide variety of black-box functions. This paper aims to attack this challenge through the perspective of reinforced few-shot AF learning (FSAF). Specifically, we first connect the notion of AFs with Q-functions and view a deep Q-network (DQN) as a surrogate differentiable AF. While it serves as a natural idea to combine DQN and an existing few-shot learning method, we identify that such a direct combination does not perform well due to severe overfitting, which is particularly critical in BO due to the need of a versatile sampling policy. To address this, we present a Bayesian variant of DQN with the following three features: (i) It learns a distribution of Q-networks as AFs based on the Kullback-Leibler regularization framework. This inherently provides the uncertainty required in sampling for BO and mitigates overfitting.

bayesian optimization, name change, reinforced few-shot acquisition function learning, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Adaptive Cooperative Transmission Design for Ultra-Reliable Low-Latency Communications via Deep Reinforcement Learning

Yu, Hyemin, Yang, Hong-Chuan

arXiv.org Artificial IntelligenceNov-5-2025

Next-generation wireless communication systems must support ultra-reliable low-latency communication (URLLC) service for mission-critical applications. Meeting stringent URLLC requirements is challenging, especially for two-hop cooperative communication. In this paper, we develop an adaptive transmission design for a two-hop relaying communication system. Each hop transmission adaptively configures its transmission parameters separately, including numerology, mini-slot size, and modulation and coding scheme, for reliable packet transmission within a strict latency constraint. We formulate the hop-specific transceiver configuration as a Markov decision process (MDP) and propose a dual-agent reinforcement learning-based cooperative latency-aware transmission (DRL-CoLA) algorithm to learn latency-aware transmission policies in a distributed manner. Simulation results verify that the proposed algorithm achieves the near-optimal reliability while satisfying strict latency requirements.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2511.02216

Country: North America > Canada (0.14)

Genre: Research Report (0.82)

Industry: Telecommunications (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Stop-RAG: Value-Based Retrieval Control for Iterative RAG

Park, Jaewan, Cho, Solbee, Lee, Jay-Yoon

arXiv.org Artificial IntelligenceOct-17-2025

Iterative retrieval-augmented generation (RAG) enables large language models to answer complex multi-hop questions, but each additional loop increases latency, costs, and the risk of introducing distracting evidence, motivating the need for an efficient stopping strategy. Existing methods either use a predetermined number of iterations or rely on confidence proxies that poorly reflect whether more retrieval will actually help. We cast iterative RAG as a finite-horizon Markov decision process and introduce Stop-RAG, a value-based controller that adaptively decides when to stop retrieving. Trained with full-width forward-view Q($λ$) targets from complete trajectories, Stop-RAG learns effective stopping policies while remaining compatible with black-box APIs and existing pipelines. On multi-hop question-answering benchmarks, Stop-RAG consistently outperforms both fixed-iteration baselines and prompting-based stopping with LLMs. These results highlight adaptive stopping as a key missing component in current agentic systems, and demonstrate that value-based control can improve the accuracy of RAG systems.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.14337

Country:

North America > United States (0.67)
Europe (0.67)
Asia > Middle East > UAE (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Add feedback

f600d1a3f6a63f782680031f3ce241a7-Supplemental-Conference.pdf

Neural Information Processing SystemsAug-19-2025, 20:20:06 GMT

artificial intelligence, experiment, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback

sin (2πB() cos (2πB()! y! Input Output

Neural Information Processing SystemsAug-16-2025, 11:01:28 GMT

These functions map points in the state or state-action space to expected returns, and provide crucial information that is used for improving the policy.

fourier feature, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Add feedback

Deep Reinforcement Learning-based Cell DTX/DRX Configuration for Network Energy Saving

Mao, Wei, Wei, Lili, Semiari, Omid, Yeh, Shu-ping, Nikopour, Hosein

arXiv.org Artificial IntelligenceJul-30-2025

3GPP Release 18 cell discontinuous transmission and reception (cell DTX/DRX) is an important new network energy saving feature for 5G. As a time-domain technique, it periodically aggregates the user data transmissions in a given duration of time when the traffic load is not heavy, so that the remaining time can be kept silent and advanced sleep modes (ASM) can be enabled to shut down more radio components and save more energy for the cell. However, inevitably the packet delay is increased, as during the silent period no transmission is allowed. In this paper we study how to configure cell DTX/DRX to optimally balance energy saving and packet delay, so that for delay-sensitive traffic maximum energy saving can be achieved while the degradation of quality of service (QoS) is minimized. As the optimal configuration can be different for different network and traffic conditions, the problem is complex and we resort to deep reinforcement learning (DRL) framework to train an AI agent to solve it. Through careful design of 1) the learning algorithm, which implements a deep Q-network (DQN) on a contextual bandit (CB) model, and 2) the reward function, which utilizes a smooth approximation of a theoretically optimal but discontinuous reward function, we are able to train an AI agent that always tries to select the best possible Cell DTX/DRX configuration under any network and traffic conditions. Simulation results show that compared to the case when cell DTX/DRX is not used, our agent can achieve up to ~45% energy saving depending on the traffic load scenario, while always maintaining no more than ~1% QoS degradation.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2507.21385

Genre: Research Report > New Finding (0.34)

Industry: