
Learning to Shape In-distribution Feature Space for Out-of-distribution Detection

Neural Information Processing Systems

Out-of-distribution (OOD) detection is critical for deploying machine learning models in the open world. To design scoring functions that discern OOD data from in-distribution (ID) cases using a pre-trained discriminative model, existing methods tend to make rigorous distributional assumptions, either explicitly or implicitly, because the learned feature space is not known in advance. The mismatch between the learned and assumed distributions motivates us to raise a fundamental yet under-explored question: is it possible to deterministically model the feature distribution while pre-training a discriminative model? This paper gives an affirmative answer to this question by presenting a Distributional Representation Learning (DRL) framework for OOD detection. In particular, DRL explicitly enforces the underlying feature space to conform to a pre-defined mixture distribution, together with an online approximation of normalization constants to enable end-to-end training. Furthermore, we formulate DRL as a provably convergent Expectation-Maximization algorithm to avoid trivial solutions and rearrange the sequential sampling to guide training consistency. Extensive evaluations across mainstream OOD detection benchmarks empirically demonstrate the superiority of the proposed DRL over its advanced counterparts.
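The abstract does not spell out the assumption-based scoring it improves on; a minimal sketch of that style of baseline, assuming class-conditional Gaussian features with a shared covariance (the Mahalanobis score, not this paper's learned distribution):

```python
import numpy as np

def fit_class_gaussians(features, labels):
    """Fit one Gaussian per class with a shared covariance -- a common
    simplifying distributional assumption, not DRL's learned mixture."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([features[labels == c] - means[c] for c in classes])
    cov = centered.T @ centered / len(features)
    prec = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
    return means, prec

def ood_score(x, means, prec):
    """Squared Mahalanobis distance to the nearest class mean;
    higher values indicate more OOD-like inputs."""
    return min((x - mu) @ prec @ (x - mu) for mu in means.values())

# Toy 4-d features: two well-separated classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 4)) + np.repeat([[0.0] * 4, [5.0] * 4], 100, axis=0)
labels = np.repeat([0, 1], 100)
means, prec = fit_class_gaussians(feats, labels)
id_score = ood_score(feats[0], means, prec)        # an ID sample
far_score = ood_score(np.full(4, 20.0), means, prec)  # an OOD-like point
```

When the true feature distribution deviates from this Gaussian assumption, the score degrades, which is exactly the mismatch the paper targets.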


An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search

Neural Information Processing Systems

Deep reinforcement learning (DRL) algorithms and evolution strategies (ES) have been applied to various tasks with excellent performance. The two have opposite properties: DRL offers good sample efficiency but poor stability, while ES offers the reverse. Recent attempts to combine these algorithms rely entirely on a synchronous update scheme, which is not ideal for maximizing the benefits of the parallelism in ES. To address this challenge, an asynchronous update scheme was introduced, which enables good time efficiency and diverse policy exploration. In this paper, we introduce Asynchronous Evolution Strategy-Reinforcement Learning (AES-RL), which maximizes the parallel efficiency of ES and integrates it with policy gradient methods. Specifically, we propose 1) a novel framework to merge ES and DRL asynchronously and 2) various asynchronous update methods that combine the advantages of asynchronism, ES, and DRL: exploration and time efficiency, stability, and sample efficiency, respectively. The proposed framework and update methods are evaluated on continuous control benchmarks, showing superior performance and time efficiency compared to previous methods.




reviewers, that we will make an implementation of our work available upon publication

Neural Information Processing Systems

We are glad that our reviewers agree on the merits and relevance of our work. R3/R4: Applying Freeze-Thaw BO in the settings considered. See Figure 1 for further illustration of why FT struggles in DRL settings. Fabolas uses a different way of obtaining low-fidelity information. R3: Sec 3.2 and 3.3 should be reversed as Sec 3.2 makes reference to Eq (7).


On The Presence of Double-Descent in Deep Reinforcement Learning

Veselý, Viktor, Todorov, Aleksandar, Sabatelli, Matthias

arXiv.org Machine Learning

The double descent (DD) paradox, where over-parameterized models see generalization improve past the interpolation point, remains largely unexplored in the non-stationary domain of Deep Reinforcement Learning (DRL). We present preliminary evidence that DD exists in model-free DRL, investigating it systematically across varying model capacity using the Actor-Critic framework. We rely on an information-theoretic metric, Policy Entropy, to measure policy uncertainty throughout training. Preliminary results show a clear epoch-wise DD curve; the policy's entrance into the second descent region correlates with a sustained, significant reduction in Policy Entropy. This entropic decay suggests that over-parameterization acts as an implicit regularizer, guiding the policy towards robust, flatter minima in the loss landscape. These findings establish DD as a factor in DRL and provide an information-based mechanism for designing agents that are more general, transferable, and robust.
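The Policy Entropy metric the abstract relies on is standard; a minimal sketch for a categorical policy parameterized by logits (the paper's exact estimator may differ):

```python
import numpy as np

def policy_entropy(logits):
    """Shannon entropy of a categorical policy given its logits.
    A sustained drop over training signals increasing policy certainty."""
    z = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

uniform = policy_entropy(np.zeros(4))                    # maximally uncertain
peaked = policy_entropy(np.array([10.0, 0.0, 0.0, 0.0])) # nearly deterministic
```

The entropy ranges from log(num_actions) for a uniform policy down to 0 for a deterministic one, which is what makes its decay in the second descent region interpretable.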


Improved Robustness of Deep Reinforcement Learning for Control of Time-Varying Systems by Bounded Extremum Seeking

Saxena, Shaifalee, Williams, Alan, Fierro, Rafael, Scheinker, Alexander

arXiv.org Artificial Intelligence

In this paper, we study the use of robust, model-independent bounded extremum seeking (ES) feedback control to improve the robustness of deep reinforcement learning (DRL) controllers for a class of nonlinear time-varying systems. DRL has the potential to learn from large datasets to quickly control or optimize the outputs of many-parameter systems, but its performance degrades catastrophically when the system model changes rapidly over time. Bounded ES can handle time-varying systems with unknown control directions, but its convergence slows as the number of tuned parameters increases and, like all local adaptive methods, it can get stuck in local minima. We demonstrate that together, DRL and bounded ES result in a hybrid controller whose performance exceeds the sum of its parts, with DRL taking advantage of historical data to learn how to quickly drive a many-parameter system to a desired setpoint, while bounded ES ensures robustness to time variations. We present a numerical study of a general time-varying system and a combined ES-DRL controller for automatic tuning of the Low Energy Beam Transport section at the Los Alamos Neutron Science Center linear particle accelerator.
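A minimal sketch of the bounded ES update law the abstract builds on, in the Scheinker-style dithering form where each parameter oscillates at a distinct frequency and the measured cost modulates the phase. The constants and the static quadratic cost are illustrative only; the paper's setting has a time-varying system and different tuning.

```python
import numpy as np

def bounded_es_step(theta, cost, t, dt, omegas, alpha=1.0, k=1.0):
    """One Euler step of bounded extremum seeking:
    theta_dot_i = sqrt(alpha * omega_i) * cos(omega_i * t + k * C(theta)).
    The update magnitude is bounded regardless of the cost value; on
    average the dynamics drift downhill, theta_dot ~ -(k*alpha/2) * grad C."""
    c = cost(theta)
    return theta + dt * np.sqrt(alpha * omegas) * np.cos(omegas * t + k * c)

# Hypothetical cost with a minimum at [1, -1] (a real application would
# only have noisy measurements of C, never its gradient or model).
def cost(theta):
    return np.sum((theta - np.array([1.0, -1.0])) ** 2)

theta = np.zeros(2)
omegas = np.array([60.0, 75.0])    # distinct dither frequencies per parameter
dt = 0.001
for step in range(50000):
    theta = bounded_es_step(theta, cost, step * dt, dt, omegas, alpha=1.0, k=4.0)
```

Because only the measured cost enters the update, no model or control direction is needed, which is what makes the method a natural safeguard for a DRL controller when the underlying system drifts.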



The 1st International Workshop on Disentangled Representation Learning for Controllable Generation (DRL4Real): Methods and Results

Chen, Qiuyu, Jin, Xin, Song, Yue, Liu, Xihui, Yang, Shuai, Yang, Tao, Li, Ziqiang, Huang, Jianguo, Wei, Yuntao, Xie, Ba'ao, Sebe, Nicu, Zeng, Wenjun, Yun, Jooyeol, Abati, Davide, Omran, Mohamed, Choo, Jaegul, Habibian, Amir, Wiggers, Auke, Kobayashi, Masato, Ding, Ning, Tamaki, Toru, Gheisari, Marzieh, Genovesio, Auguste, Chen, Yuheng, Liu, Dingkun, Yang, Xinyao, Xu, Xinping, Chen, Baicheng, Wu, Dongrui, Geng, Junhao, Lv, Lexiang, Lin, Jianxin, Liang, Hanzhe, Zhou, Jie, Chen, Xuanxin, Wang, Jinbao, Gao, Can, Wang, Zhangyi, Li, Zongze, Wen, Bihan, Gao, Yixin, Pan, Xiaohan, Li, Xin, Chen, Zhibo, Peng, Baorui, Chen, Zhongming, Jin, Haoran

arXiv.org Artificial Intelligence

This paper reviews the 1st International Workshop on Disentangled Representation Learning for Controllable Generation (DRL4Real), held in conjunction with ICCV 2025. The workshop aimed to bridge the gap between the theoretical promise of Disentangled Representation Learning (DRL) and its application in realistic scenarios, moving beyond synthetic benchmarks. DRL4Real focused on evaluating DRL methods in practical applications such as controllable generation, exploring advancements in model robustness, interpretability, and generalization. The workshop accepted 9 papers covering a broad range of topics, including the integration of novel inductive biases (e.g., language), the application of diffusion models to DRL, 3D-aware disentanglement, and the expansion of DRL into specialized domains like autonomous driving and EEG analysis. This summary details the workshop's objectives, the themes of the accepted papers, and provides an overview of the methodologies proposed by the authors.