AITopics | Jiechao Xiong

Exponentially Weighted Imitation Learning for Batched Historical Data

Qing Wang, Jiechao Xiong, Lei Han, peng sun, Han Liu, Tong Zhang

Neural Information Processing SystemsMay-26-2025, 06:13:32 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > Canada (0.14)

Industry:

Leisure & Entertainment > Games > Computer Games (0.68)
Leisure & Entertainment > Sports > Soccer (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Divergence-Augmented Policy Optimization

Qing Wang, Yingru Li, Jiechao Xiong, Tong Zhang

Neural Information Processing SystemsMar-26-2025, 22:14:41 GMT

In deep reinforcement learning, policy optimization methods need to deal with issues such as function approximation and the reuse of off-policy data. Standard policy gradient methods do not handle off-policy data well, leading to premature convergence and instability. This paper introduces a method to stabilize policy optimization when off-policy data are reused. The idea is to include a Bregman divergence between the behavior policy that generates the data and the current policy to ensure small and safe policy updates with off-policy data. The Bregman divergence is calculated between the state distributions of two policies, instead of only on the action probabilities, leading to a divergence augmentation formulation. Empirical experiments on Atari games show that in the data-scarce scenario where the reuse of off-policy data becomes necessary, our method can achieve better performance than other state-of-the-art deep reinforcement learning algorithms.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country:

North America (0.68)
Asia > China (0.47)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Exponentially Weighted Imitation Learning for Batched Historical Data

Qing Wang, Jiechao Xiong, Lei Han, peng sun, Han Liu, Tong Zhang

Neural Information Processing SystemsMar-25-2025, 15:56:17 GMT

We consider deep policy learning with only batched historical trajectories. The main challenge of this problem is that the learner no longer has a simulator or "environment oracle" as in most reinforcement learning settings. To solve this problem, we propose a monotonic advantage reweighted imitation learning strategy that is applicable to problems with complex nonlinear function approximation and works well with hybrid (discrete and continuous) action space. The method does not rely on the knowledge of the behavior policy, thus can be used to learn from data generated by an unknown policy. Under mild conditions, our algorithm, though surprisingly simple, has a policy improvement bound and outperforms most competing methods empirically. Thorough numerical results are also provided to demonstrate the efficacy of the proposed methodology.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Industry:

Leisure & Entertainment > Games > Computer Games (0.46)
Leisure & Entertainment > Sports > Soccer (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Divergence-Augmented Policy Optimization

Qing Wang, Yingru Li, Jiechao Xiong, Tong Zhang

Neural Information Processing SystemsJan-27-2025, 02:21:39 GMT

In deep reinforcement learning, policy optimization methods need to deal with issues such as function approximation and the reuse of off-policy data. Standard policy gradient methods do not handle off-policy data well, leading to premature convergence and instability. This paper introduces a method to stabilize policy optimization when off-policy data are reused. The idea is to include a Bregman divergence between the behavior policy that generates the data and the current policy to ensure small and safe policy updates with off-policy data. The Bregman divergence is calculated between the state distributions of two policies, instead of only on the action probabilities, leading to a divergence augmentation formulation. Empirical experiments on Atari games show that in the data-scarce scenario where the reuse of off-policy data becomes necessary, our method can achieve better performance than other state-of-the-art deep reinforcement learning algorithms.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America (0.68)
Asia > China > Guangdong Province (0.14)

Industry: Leisure & Entertainment > Games > Computer Games (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Split LBI: An Iterative Regularization Path with Structural Sparsity

Chendi Huang, Xinwei Sun, Jiechao Xiong, Yuan Yao

Neural Information Processing SystemsJan-20-2025, 07:53:39 GMT

An iterative regularization path with structural sparsity is proposed in this paper based on variable splitting and the Linearized Bregman Iteration, hence called Split LBI. Despite its simplicity, Split LBI outperforms the popular generalized Lasso in both theory and experiments. A theory of path consistency is presented that equipped with a proper early stopping, Split LBI may achieve model selection consistency under a family of Irrepresentable Conditions which can be weaker than the necessary and sufficient condition for generalized Lasso.

artificial intelligence, consistency, machine learning, (15 more...)

Neural Information Processing Systems

Country: