 Reinforcement Learning


GPLight+: A Genetic Programming Method for Learning Symmetric Traffic Signal Control Policy

arXiv.org Artificial Intelligence

Recently, learning-based approaches have achieved significant success in automatically devising effective traffic signal control strategies. In particular, Genetic Programming (GP), a powerful evolutionary machine learning approach, has been used to evolve human-understandable phase urgency functions that measure how urgent it is to activate a green light for a specific phase. However, current GP-based methods cannot treat the common traffic features of different traffic signal phases consistently. To address this issue, we propose a symmetric phase urgency function that calculates the urgency of a specific phase from the current road conditions. The function is represented as an aggregation of two shared subtrees, each representing the urgency of a turn movement in the phase. We then propose a GP method to evolve the symmetric phase urgency function. We evaluate the proposed method on the well-known CityFlow traffic simulator using multiple public real-world datasets. The experimental results show that the proposed symmetric urgency function representation significantly improves the performance of the learned traffic signal control policies over the traditional GP representation across a wide range of scenarios. Further analysis shows that the proposed method can evolve effective, human-understandable, and easily deployable traffic signal control policies.

Traffic signals, located at signalized intersections, manage traffic flow in various directions and thereby contribute significantly to both transportation efficiency and road safety [1]. Poorly designed traffic signal plans cause commuters to waste valuable time on the roads. Most existing traffic signal control systems do not adapt their decisions to dynamic traffic conditions. For instance, the Sydney Coordinated Adaptive Traffic System [2], which relies on a predetermined cycle time plan, remains widely used at signalized intersections worldwide. The emergence of Deep Reinforcement Learning (DRL) as a solution to the Traffic Signal Control (TSC) problem is driven by advances in deep learning [3] and by the increasing availability of transportation infrastructure components such as surveillance cameras, road sensors, and the internet of vehicles [4]. This trend is exemplified by recent research efforts [5]-[7].
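The symmetric phase urgency idea described in the abstract can be illustrated with a small sketch. This is one possible reading, not the authors' implementation: it assumes each phase consists of two turn movements, that the same movement-level subtree is applied to both, and that the aggregation is a sum. The feature names (`waiting`, `approaching`), the hand-written subtree, and all function names are hypothetical stand-ins for what GP would evolve.

```python
from typing import Callable, Dict, Mapping, Tuple

# Hypothetical feature record for one turn movement (e.g. vehicle counts).
Movement = Mapping[str, float]
# Assumption: a phase serves exactly two turn movements.
Phase = Tuple[Movement, Movement]


def movement_urgency(movement: Movement) -> float:
    """Hypothetical stand-in for an evolved GP subtree over movement features."""
    return movement["waiting"] + 0.5 * movement["approaching"]


def phase_urgency(
    phase: Phase,
    subtree: Callable[[Movement], float] = movement_urgency,
    aggregate: Callable[[float, float], float] = lambda a, b: a + b,
) -> float:
    """Apply the same shared subtree to both movements, then aggregate.

    Because the subtree is shared and the (assumed) aggregation is
    commutative, swapping the two movements leaves the urgency unchanged.
    """
    first, second = phase
    return aggregate(subtree(first), subtree(second))


def select_phase(phases: Dict[str, Phase]) -> str:
    """Pick the phase with the highest urgency to receive the green light."""
    return max(phases, key=lambda name: phase_urgency(phases[name]))


# Illustrative, made-up road conditions for two phases.
example_phases: Dict[str, Phase] = {
    "NS_through": (
        {"waiting": 4.0, "approaching": 2.0},
        {"waiting": 1.0, "approaching": 0.0},
    ),
    "EW_through": (
        {"waiting": 1.0, "approaching": 0.0},
        {"waiting": 2.0, "approaching": 2.0},
    ),
}
```

Calling `select_phase(example_phases)` picks the phase whose two movements jointly score highest. Sharing one subtree across movements is what makes identical traffic features receive identical treatment regardless of which phase or movement slot they appear in.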


A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning

arXiv.org Artificial Intelligence

Multi-turn problem solving is critical yet challenging for Large Reasoning Models (LRMs), which must reflect on their reasoning and revise it in response to feedback. Existing Reinforcement Learning (RL) methods train large reasoning models in a single-turn paradigm with verifiable rewards. However, we observe that models trained with existing RL paradigms often lose their ability to solve problems across multiple turns and struggle to revise answers based on contextual feedback, leading to repetitive responses. We ask: can LRMs learn to reflect on their answers in a multi-turn context? In this work, we find that training models with multi-turn RL using only unary feedback (e.g., "Let's try again") after wrong answers can improve both single-turn performance and multi-turn reasoning. We introduce Unary Feedback as Observation (UFO) for reinforcement learning, which uses minimal yet common unary user feedback during iterative problem solving. It can be easily applied to existing single-turn RL training setups. Experimental results show that RL training with UFO preserves single-turn performance and improves multi-turn reasoning accuracy by up to 14%, enabling language models to react better to feedback in multi-turn problem solving. To further minimize the number of turns needed for a correct answer while encouraging diverse reasoning when mistakes occur, we design reward structures that guide models to produce careful and deliberate answers in each turn. Code: https://github.com/lichengliu03/unary-feedback
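The multi-turn interaction described above can be sketched as a simple rollout loop. This is a minimal illustration under stated assumptions, not the code from the linked repository: the `policy` and `is_correct` callables, the turn limit, and in particular the reward shape (exponential decay with the number of turns plus a penalty for repeated answers) are hypothetical choices meant only to reflect the stated goals of fewer turns and more diverse retries.

```python
from typing import Callable, List, Tuple

# The only feedback the model ever sees after a wrong answer.
UNARY_FEEDBACK = "Let's try again."


def rollout(
    policy: Callable[[List[str]], str],
    question: str,
    is_correct: Callable[[str], bool],
    max_turns: int = 5,
) -> Tuple[List[str], bool]:
    """Run one multi-turn episode with unary feedback as the observation.

    `policy` maps the conversation so far to the next answer; `is_correct`
    is a hypothetical verifier for the final answer.
    """
    context: List[str] = [question]
    answers: List[str] = []
    for _ in range(max_turns):
        answer = policy(context)
        answers.append(answer)
        context.append(answer)
        if is_correct(answer):
            return answers, True
        # No hint about what went wrong, only a signal to retry.
        context.append(UNARY_FEEDBACK)
    return answers, False


def episode_reward(
    answers: List[str],
    solved: bool,
    decay: float = 0.5,
    repeat_penalty: float = 0.1,
) -> float:
    """Hypothetical reward: earlier successes score higher, repeats cost."""
    base = decay ** (len(answers) - 1) if solved else 0.0
    repeats = len(answers) - len(set(answers))
    return base - repeat_penalty * repeats
```

A toy policy that changes its answer each time it sees the feedback string is enough to exercise the loop. In an actual training setup the episode reward would be passed to whichever RL algorithm is already in use for single-turn training.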