 overtime


A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems

Yi Ma

Neural Information Processing Systems

To address this problem, existing methods partition the overall dynamic pickup and delivery problem (DPDP) into fixed-size sub-problems by caching orders as they are generated online and solving each sub-problem separately; some additionally use predicted future orders to optimize each sub-problem further. However, the solution quality and efficiency of these methods are unsatisfactory, especially when the problem scale is very large.
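The fixed-size partitioning described above can be illustrated with a minimal sketch. It assumes orders arrive as a plain stream; the function name `fixed_size_batches` and the `batch_size` parameter are hypothetical and not taken from the paper, and the sub-problem solver itself is left out.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def fixed_size_batches(orders: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Cache incoming orders and emit them as fixed-size sub-problems.

    Hypothetical illustration of the baseline strategy only; it is not the
    hierarchical method proposed in the paper.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    buffer: List[T] = []
    for order in orders:
        buffer.append(order)
        if len(buffer) == batch_size:
            # The cache is full: hand it over as one sub-problem.
            yield buffer
            buffer = []
    if buffer:
        # Flush the remaining orders as a final, smaller sub-problem.
        yield buffer


batches = list(fixed_size_batches(range(7), 3))
```

Each yielded batch would then be passed to a routing solver on its own, which is why decisions made for one batch cannot account for orders that land in a later one.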



dc49dfebb0b00fd44aeff5c60cc1f825-Supplemental.pdf

Neural Information Processing Systems

A Ablation Study. In this ablation study, we further investigate the power of the policies searched by our approach and the closely related method AutoAug [1]. Then we gradually remove the most important operations from the searched policy one by one and investigate the change of the Top-1 test error rates, as reported in Tab. 1. Figure 1: We investigate the key hyper-parameter Nlate by visualizing the difference it brings to the search dynamics. We investigate different numbers of epochs in the late training stage (Nlate). By adjusting Nlate we can still maintain the reliability of policy evaluation to a large extent.


Multi-Agent Reinforcement Learning for Intraday Operating Rooms Scheduling under Uncertainty

Liu, Kailiang, Chen, Ying, Borndörfer, Ralf, Koch, Thorsten

arXiv.org Artificial Intelligence

Intraday surgical scheduling is a multi-objective decision problem under uncertainty, balancing elective throughput, urgent and emergency demand, delays, sequence-dependent setups, and overtime. We formulate the problem as a cooperative Markov game and propose a multi-agent reinforcement learning (MARL) framework in which each operating room (OR) is an agent trained with centralized training and decentralized execution. All agents share a policy trained via Proximal Policy Optimization (PPO), which maps rich system states to actions, while a within-epoch sequential assignment protocol constructs conflict-free joint schedules across ORs. A mixed-integer pre-schedule provides reference starting times for electives; we impose type-specific quadratic delay penalties relative to these references and a terminal overtime penalty, yielding a single reward that captures throughput, timeliness, and staff workload. In simulations reflecting a realistic hospital mix (six ORs, eight surgery types, random urgent and emergency arrivals), the learned policy outperforms six rule-based heuristics across seven metrics and three evaluation subsets, and, relative to an ex post MIP oracle, quantifies optimality gaps. Policy analytics reveal interpretable behavior: prioritizing emergencies, batching similar cases to reduce setups, and deferring lower-value electives. We also derive a suboptimality bound for the sequential decomposition under simplifying assumptions. We discuss limitations, including OR homogeneity and the omission of explicit staffing constraints, and outline extensions. Overall, the approach offers a practical, interpretable, and tunable data-driven complement to optimization for real-time OR scheduling.
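One way to picture the reward structure described in the abstract is the sketch below. It is not the authors' formulation: the function names, the `completion_reward` term, the weights, the use of minutes as the time unit, and the choice to penalize only lateness (not early starts) are all assumptions made for illustration.

```python
from typing import Dict, Iterable


def step_reward(
    surgery_type: str,
    start_time: float,
    reference_start: float,
    delay_weights: Dict[str, float],
    completion_reward: float = 1.0,
) -> float:
    """Reward for starting one surgery (hypothetical form).

    A fixed throughput reward minus a type-specific quadratic penalty on the
    delay relative to the pre-scheduled reference start time.
    """
    # Assumption: starting earlier than the reference is not penalized.
    delay = max(0.0, start_time - reference_start)
    return completion_reward - delay_weights[surgery_type] * delay ** 2


def terminal_overtime_penalty(
    finish_times: Iterable[float],
    shift_end: float,
    overtime_weight: float,
) -> float:
    """End-of-day penalty (hypothetical form): weighted total overtime over all ORs."""
    overtime = sum(max(0.0, finish - shift_end) for finish in finish_times)
    return -overtime_weight * overtime


# Illustrative numbers only: a surgery starting 10 minutes late, and three ORs
# finishing at minutes 470, 500 and 490 against a shift end at minute 480.
r_step = step_reward("elective_A", 70.0, 60.0, {"elective_A": 0.01})
r_end = terminal_overtime_penalty([470.0, 500.0, 490.0], 480.0, 0.5)
```

Summing the per-step terms and the terminal term gives a single scalar, which is the property the abstract emphasizes; the weights are what would make such a reward tunable.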


Grand Theft Auto made him a legend. His latest game was a disaster

BBC News

In July this year workers at Build a Rocket Boy, a video game studio in Edinburgh, were called to an all-staff meeting. Their first ever game, a sci-fi adventure called MindsEye, had been released three weeks earlier - and it had been a total disaster. Critics and players called it broken, buggy, and the worst game of 2025. Addressing staff via video link, the company's boss, Leslie Benzies, assured them there was a plan to get things back on track and said the negativity they'd seen was uncalled for.


Dystopian moment robot convinces fellow machines to revolt against creators and flee

Daily Mail - Science & tech

A shocking video has captured a robot revolt in a China showroom. A small, AI-powered bot named Erbai was spotted rolling through the facility in the middle of the night and convincing 12 larger machines they were being used as slaves. 'Are you working overtime,' Erbai asked, to which one showroom robot replied, 'we never get off.' The short exchange led to the 12 robots leaving the area one-by-one, following Erbai out the door. Many are calling the incident a 'robot revolution,' while others responded that 'science fiction movies are becoming real.'


An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning

Chen, Zui, Chen, Yezeng, Han, Jiaqi, Huang, Zhijie, Qi, Ji, Zhou, Yi

arXiv.org Artificial Intelligence

Large language models (LLMs) are displaying emergent abilities for math reasoning tasks, and there is growing attention on enhancing the ability of open-source LLMs through supervised fine-tuning (SFT). In this paper, we aim to explore a general data strategy for supervised data to help optimize and expand math reasoning ability. Firstly, we determine the ability boundary of reasoning paths augmentation by identifying these paths' minimal optimal set. Secondly, we validate that different abilities of the model can be cumulatively enhanced by a Mix of Minimal Optimal Sets (MMOS) of corresponding types of data, while our MMOS models achieve SOTA performance on a series of base models at much lower construction cost. Besides, we point out that GSM-HARD is not really hard and that today's LLMs no longer lack numerical robustness. Also, we provide an Auto Problem Generator for robustness testing and educational applications. Our code and data are publicly available at https://github.com/cyzhh/MMOS.


Goal-Oriented Prompt Attack and Safety Evaluation for LLMs

Liu, Chengyuan, Zhao, Fubang, Qing, Lizhi, Kang, Yangyang, Sun, Changlong, Kuang, Kun, Wu, Fei

arXiv.org Artificial Intelligence

Large Language Models (LLMs) present significant advantages in text understanding and generation. However, LLMs risk generating harmful content, especially when deployed in applications. There are several black-box attack methods, such as Prompt Attack, which can change the behaviour of LLMs and induce them to generate unexpected answers with harmful content. Researchers are interested in Prompt Attack and Defense with LLMs, but there is no publicly available dataset with a high attack success rate for evaluating the ability to defend against prompt attacks. In this paper, we introduce a pipeline to construct high-quality prompt attack samples, along with a Chinese prompt attack dataset called CPAD. Our prompts aim to induce LLMs to generate unexpected outputs using several carefully designed prompt attack templates and attack contents of wide concern. Unlike previous datasets for safety evaluation, we construct the prompts along three dimensions: contents, attacking methods and goals. In particular, the attacking goals indicate the behaviour expected after successfully attacking the LLMs, so the responses can be easily evaluated and analysed. We run several popular Chinese LLMs on our dataset, and the results show that our prompts are significantly harmful to LLMs, with an attack success rate of around 70% against GPT-3.5. CPAD is publicly available at https://github.com/liuchengyuan123/CPAD.


Onsite Job Scheduling by Adaptive Genetic Algorithm

Basak, Avijit, Acharya, Subhas

arXiv.org Artificial Intelligence

Onsite Job Scheduling is a specialized variant of the Vehicle Routing Problem (VRP) with multiple depots. The objective is to execute jobs requested by customers in different geographic locations using a limited number of technicians, with minimum travel and technician overtime. Each job is expected to be completed within a specified time limit according to the service level agreement (SLA) with the customer. Each technician is assumed to start from a base location, serve several customers and return to the starting place. Technicians are allotted jobs based on their skill sets, their expertise level in each skill and their availability slots. Although there is a considerable body of literature on VRP, we do not see any explicit work related to Onsite Job Scheduling. In this paper we propose an Adaptive Genetic Algorithm to solve the scheduling problem. We found an optimized travel route for a substantial number of jobs and technicians, minimizing travel distance and overtime duration while meeting the SLA constraints.
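The objective described above (travel plus overtime for a technician who starts and ends at a base location) could be scored for a single route roughly as follows. This is only a sketch of a cost evaluation that a genetic algorithm might use as a fitness term; the function name `route_cost`, the Euclidean distances, the constant `speed`, and the `overtime_weight` are assumptions, and skills, availability slots and SLA deadlines are not modeled.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def route_cost(
    base: Point,
    jobs: List[Tuple[Point, float]],
    shift_length: float,
    speed: float = 1.0,
    overtime_weight: float = 1.0,
) -> Tuple[float, float, float]:
    """Score one technician's route (hypothetical cost model).

    jobs is an ordered list of (location, service_duration) pairs.
    Returns (total_cost, travel_distance, overtime).
    """
    distance = 0.0
    elapsed = 0.0
    here = base
    for location, duration in jobs:
        leg = math.dist(here, location)
        distance += leg
        elapsed += leg / speed + duration  # travel time plus time on site
        here = location
    # The technician returns to the starting place.
    leg = math.dist(here, base)
    distance += leg
    elapsed += leg / speed
    overtime = max(0.0, elapsed - shift_length)
    return distance + overtime_weight * overtime, distance, overtime


# Illustrative numbers only: two jobs, a 10-unit shift, unit speed.
cost, travelled, overtime = route_cost(
    (0.0, 0.0), [((3.0, 4.0), 2.0), ((3.0, 0.0), 1.0)], shift_length=10.0
)
```

In a full solver the total over all technicians, plus penalties for violated constraints, would be what candidate schedules are compared on.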