The Challenger: When Do New Data Sources Justify Switching Machine Learning Models?

Digalakis, Vassilis Jr, Pérignon, Christophe, Saurin, Sébastien, Sentenac, Flore

arXiv.org Machine Learning

We study the problem of deciding whether, and when, an organization should replace a trained incumbent model with a challenger relying on newly available features. We develop a unified economic and statistical framework that links learning-curve dynamics, data-acquisition and retraining costs, and discounting of future gains. First, we characterize the optimal switching time in stylized settings and derive closed-form expressions that quantify how horizon length, learning-curve curvature, and cost differentials shape the optimal decision. Second, we propose three practical algorithms--a one-shot baseline, a greedy sequential method, and a look-ahead sequential method. Using a real-world credit-scoring dataset with gradually arriving alternative data, we show that (i) optimal switching times vary systematically with cost parameters and learning-curve behavior, and (ii) the look-ahead sequential method outperforms the other methods and approaches the value of an oracle with full foresight. Finally, we establish finite-sample guarantees, including conditions under which the sequential look-ahead method achieves sublinear regret relative to that oracle. Our results provide an operational blueprint for economically sound model transitions as new data sources become available.
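The switching decision the abstract describes can be illustrated with a small numeric sketch: assume the challenger's accuracy follows a power-law learning curve in accumulated data, gains are discounted per epoch, and switching incurs a one-time cost. The curve form, parameters, and costs below are illustrative assumptions, not the paper's calibrated values.

```python
# Hedged sketch: choose a switching epoch by comparing discounted accuracy
# gains against a one-time switching cost, under an assumed power-law
# learning curve. All parameters are illustrative, not from the paper.

def challenger_accuracy(n, a=0.90, b=0.30, beta=0.5):
    """Assumed learning curve: accuracy approaches `a` as data volume n grows."""
    return a - b * n ** (-beta)

def switch_value(t_switch, horizon=24, incumbent_acc=0.82,
                 data_per_epoch=1000, switch_cost=0.05, discount=0.95):
    """Discounted net value (in accuracy units) of switching at epoch t_switch."""
    value = -switch_cost * discount ** t_switch
    for t in range(t_switch, horizon):
        n = (t + 1) * data_per_epoch  # alternative data accumulated by epoch t
        gain = challenger_accuracy(n) - incumbent_acc
        value += discount ** t * gain
    return value

best_t = max(range(24), key=switch_value)
print(best_t, round(switch_value(best_t), 4))
```

With these parameters the challenger overtakes the incumbent quickly, so early switching dominates; steeper costs or flatter curves push the optimum later.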


Agentic DDQN-Based Scheduling for Licensed and Unlicensed Band Allocation in Sidelink Networks

Chou, Po-Heng, Fu, Pin-Qi, Saad, Walid, Wang, Li-Chun

arXiv.org Artificial Intelligence

Abstract--In this paper, we present an agentic double deep Q-network (DDQN) scheduler for licensed/unlicensed band allocation in New Radio (NR) sidelink (SL) networks. Beyond conventional reward-seeking reinforcement learning (RL), the agent perceives and reasons over a multi-dimensional context that jointly captures queueing delay, link quality, coexistence intensity, and switching stability. A capacity-aware, quality of service (QoS)-constrained reward aligns the agent with goal-oriented scheduling rather than static thresholding. Under constrained bandwidth, the proposed design reduces blocking by up to 87.5% versus threshold policies while preserving throughput, highlighting the value of context-driven decisions in coexistence-limited NR SL networks. The proposed scheduler is an embodied agent (E-agent) tailored for task-specific, resource-efficient operation at the network edge.
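The core update behind a double-DQN scheduler like this can be sketched in a few lines: the online network selects the next action and the target network evaluates it, which reduces overestimation relative to vanilla DQN. The action set, rewards, and shapes below are illustrative, not the paper's.

```python
# Hedged sketch of the double-DQN target: the online network picks the
# next action, the target network scores it. Band indices and rewards
# are illustrative stand-ins, not the paper's setup.
import numpy as np

n_actions = 3  # e.g. {licensed, unlicensed, defer} -- assumed action set

def ddqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """r + gamma * Q_target(s', argmax_a Q_online(s', a)), unless terminal."""
    if done:
        return reward
    a_star = int(np.argmax(q_online_next))   # selection by online network
    return reward + gamma * q_target_next[a_star]  # evaluation by target network

rng = np.random.default_rng(0)
y = ddqn_target(reward=1.0,
                q_online_next=rng.normal(size=n_actions),
                q_target_next=rng.normal(size=n_actions))
print(round(y, 4))
```

The decoupling of action selection from action evaluation is exactly what distinguishes DDQN from DQN; the "agentic" context reasoning in the paper would feed a richer state into the networks.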


Joint Matching and Pricing for Crowd-shipping with In-store Customers

Dehghan, Arash, Cevik, Mucahit, Bodur, Merve, Ghaddar, Bissan

arXiv.org Artificial Intelligence

This paper examines the use of in-store customers as delivery couriers in a centralized crowd-shipping system, targeting the growing need for efficient last-mile delivery in urban areas. We consider a brick-and-mortar retail setting where shoppers are offered compensation to deliver time-sensitive online orders. To manage this process, we propose a Markov Decision Process (MDP) model that captures key uncertainties, including the stochastic arrival of orders and crowd-shippers, and the probabilistic acceptance of delivery offers. Our solution approach integrates Neural Approximate Dynamic Programming (NeurADP) for adaptive order-to-shopper assignment with a Deep Double Q-Network (DDQN) for dynamic pricing. This joint optimization strategy enables multi-drop routing and accounts for offer acceptance uncertainty, aligning more closely with real-world operations. Experimental results demonstrate that the integrated NeurADP + DDQN policy achieves notable improvements in delivery cost efficiency, with up to 6.7% savings over NeurADP with fixed pricing and approximately 18% over myopic baselines. We also show that allowing flexible delivery delays and enabling multi-destination routing further reduces operational costs by 8% and 17%, respectively. These findings underscore the advantages of dynamic, forward-looking policies in crowd-shipping systems and offer practical guidance for urban logistics operators.
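The pricing-under-acceptance-uncertainty trade-off at the heart of this model can be sketched with a one-step expected-profit calculation: higher compensation raises the shopper's acceptance probability but shrinks the margin. The logistic acceptance model and all dollar figures below are illustrative assumptions, not the paper's learned policy.

```python
# Hedged sketch: pick a delivery compensation maximizing expected platform
# profit when offer acceptance is probabilistic. The logistic acceptance
# curve and all parameters are illustrative assumptions.
import math

def accept_prob(price, base=-4.0, sensitivity=0.8):
    """Assumed logistic acceptance: higher compensation, higher acceptance."""
    return 1.0 / (1.0 + math.exp(-(base + sensitivity * price)))

def expected_profit(price, order_revenue=12.0, fallback_cost=9.0):
    """Shopper accepts and delivers for `price`, else a dedicated courier
    delivers at `fallback_cost`."""
    p = accept_prob(price)
    return p * (order_revenue - price) + (1 - p) * (order_revenue - fallback_cost)

prices = [round(0.5 * k, 1) for k in range(1, 21)]   # $0.50 .. $10.00 grid
best_price = max(prices, key=expected_profit)
print(best_price, round(expected_profit(best_price), 3))
```

A DDQN pricing agent would, in effect, learn this maximization state-by-state without knowing the acceptance curve in advance.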


Atomic Proximal Policy Optimization for Electric Robo-Taxi Dispatch and Charger Allocation

Dai, Jim, Wu, Manxi, Zhang, Zhanhao

arXiv.org Artificial Intelligence

Pioneering companies such as Waymo have deployed robo-taxi services in several U.S. cities. These robo-taxis are electric vehicles, and their operations require the joint optimization of ride matching, vehicle repositioning, and charging scheduling in a stochastic environment. We model the operations of the ride-hailing system with robo-taxis as a discrete-time, average-reward Markov Decision Process with infinite horizon. As the fleet size grows, dispatching becomes challenging because the system state space and the fleet dispatching action set grow exponentially with the number of vehicles. To address this, we introduce a scalable deep reinforcement learning algorithm, called Atomic Proximal Policy Optimization (Atomic-PPO), that reduces the action space using atomic action decomposition. We evaluate our algorithm using real-world NYC for-hire vehicle data, and we measure performance by the long-run average reward achieved by the dispatching policy relative to a fluid-based reward upper bound. Our experiments demonstrate the superior performance of Atomic-PPO compared to benchmarks. Furthermore, we conduct extensive numerical experiments to analyze the efficient allocation of charging facilities and assess the impact of vehicle range and charger speed on fleet performance.
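The atomic action decomposition mentioned here can be illustrated simply: instead of one joint action over the whole fleet (whose count grows exponentially in fleet size), each vehicle's action is sampled sequentially from a shared per-vehicle policy, with shared resource constraints masked out as they are consumed. The state encoding, toy policy, and action names below are illustrative stand-ins, not the paper's network.

```python
# Hedged sketch of atomic action decomposition: the joint fleet action is
# built one vehicle at a time from a shared "atomic" policy, so the action
# space scales linearly, not exponentially, in fleet size. The softmax
# policy here is a toy stand-in for the PPO policy network.
import numpy as np

rng = np.random.default_rng(1)
ATOMIC_ACTIONS = ["match", "reposition", "charge"]   # assumed atomic action set

def atomic_policy(vehicle_state, remaining_capacity):
    """Toy softmax over atomic actions; masks chargers when none remain."""
    logits = rng.normal(size=len(ATOMIC_ACTIONS)) + vehicle_state
    if remaining_capacity["charge"] == 0:
        logits[ATOMIC_ACTIONS.index("charge")] = -np.inf
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

def dispatch_fleet(fleet_states, chargers=2):
    """Assemble the joint fleet action vehicle by vehicle."""
    capacity = {"charge": chargers}
    joint_action = []
    for s in fleet_states:
        a = rng.choice(len(ATOMIC_ACTIONS), p=atomic_policy(s, capacity))
        name = ATOMIC_ACTIONS[int(a)]
        if name == "charge":
            capacity["charge"] -= 1       # consume a shared charger slot
        joint_action.append(name)
    return joint_action

actions = dispatch_fleet(fleet_states=np.zeros(6))
print(actions)
```

Sequential sampling with masking is one standard way to keep a decomposed policy feasible with respect to shared resources such as chargers.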


NS-Gym: Open-Source Simulation Environments and Benchmarks for Non-Stationary Markov Decision Processes

Keplinger, Nathaniel S., Luo, Baiting, Bektas, Iliyas, Zhang, Yunuo, Wray, Kyle Hollins, Laszka, Aron, Dubey, Abhishek, Mukhopadhyay, Ayan

arXiv.org Artificial Intelligence

In many real-world applications, agents must make sequential decisions in environments where conditions are subject to change due to various exogenous factors. These non-stationary environments pose significant challenges to traditional decision-making models, which typically assume stationary dynamics. Non-stationary Markov decision processes (NS-MDPs) offer a framework to model and solve decision problems under such changing conditions. However, the lack of standardized benchmarks and simulation tools has hindered systematic evaluation and advancement in this field. We present NS-Gym, the first simulation toolkit designed explicitly for NS-MDPs, integrated within the popular Gymnasium framework. In NS-Gym, we segregate the evolution of the environmental parameters that characterize non-stationarity from the agent's decision-making module, allowing for modular and flexible adaptations to dynamic environments. We review prior work in this domain and present a toolkit encapsulating key problem characteristics and types in NS-MDPs. This toolkit is the first effort to develop a set of standardized interfaces and benchmark problems to enable consistent and reproducible evaluation of algorithms under non-stationary conditions. We also benchmark six algorithmic approaches from prior work on NS-MDPs using NS-Gym. Our vision is that NS-Gym will enable researchers to assess the adaptability and robustness of their decision-making algorithms to non-stationary conditions.
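The design principle the abstract highlights, keeping parameter evolution separate from the agent's decision loop, can be shown with a minimal stand-in environment. This is not the actual NS-Gym interface; the environment, drift model, and method names below are illustrative.

```python
# Hedged sketch of the separation NS-Gym describes: exogenous parameter
# drift lives in its own method, decoupled from the agent's action.
# A minimal stand-in, NOT the actual NS-Gym/Gymnasium API.
import random

class DriftingBandit:
    """Two-armed bandit whose payout means drift each step (non-stationarity)."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.means = [0.3, 0.7]

    def evolve_parameters(self):
        """Exogenous drift, independent of the agent's decision module."""
        self.means = [m + self.rng.gauss(0, 0.01) for m in self.means]

    def step(self, action):
        reward = self.rng.gauss(self.means[action], 0.1)
        self.evolve_parameters()   # environment changes between decisions
        return reward

env = DriftingBandit()
total = sum(env.step(action=1) for _ in range(100))
print(round(total, 2))
```

Because the drift lives in `evolve_parameters`, a benchmark can swap in abrupt shifts, cyclic change, or adversarial schedules without touching the agent-facing `step` interface, which is the modularity NS-Gym formalizes.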


Shrinking POMCP: A Framework for Real-Time UAV Search and Rescue

Zhang, Yunuo, Luo, Baiting, Mukhopadhyay, Ayan, Stojcsics, Daniel, Elenius, Daniel, Roy, Anirban, Jha, Susmit, Maroti, Miklos, Koutsoukos, Xenofon, Karsai, Gabor, Dubey, Abhishek

arXiv.org Artificial Intelligence

Efficient path optimization for drones in search and rescue operations faces challenges, including limited visibility, time constraints, and complex information gathering in urban environments. We present a comprehensive approach to optimize UAV-based search and rescue operations in neighborhood areas, utilizing both a 3D AirSim-ROS2 simulator and a 2D simulator. The path planning problem is formulated as a partially observable Markov decision process (POMDP), and we propose a novel "Shrinking POMCP" approach to address time constraints. In the AirSim environment, we integrate our approach with a probabilistic world model for belief maintenance and a neurosymbolic navigator for obstacle avoidance. The 2D simulator employs surrogate ROS2 nodes with equivalent functionality. We compare trajectories generated by different approaches in the 2D simulator and evaluate performance across various belief types in the 3D AirSim-ROS simulator. Experimental results from both simulators demonstrate that our proposed Shrinking POMCP solution achieves significant improvements in search times compared to alternative methods, showcasing its potential for enhancing the efficiency of UAV-assisted search and rescue operations. Search and rescue (SAR) operations are critical, time-sensitive missions conducted in challenging environments like neighborhoods, wilderness [1], or maritime settings [2]. These resource-intensive operations require efficient path planning and optimal routing [3]. In recent years, Unmanned Aerial Vehicles (UAVs) have become valuable SAR assets, offering advantages such as rapid deployment, extended flight times, and access to hard-to-reach areas. Equipped with sensors and cameras, UAVs can detect heat signatures, identify objects, and provide real-time aerial imagery to search teams [4]. However, the use of UAVs in SAR operations presents unique challenges, particularly in path planning and decision-making under uncertainty. Factors such as limited battery life, changing weather conditions, and incomplete information about the search area complicate the task of efficiently coordinating UAV movements to maximize the probability of locating targets [3].
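One plausible reading of the "shrinking" idea, contracting the planner's per-decision simulation budget as mission time runs down so planning never overruns the real-time deadline, can be sketched as a budget schedule. The linear schedule and all constants below are assumptions for illustration; the paper's actual mechanism may differ.

```python
# Hedged sketch: shrink the POMCP rollout budget with remaining mission
# time so each planning step fits the real-time deadline. The linear
# schedule and constants are illustrative assumptions.
def simulation_budget(time_left, total_time, max_sims=2000, min_sims=50):
    """Per-decision simulation count, shrinking linearly with time left."""
    frac = max(0.0, min(1.0, time_left / total_time))
    return max(min_sims, int(max_sims * frac))

schedule = [simulation_budget(t, total_time=100) for t in (100, 75, 50, 10, 0)]
print(schedule)
```

The floor (`min_sims`) keeps late-mission decisions from degenerating into unplanned actions while still honoring the deadline.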


DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback

Xiong, Guojun, Dinesha, Ujwal, Mukherjee, Debajoy, Li, Jian, Shakkottai, Srinivas

arXiv.org Machine Learning

Restless multi-armed bandits (RMABs) have been widely used to model constrained sequential decision-making problems, where the state of each restless arm evolves according to a Markov chain and each state transition generates a scalar reward. However, the success of RMABs crucially relies on the availability and quality of reward signals. Unfortunately, specifying an exact reward function in practice can be challenging and even infeasible. In this paper, we introduce Pref-RMAB, a new RMAB model in the presence of preference signals, where the decision maker only observes pairwise preference feedback rather than scalar rewards from the activated arms at each decision epoch. Preference feedback, however, arguably contains less information than the scalar reward, which makes Pref-RMAB seemingly more difficult. To address this challenge, we present a direct online preference learning (DOPL) algorithm for Pref-RMAB to efficiently explore the unknown environments, adaptively collect preference data in an online manner, and directly leverage the preference feedback for decision-making. We prove that DOPL yields sublinear regret. To the best of our knowledge, this is the first algorithm to ensure $\tilde{\mathcal{O}}(\sqrt{T\ln T})$ regret for RMABs with preference feedback. Experimental results further demonstrate the effectiveness of DOPL.
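The raw material such an algorithm works from, empirical pairwise-preference estimates built from duel-style feedback, can be sketched directly. The arm count, ground-truth preference matrix, and sampling scheme below are illustrative, not DOPL's actual statistics or exploration rule.

```python
# Hedged sketch: accumulate empirical pairwise-preference estimates from
# online feedback, a stand-in for the preference statistics an algorithm
# like DOPL would maintain. Ground-truth matrix is illustrative.
import random

rng = random.Random(0)
n_arms = 3
wins = [[0] * n_arms for _ in range(n_arms)]     # wins[i][j]: i preferred to j
counts = [[0] * n_arms for _ in range(n_arms)]

true_pref = [[0.5, 0.7, 0.9],                    # assumed P(i preferred to j)
             [0.3, 0.5, 0.6],
             [0.1, 0.4, 0.5]]

for _ in range(5000):
    i, j = rng.sample(range(n_arms), 2)          # activate two arms, observe duel
    if rng.random() < true_pref[i][j]:
        wins[i][j] += 1
    else:
        wins[j][i] += 1
    counts[i][j] += 1
    counts[j][i] += 1

p_hat = [[wins[i][j] / counts[i][j] if counts[i][j] else 0.5
          for j in range(n_arms)] for i in range(n_arms)]
print([[round(p, 2) for p in row] for row in p_hat])
```

By construction the estimates are consistent, `p_hat[i][j] + p_hat[j][i] == 1`, and they concentrate around the true preference probabilities as duel counts grow, which is what makes sublinear regret attainable from preference-only feedback.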


Budgeted Optimization with Concurrent Stochastic-Duration Experiments

Neural Information Processing Systems

Budgeted optimization involves optimizing an unknown function that is costly to evaluate by requesting a limited number of function evaluations at intelligently selected inputs. Typical problem formulations assume that experiments are selected one at a time with a limited total number of experiments, which fail to capture important aspects of many real-world problems. This paper defines a novel problem formulation with the following important extensions: 1) allowing for concurrent experiments; 2) allowing for stochastic experiment durations; and 3) placing constraints on both the total number of experiments and the total experimental time. We develop both offline and online algorithms for selecting concurrent experiments in this new setting and provide experimental results on a number of optimization benchmarks. The results show that our algorithms produce highly effective schedules compared to natural baselines.
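The scheduling setting this abstract defines, concurrent slots, stochastic durations, and caps on both experiment count and total time, is easy to simulate with an event queue. The duration distribution, slot count, and budgets below are illustrative assumptions, not the paper's benchmarks.

```python
# Hedged sketch: simulate a schedule with k concurrent experiment slots,
# stochastic durations, and limits on both the number of experiments and
# total wall-clock time. All parameters are illustrative.
import heapq
import random

rng = random.Random(42)

def run_schedule(n_slots=3, max_experiments=10, time_budget=20.0):
    running = []                  # min-heap of (finish_time, experiment_id)
    started = completed = 0
    now = 0.0
    while started < max_experiments or running:
        # launch new experiments while slots and the count budget remain
        while started < max_experiments and len(running) < n_slots:
            duration = rng.uniform(1.0, 6.0)     # stochastic duration
            heapq.heappush(running, (now + duration, started))
            started += 1
        now, _ = heapq.heappop(running)          # advance to next completion
        if now > time_budget:
            return completed, started            # time budget exhausted
        completed += 1
    return completed, started

print(run_schedule())
```

An online selection policy would plug in where the sketch launches experiments, choosing *which* input to evaluate; the event-queue skeleton is what enforces the concurrency and budget constraints.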


Enhancing Courier Scheduling in Crowdsourced Last-Mile Delivery through Dynamic Shift Extensions: A Deep Reinforcement Learning Approach

Saleh, Zead, Hanbali, Ahmad Al, Baubaid, Ahmad

arXiv.org Artificial Intelligence

Crowdsourced delivery platforms face complex scheduling challenges to match couriers and customer orders. We consider two types of crowdsourced couriers, namely, committed and occasional couriers, each with a different compensation scheme. Crowdsourced delivery platforms usually schedule committed courier shifts based on predicted demand. Therefore, platforms may devise an offline schedule for committed couriers before the planning period. However, due to the unpredictability of demand, there are instances where it becomes necessary to make online adjustments to the offline schedule. In this study, we focus on the problem of dynamically adjusting the offline schedule through shift extensions for committed couriers. This problem is modeled as a sequential decision process. The objective is to maximize platform profit by determining the shift extensions of couriers and the assignments of requests to couriers. To solve the model, a Deep Q-Network (DQN) learning approach is developed. Comparing this model with the baseline policy where no extensions are allowed demonstrates the benefits that platforms can gain from allowing shift extensions, in terms of higher reward, reduced lost-order costs, and fewer lost requests. Additionally, sensitivity analysis showed that the total extension compensation increases nonlinearly with the arrival rate of requests, and linearly with the arrival rate of occasional couriers. Regarding compensation sensitivity, the results showed that the normal scenario exhibited the highest average number of shift extensions and, consequently, the fewest average number of lost requests. These findings serve as evidence of the successful learning of such dynamics by the DQN algorithm.
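The one-step economics behind a shift-extension decision can be sketched as a value comparison: extend a committed courier only if the expected marginal gain from requests that would otherwise be lost exceeds the extension compensation. The function, rates, and prices below are illustrative assumptions; the DQN in the paper learns this trade-off from experience rather than computing it in closed form.

```python
# Hedged sketch: profit delta of extending a committed courier's shift
# versus letting it end. All rates and prices are illustrative.
def extend_shift_value(expected_extra_orders, revenue_per_order,
                       extension_pay, lost_order_penalty):
    """Extending both earns revenue on the extra orders and avoids
    their lost-order penalty; it costs the extension compensation."""
    served_gain = expected_extra_orders * (revenue_per_order + lost_order_penalty)
    return served_gain - extension_pay

decision = extend_shift_value(expected_extra_orders=4, revenue_per_order=5.0,
                              extension_pay=18.0, lost_order_penalty=1.5) > 0
print(decision)
```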