AITopics | decision step

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

024677efb8e4aee2eaeef17b54695bbe-Supplemental.pdf

Neural Information Processing Systems | Apr-24-2026, 10:31:06 GMT

artificial intelligence, machine learning, mujoco environment, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

Neural Information Processing Systems | Feb-7-2026, 06:52:57 GMT

In reinforcement learning, continuous time is often discretized by a time scale δ, to which the resulting performance is known to be highly sensitive. In this work, we seek to find a δ-invariant algorithm for policy gradient (PG) methods, which performs well regardless of the value of δ. We first identify the underlying reasons that cause PG methods to fail as δ → 0, proving that the variance of the PG estimator can diverge to infinity in stochastic environments under a certain assumption of stochasticity. While durative actions or action repetition can be employed to have δ-invariance, previous action repetition methods cannot immediately react to unexpected situations in stochastic environments. We thus propose a novel δ-invariant method named Safe Action Repetition (SAR) applicable to any existing PG algorithm. SAR can handle the stochasticity of environments by adaptively reacting to changes in states during action repetition.
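
As an illustration of state-dependent action repetition, the following is a minimal sketch and not the authors' implementation. It assumes that an action is repeated until the state has moved farther than a distance threshold from the state in which the action was chosen; the abstract does not spell out the stopping rule, so that rule, the toy dynamics in `make_step`, and all names here are hypothetical.

```python
import math


def make_step(delta):
    """Build a toy 1-step simulator with time scale `delta` (hypothetical dynamics)."""
    def step(state, action):
        # Euler step: the state moves by action * delta; reward is the time cost.
        next_state = tuple(s + a * delta for s, a in zip(state, action))
        return next_state, -delta
    return step


def repeat_action(step, state, action, threshold, max_repeats=10_000):
    """Repeat `action` until the state leaves a ball of radius `threshold`.

    The ball is centred on the state where the action was chosen, so an
    unexpected jump in the state ends the repetition immediately.
    """
    start = state
    total_reward = 0.0
    repeats = 0
    while repeats < max_repeats:
        state, reward = step(state, action)
        total_reward += reward
        repeats += 1
        if math.dist(start, state) > threshold:
            break
    return state, total_reward, repeats


# Same action and threshold, two different time scales.
coarse = repeat_action(make_step(0.25), (0.0,), (1.0,), threshold=1.0)
fine = repeat_action(make_step(0.125), (0.0,), (1.0,), threshold=1.0)
```

In this toy setting the number of repeats grows as δ shrinks (5 repeats at δ = 0.25, 9 at δ = 0.125), while the elapsed time per decision stays close (1.25 versus 1.125), which is the kind of behavior a δ-invariant scheme aims for.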

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Europe > France (0.04)
Asia > Vietnam > Long An Province (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Semi Centralized Training Decentralized Execution Architecture for Multi Agent Deep Reinforcement Learning in Traffic Signal Control

Yazdani, Pouria, Rezaali, Arash, Abdoos, Monireh

arXiv.org Artificial Intelligence | Dec-5-2025

Traffic congestion is a major and complex challenge for cities worldwide with the rapid growth of urbanization and vehicle ownership. Longer commute times, excessive fuel consumption, and elevated air pollution levels are direct consequences of over-saturated roads. For instance, according to the 2024 INRIX Global Traffic Scorecard, individual commuters in Istanbul, New York City, and Chicago experienced total annual delay of about 105, 102, and 102 hours, respectively, underscoring the magnitude of intersection-driven delays in major metros (INRIX). Within urban networks, signalized intersections are the dominant bottlenecks: the policies implemented at these intersections allocate scarce space-time among competing traffic streams and therefore largely determine corridor-level delay, queues, and emissions. Reinforcement learning (RL) has become a standard practice for adaptive traffic signal control (ATSC), controlling phase selection and timing as a sequential decision problem that optimizes long-horizon objectives such as delay, throughput, and emissions under nonstationary demand (Yau et al., 2017). Deep RL (DRL) extends this by using function approximation to digest rich state representations--from detector queues to trajectories and graph-structured networks--enabling policies that generalize across varying traffic flows and topologies (Zhao et al., 2024). Collectively, this body of work motivates moving beyond single-intersection controllers toward coordinated, network-level solutions and setting the stage for multi-agent formulations.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2512.04653

Country:

North America > United States > New York (0.24)
North America > United States > Illinois > Cook County > Chicago (0.24)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.24)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.24)

Genre: Research Report (0.81)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.34)

Improving planning and MBRL with temporally-extended actions

Chatterjee, Palash, Khardon, Roni

arXiv.org Artificial Intelligence | Oct-23-2025

Continuous time systems are often modeled using discrete time dynamics, but this requires a small simulation step to maintain accuracy. In turn, this requires a large planning horizon, which leads to computationally demanding planning problems and reduced performance. Previous work in model-free reinforcement learning has partially addressed this issue using action repeats, where a policy is learned to determine a discrete action duration. Instead, we propose to control the continuous decision timescale directly by using temporally-extended actions and letting the planner treat the duration of the action as an additional optimization variable along with the standard action variables. This additional structure has multiple advantages. It speeds up simulation time of trajectories and, importantly, it allows for deep horizon search in terms of primitive actions while using a shallow search depth in the planner. In addition, in the model-based reinforcement learning (MBRL) setting, it reduces compounding errors from model learning and improves training time for models. We show that this idea is effective and that the range for action durations can be automatically selected using a multi-armed bandit formulation and integrated into the MBRL framework. An extensive experimental evaluation, both in planning and in MBRL, shows that our approach yields faster planning and better solutions, and that it enables solutions to problems that are not solved in the standard formulation.
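
The idea of treating action duration as an extra planning variable can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's planner: exhaustive search stands in for whatever optimizer the authors use, the 1-D dynamics `x + action * duration` are hypothetical, and the bandit-based selection of the duration range is not shown.

```python
import itertools


def rollout(x, plan):
    """Apply a plan of (action, duration) pairs to a 1-D position."""
    for action, duration in plan:
        # Hypothetical dynamics: the action is a velocity held for `duration`.
        x = x + action * duration
    return x


def plan_exhaustive(x0, target, actions, durations, depth):
    """Search all plans of `depth` decisions over (action, duration) pairs."""
    choices = list(itertools.product(actions, durations))
    best_plan, best_cost = None, float("inf")
    for plan in itertools.product(choices, repeat=depth):
        cost = abs(rollout(x0, plan) - target)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan, best_cost


actions = (-1.0, 0.0, 1.0)
# Fixed small step: two decisions cannot reach the target.
_, cost_fixed = plan_exhaustive(0.0, 3.0, actions, (0.5,), depth=2)
# Duration as a decision variable: the same search depth reaches it.
_, cost_extended = plan_exhaustive(0.0, 3.0, actions, (0.5, 1.0, 2.0), depth=2)
```

With a search depth of two, the fixed-step planner ends 2.0 away from the target, while the planner that also chooses durations reaches it exactly. This mirrors the abstract's point about deep horizons in primitive steps with a shallow search depth.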

data mining, machine learning, reinforcement learning, (21 more...)

arXiv.org Artificial Intelligence

2505.15754

Genre: Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting > Online (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.66)

ClearFairy: Capturing Creative Workflows through Decision Structuring, In-Situ Questioning, and Rationale Inference

Son, Kihoon, Choi, DaEun, Kim, Tae Soo, Kim, Young-Ho, Yun, Sangdoo, Kim, Juho

arXiv.org Artificial Intelligence | Sep-19-2025

Capturing professionals' decision-making in creative workflows is essential for reflection, collaboration, and knowledge sharing, yet existing methods often leave rationales incomplete and implicit decisions hidden. To address this, we present the CLEAR framework, which structures reasoning into cognitive decision steps: linked units of actions, artifacts, and self-explanations that make decisions traceable. Building on this framework, we introduce ClearFairy, a think-aloud AI assistant for UI design that detects weak explanations, asks lightweight clarifying questions, and infers missing rationales to ease the knowledge-sharing burden. In a study with twelve creative professionals, 85% of ClearFairy's inferred rationales were accepted, increasing strong explanations from 14% to over 83% of decision steps without adding cognitive demand. The captured steps also enhanced generative AI agents in Figma, yielding next-action predictions better aligned with professionals and producing more coherent design outcomes. For future research on human knowledge-grounded creative AI agents, we release a dataset of 417 captured decision steps.

knowledge management, large language model, machine learning, (24 more...)

arXiv.org Artificial Intelligence

2509.14537

Country:

Europe (1.00)
North America > Canada (0.67)
North America > United States > California (0.67)
North America > United States > New York > New York County > New York City (0.15)

Genre:

Workflow (1.00)
Research Report > Experimental Study (0.68)
Research Report > New Finding (0.68)
Personal > Interview (0.45)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.45)

Technology:

Information Technology > Knowledge Management (1.00)
Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications (1.00)
(6 more...)

A Rollout-Based Algorithm and Reward Function for Resource Allocation in Business Processes

Middelhuis, Jeroen, Bukhsh, Zaharah, Adan, Ivo, Dijkman, Remco

arXiv.org Artificial Intelligence | Sep-3-2025

Resource allocation plays a critical role in minimizing cycle time and improving the efficiency of business processes. Recently, Deep Reinforcement Learning (DRL) has emerged as a powerful technique to optimize resource allocation policies in business processes. In the DRL framework, an agent learns a policy through interaction with the environment, guided solely by reward signals that indicate the quality of its decisions. However, existing algorithms are not suitable for dynamic environments such as business processes. Furthermore, existing DRL-based methods rely on engineered reward functions that approximate the desired objective, but a misalignment between reward and objective can lead to undesired decisions or suboptimal policies. To address these issues, we propose a rollout-based DRL algorithm and a reward function to optimize the objective directly. Our algorithm iteratively improves the policy by evaluating execution trajectories following different actions. Our reward function directly decomposes the objective function of minimizing the cycle time, such that trial-and-error reward engineering becomes unnecessary. We evaluated our method in six scenarios, for which the optimal policy can be computed, and on a set of increasingly complex, realistically sized process models. The results show that our algorithm can learn the optimal policy for the scenarios and outperform or match the best heuristics on the realistically sized business processes.
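
Evaluating execution trajectories that follow different actions can be sketched as a one-step rollout improvement. The code below is a generic illustration under stated assumptions, not the authors' algorithm: the `simulate` interface, the toy environment, and the cost of -1 per step (a stand-in for elapsed cycle time) are all hypothetical.

```python
import random


def estimate_return(state, first_action, simulate, base_policy, horizon, rng):
    """Return of one trajectory: take `first_action`, then follow `base_policy`."""
    total, action = 0.0, first_action
    for _ in range(horizon):
        state, reward, done = simulate(state, action, rng)
        total += reward
        if done:
            break
        action = base_policy(state)
    return total


def rollout_improve(state, actions, simulate, base_policy,
                    n_rollouts=8, horizon=20, seed=0):
    """Pick the action whose rollouts have the highest mean return."""
    rng = random.Random(seed)

    def mean_return(action):
        returns = [
            estimate_return(state, action, simulate, base_policy, horizon, rng)
            for _ in range(n_rollouts)
        ]
        return sum(returns) / n_rollouts

    return max(actions, key=mean_return)


def toy_simulate(state, action, rng):
    """Deterministic toy chain: each step costs 1; the case completes at state 3.

    `rng` is unused here; a stochastic simulator would draw from it.
    """
    next_state = state + action
    return next_state, -1.0, next_state >= 3


best = rollout_improve(0, (-1, 1), toy_simulate, base_policy=lambda s: 1)
```

In the toy chain, starting with +1 completes in three steps (return -3) and starting with -1 takes five (return -5), so the improved choice is +1. Because the reward is the elapsed time itself, no separate reward shaping is involved in this sketch.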

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2504.1125

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

HALO: Hindsight-Augmented Learning for Online Auto-Bidding

Dong, Pusen, Cao, Chenglong, Zhou, Xinyu, You, Jirong, Xu, Linhe, Xu, Feifan, Yuan, Shuo

arXiv.org Artificial Intelligence | Aug-11-2025

Digital advertising platforms operate millisecond-level auctions through Real-Time Bidding (RTB) systems, where advertisers compete for ad impressions through algorithmic bids. This dynamic mechanism enables precise audience targeting but introduces profound operational complexity due to advertiser heterogeneity: budgets and ROI targets span orders of magnitude across advertisers, from individual merchants to multinational brands. This diversity creates a demanding adaptation landscape for Multi-Constraint Bidding (MCB). Traditional auto-bidding solutions fail in this environment due to two critical flaws: 1) severe sample inefficiency, where failed explorations under specific constraints yield no transferable knowledge for new budget-ROI combinations, and 2) limited generalization under constraint shifts, as they ignore physical relationships between constraints and bidding coefficients. To address this, we propose HALO: Hindsight-Augmented Learning for Online Auto-Bidding. HALO introduces a theoretically grounded hindsight mechanism that repurposes all explorations into training data for arbitrary constraint configurations via trajectory reorientation. Further, it employs a B-spline functional representation, enabling continuous, derivative-aware bid mapping across constraint spaces. HALO ensures robust adaptation even when budget/ROI requirements differ drastically from training scenarios. Industrial dataset evaluations demonstrate the superiority of HALO in handling multi-scale constraints, reducing constraint violations while improving GMV.

constraint, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2508.03267

Genre: Research Report (0.82)

Industry: Marketing (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

A Generative Model Enhanced Multi-Agent Reinforcement Learning Method for Electric Vehicle Charging Navigation

Qi, Tianyang, Chen, Shibo, Zhang, Jun

arXiv.org Artificial Intelligence | Feb-27-2025

With the widespread adoption of electric vehicles (EVs), navigating for EV drivers to select a cost-effective charging station has become an important yet challenging issue due to dynamic traffic conditions, fluctuating electricity prices, and potential competition from other EVs. The state-of-the-art deep reinforcement learning (DRL) algorithms for solving this task still require global information about all EVs at the execution stage, which not only increases communication costs but also raises privacy issues among EV drivers. To overcome these drawbacks, we introduce a novel generative model-enhanced multi-agent DRL algorithm that utilizes only the EV's local information while achieving performance comparable to these state-of-the-art algorithms. Specifically, the policy network is implemented on the EV side, and a Conditional Variational Autoencoder-Long Short Term Memory (CVAE-LSTM)-based recommendation model is developed to provide recommendation information. Furthermore, a novel future charging competition encoder is designed to effectively compress global information, enhancing training performance. The multi-gradient descent algorithm (MGDA) is also utilized to adaptively balance the weight between the two parts of the training objective, resulting in a more stable training process. Simulations are conducted based on a practical area in Xi'an, China. Experimental results show that our proposed algorithm, which relies on local information, outperforms existing local information-based methods and achieves less than 8% performance loss compared to global information-based methods.
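
For two objectives, the min-norm weighting that MGDA solves has a well-known closed form, sketched below. How the paper applies MGDA inside its training loop is not described in the abstract, so this is only the generic two-gradient case; gradients are represented as flat Python lists, and the function names are hypothetical.

```python
def mgda_two_task_weight(g1, g2):
    """Weight alpha in [0, 1] minimizing ||alpha * g1 + (1 - alpha) * g2||^2."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(g1, g2))
    if diff_sq == 0.0:
        # Identical gradients: every weight gives the same result.
        return 0.5
    # Unconstrained minimizer, then clipped to the [0, 1] interval.
    alpha = sum((b - a) * b for a, b in zip(g1, g2)) / diff_sq
    return min(1.0, max(0.0, alpha))


def combine(g1, g2):
    """Min-norm convex combination of two gradients."""
    alpha = mgda_two_task_weight(g1, g2)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]


# Orthogonal gradients of equal norm are weighted equally.
balanced = combine([1.0, 0.0], [0.0, 1.0])
# When one gradient dominates along the same direction, the shorter one wins.
clipped = combine([1.0, 0.0], [2.0, 0.0])
```

Here `balanced` is `[0.5, 0.5]` and `clipped` is `[1.0, 0.0]`. The weight adapts to the current gradients at every step, which is the adaptive balancing between two loss terms that the abstract mentions.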

algorithm, global state, information, (12 more...)

arXiv.org Artificial Intelligence

2502.20068

Country:

Asia > China > Shaanxi Province > Xi'an (0.24)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

UEVAVD: A Dataset for Developing UAV's Eye View Active Object Detection

Jiang, Xinhua, Liu, Tianpeng, Liu, Li, Liu, Zhen, Liu, Yongxiang

arXiv.org Artificial Intelligence | Nov-6-2024

Occlusion is a longstanding difficulty that challenges UAV-based object detection. Many works address this problem by adapting the detection model. However, few of them exploit the fact that the UAV could fundamentally improve detection performance by changing its viewpoint. Active Object Detection (AOD) offers an effective way to achieve this purpose. Through Deep Reinforcement Learning (DRL), AOD endows the UAV with the ability of autonomous path planning to search for observations that are more conducive to target identification. Unfortunately, no dataset is available for developing UAV AOD methods. To fill this gap, we released a UAV's eye view active vision dataset named UEVAVD and hope it can facilitate research on the UAV AOD problem. Additionally, we improve the existing DRL-based AOD method by incorporating inductive bias when learning the state representation. First, due to the partial observability, we use a gated recurrent unit to extract state representations from the observation sequence instead of the single-view observation. Second, we pre-decompose the scene with the Segment Anything Model (SAM) and filter out the irrelevant information with the derived masks. With these practices, the agent could learn an active viewing policy with better generalization capability. The effectiveness of our innovations is validated by experiments on the UEVAVD dataset. Our dataset will soon be available at https://github.com/Leo000ooo/UEVAVD_dataset.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2411.04348

Country: Asia > China > Hunan Province (0.04)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (0.48)
Transportation > Passenger (0.30)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
