AITopics | active action

Individual Regret in Cooperative Stochastic Multi-Armed Bandits

Neural Information Processing Systems, Jun-18-2026, 12:33:41 GMT

We study the regret in stochastic Multi-Armed Bandits (MAB) with multiple agents that communicate over an arbitrary connected communication graph. We analyze a variant of the Cooperative Successive Elimination algorithm, Coop-SE, and show an individual regret bound of $O(R/m + A^2 + A\log T)$ and a nearly matching lower bound. Here $A$ is the number of actions, $T$ the time horizon, $m$ the number of agents, and $R = \sum_{\Delta_i > 0} \log(T)/\Delta_i$ is the optimal single agent regret, where $\Delta_i$ is the sub-optimality gap of action $i$. Our work is the first to show an individual regret bound in cooperative stochastic MAB that is independent of the graph's diameter. When considering communication networks there are additional considerations beyond regret, such as message size and number of communication rounds. First, we show that our regret bound holds even if we restrict the messages to be of logarithmic size. Second, for a logarithmic number of communication rounds, we obtain a regret bound of $O(R/m + A\log T)$.
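For readers unfamiliar with the building block named in the abstract, the following is a minimal single-agent sketch of plain successive elimination. It is not Coop-SE: there is no communication graph and no message passing. The Bernoulli reward model, the confidence radius $\sqrt{2\log(T)/n}$, and the function name `successive_elimination` are assumptions chosen for illustration.

```python
import math
import random


def successive_elimination(means, horizon, seed=0):
    """Single-agent successive elimination on simulated Bernoulli arms.

    `means` are hypothetical true success probabilities, used only to
    simulate rewards and to measure pseudo-regret.
    Returns (list of surviving arm indices, accumulated pseudo-regret).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    active = list(range(n_arms))
    pulls = [0] * n_arms
    totals = [0.0] * n_arms
    best = max(means)
    regret = 0.0
    t = 0

    while t < horizon:
        # Pull every surviving arm once per round.
        for arm in active:
            if t >= horizon:
                break
            reward = 1.0 if rng.random() < means[arm] else 0.0
            pulls[arm] += 1
            totals[arm] += reward
            regret += best - means[arm]
            t += 1
        if t >= horizon:
            break

        # Assumed confidence radius; every active arm has pulls >= 1 here.
        def radius(a):
            return math.sqrt(2.0 * math.log(horizon) / pulls[a])

        # Drop arms whose upper bound falls below the best lower bound.
        best_lower = max(totals[a] / pulls[a] - radius(a) for a in active)
        active = [
            a for a in active
            if totals[a] / pulls[a] + radius(a) >= best_lower
        ]

    return active, regret
```

Calling `successive_elimination([0.9, 0.5, 0.1], horizon=5000)` simulates three hypothetical arms; arms with larger gaps are eliminated after fewer pulls, which is the intuition behind the $\log(T)/\Delta_i$ terms in $R$.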

artificial intelligence, data mining, machine learning, (21 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.45)
Education (0.45)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)

Restless Multi-armed Bandits under Frequency and Window Constraints for Public Service Inspections

Mao, Yi, Perrault, Andrew

arXiv.org Artificial Intelligence, Jan-27-2025

Municipal inspections are an important part of maintaining the quality of goods and services. In this paper, we approach the problem of intelligently scheduling service inspections to maximize their impact, using food establishment inspections in Chicago as a case study. The Chicago Department of Public Health (CDPH) inspects thousands of establishments each year, with a substantial fail rate (over 3,000 failed inspection reports in 2023). To balance the objectives of ensuring adherence to guidelines, minimizing disruption to establishments, and minimizing inspection costs, CDPH assigns each establishment an inspection window every year and guarantees that they will be inspected exactly once during that window. These constraints create a challenge for a restless multi-armed bandit (RMAB) approach, for which there are no existing methods. We develop an extension to Whittle index-based systems for RMABs that can guarantee action window constraints and frequencies, and furthermore can be leveraged to optimize action window assignments themselves. Briefly, we combine MDP reformulation and integer programming-based lookahead to maximize the impact of inspections subject to constraints. A neural network-based supervised learning model is developed to model state transitions of real Chicago establishments using public CDPH inspection records, which demonstrates 10% AUC improvements compared with directly predicting establishments' failures. Our experiments not only show up to 24% (in simulation) or 33% (on real data) reward improvements resulting from our approach but also give insight into the impact of scheduling constraints.

constraint, data mining, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2502.00045

Country:

North America > United States > Illinois > Cook County > Chicago (0.66)
North America > United States > Ohio (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.50)
Health & Medicine > Public Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.88)
Information Technology > Data Science > Data Mining > Big Data (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.66)

Individual Regret in Cooperative Stochastic Multi-Armed Bandits

Barnea, Idan, Lancewicki, Tal, Mansour, Yishay

arXiv.org Machine Learning, Nov-10-2024

We study the regret in stochastic Multi-Armed Bandits (MAB) with multiple agents that communicate over an arbitrary connected communication graph. We show a near-optimal individual regret bound of $\tilde{O}(\sqrt{AT/m}+A)$, where $A$ is the number of actions, $T$ the time horizon, and $m$ the number of agents. In particular, assuming a sufficient number of agents, we achieve a regret bound of $\tilde{O}(A)$, which is independent of the sub-optimality gaps and the diameter of the communication graph. To the best of our knowledge, our study is the first to show an individual regret bound in cooperative stochastic MAB that is independent of the graph's diameter and applicable to non-fully-connected communication graphs.
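As a rough illustration of the "sufficient number of agents" claim, the calculation below shows when the first term of the bound stops dominating the second. It is a back-of-the-envelope check that is not taken from the paper: it ignores the logarithmic factors hidden by $\tilde{O}$ and treats all constants as 1.

```latex
\sqrt{\frac{AT}{m}} \le A
\quad\Longleftrightarrow\quad
\frac{AT}{m} \le A^{2}
\quad\Longleftrightarrow\quad
m \ge \frac{T}{A}
```

Under these simplifications, roughly $T/A$ agents or more would make the bound scale as $\tilde{O}(A)$.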

active action, agent, individual regret, (14 more...)

arXiv.org Machine Learning

2411.06501

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States > New York (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(9 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.86)

Optimizing Vital Sign Monitoring in Resource-Constrained Maternal Care: An RL-Based Restless Bandit Approach

Boehmer, Niclas, Zhao, Yunfan, Xiong, Guojun, Rodriguez-Diaz, Paula, Cibrian, Paola Del Cueto, Ngonzi, Joseph, Boatin, Adeline, Tambe, Milind

arXiv.org Artificial Intelligence, Oct-10-2024

Maternal mortality remains a significant global public health challenge. One promising approach to reducing maternal deaths occurring during facility-based childbirth is through early warning systems, which require the consistent monitoring of mothers' vital signs after giving birth. Wireless vital sign monitoring devices offer a labor-efficient solution for continuous monitoring, but their scarcity raises the critical question of how to allocate them most effectively. We devise an allocation algorithm for this problem by modeling it as a variant of the popular Restless Multi-Armed Bandit (RMAB) paradigm. In doing so, we identify and address novel, previously unstudied constraints unique to this domain, which render previous approaches for RMABs unsuitable and significantly increase the complexity of the learning and planning problem. To overcome these challenges, we adopt the popular Proximal Policy Optimization (PPO) algorithm from reinforcement learning to learn an allocation policy by training a policy and value function network. We demonstrate in simulations that our approach outperforms the best heuristic baseline by up to a factor of $4$.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2410.08377

Country:

Africa > Uganda > Western Region > Mbarara District (0.06)
North America > United States > Massachusetts (0.04)
Asia > Middle East > Israel (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)
Health & Medicine > Public Health > Maternal Health (1.00)
Health & Medicine > Diagnostic Medicine > Vital Signs (0.98)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)

Indexability of Finite State Restless Multi-Armed Bandit and Rollout Policy

Mittal, Vishesh, Meshram, Rahul, Dev, Deepak, Prakash, Surya

arXiv.org Artificial Intelligence, Apr-30-2023

We consider the finite-state restless multi-armed bandit problem. The decision maker can act on M out of N bandits in each time step. Playing an arm (the active action) yields a state-dependent reward, and an arm that is not played also yields a reward that depends on its state and action. The objective of the decision maker is to maximize the infinite-horizon discounted reward. The classical approach to restless bandits is the Whittle index policy, in which the M arms with the highest indices are played at each time step. Here, one first analyzes a relaxed constrained version of the restless bandit problem and then, via Lagrangian relaxation, decouples it into N single-armed restless bandit problems. We analyze the single-armed restless bandit. In order to study the Whittle index policy, we show structural results on the single-armed bandit model. We define indexability and show indexability in special cases. We propose an alternative approach to verify the indexability criterion for a single-armed bandit model using the value iteration algorithm. We demonstrate the performance of our algorithm with different examples. We provide insight into the conditions for indexability of restless bandits using different structural assumptions on the transition probability and reward matrices. We also study an online rollout policy, discuss the computational complexity of the algorithm, and compare it with the complexity of index computation. Numerical examples illustrate that the index policy and the rollout policy perform better than the myopic policy.
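To make the idea of checking indexability with value iteration concrete, here is a small sketch. It is not the authors' procedure; it uses the standard subsidy formulation, in which the passive action receives an extra reward and an arm is indexable if the set of states where passive is optimal only grows as that subsidy increases. The function names, the discount factor `beta=0.9`, and the finite subsidy grid are assumptions, so the check is numerical evidence on that grid rather than a proof.

```python
import numpy as np


def passive_set(P0, P1, r0, r1, subsidy, beta=0.9, tol=1e-9, max_iter=10_000):
    """States where the passive action is optimal for a given subsidy.

    P0, P1: transition matrices under the passive / active action.
    r0, r1: per-state rewards under the passive / active action.
    Solves the single-arm discounted problem by value iteration.
    """
    V = np.zeros(len(r0))
    for _ in range(max_iter):
        q_passive = r0 + subsidy + beta * (P0 @ V)
        q_active = r1 + beta * (P1 @ V)
        V_new = np.maximum(q_passive, q_active)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    q_passive = r0 + subsidy + beta * (P0 @ V)
    q_active = r1 + beta * (P1 @ V)
    return frozenset(np.flatnonzero(q_passive >= q_active).tolist())


def looks_indexable(P0, P1, r0, r1, subsidies, beta=0.9):
    """Check on a finite grid that passive sets are nested as subsidy grows."""
    previous = frozenset()
    for lam in sorted(subsidies):
        current = passive_set(P0, P1, r0, r1, lam, beta)
        if not previous <= current:  # a state left the passive set
            return False
        previous = current
    return True
```

In a toy arm whose state never changes, with active rewards 1 and 2 and passive reward 0, the passive set should pick up state 0 once the subsidy exceeds 1 and state 1 once it exceeds 2, so `looks_indexable` returns `True` on any grid of subsidies.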

bandit, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2305.0041

Country:

Asia > India (0.04)
Oceania > New Zealand (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Indexability is Not Enough for Whittle: Improved, Near-Optimal Algorithms for Restless Bandits

Ghosh, Abheek, Nagaraj, Dheeraj, Jain, Manish, Tambe, Milind

arXiv.org Artificial Intelligence, Feb-28-2023

We study the problem of planning restless multi-armed bandits (RMABs) with multiple actions. This is a popular model for multi-agent systems with applications like multi-channel communication, monitoring and machine maintenance tasks, and healthcare. Whittle index policies, which are based on Lagrangian relaxations, are widely used in these settings due to their simplicity and near-optimality under certain conditions. In this work, we first show that Whittle index policies can fail in simple and practically relevant RMAB settings, even when the RMABs are indexable. We discuss why the optimality guarantees fail and why asymptotic optimality may not translate well to practically relevant planning horizons. We then propose an alternate planning algorithm based on the mean-field method, which can provably and efficiently obtain near-optimal policies with a large number of arms, without the stringent structural assumptions required by the Whittle index policies. This borrows ideas from existing research with some improvements: our approach is hyper-parameter free, and we provide an improved non-asymptotic analysis which has: (a) no requirement for exogenous hyper-parameters and tighter polynomial dependence on known problem parameters; (b) high probability bounds which show that the reward of the policy is reliable; and (c) matching sub-optimality lower bounds for this algorithm with respect to the number of arms, thus demonstrating the tightness of our bounds. Our extensive experimental analysis shows that the mean-field approach matches or outperforms other baselines.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2211.00112

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > India (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Public Health (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Efficient Resource Allocation with Fairness Constraints in Restless Multi-Armed Bandits

Li, Dexun, Varakantham, Pradeep

arXiv.org Artificial Intelligence, Jul-27-2022

Restless Multi-Armed Bandits (RMAB) is an apt model to represent decision-making problems in public health interventions (e.g., tuberculosis, maternal, and child care), anti-poaching planning, sensor monitoring, personalized recommendations and many more. Existing research in RMAB has contributed mechanisms and theoretical results to a wide variety of settings, where the focus is on maximizing expected value. In this paper, we are interested in ensuring that RMAB decision making is also fair to different arms while maximizing expected value. In the context of public health settings, this would ensure that different people and/or communities are fairly represented while making public health intervention decisions. To achieve this goal, we formally define the fairness constraints in RMAB and provide planning and learning methods to solve RMAB in a fair manner. We demonstrate key theoretical properties of fair RMAB and experimentally demonstrate that our proposed methods handle fairness constraints without sacrificing significantly on solution quality.

algorithm, belief state, fairness constraint, (15 more...)

arXiv.org Artificial Intelligence

2206.03883

Country:

Asia > Singapore (0.04)
Oceania > New Zealand (0.04)
North America > United States (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.63)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Public Health (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
