AITopics | myopic policy

myopic policy

Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems

Young Hun Jung, Ambuj Tewari

Neural Information Processing Systems | Feb-11-2026, 19:38:28 GMT

Restless bandit problems are instances of non-stationary multi-armed bandits.

artificial intelligence, bandit, data mining, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence (0.71)
Information Technology > Data Science > Data Mining > Big Data (0.55)

Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems

Young Hun Jung, Ambuj Tewari

Neural Information Processing Systems | Oct-2-2025, 11:24:06 GMT

Neural Information Processing Systems http://nips.cc/

bandit, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)
Information Technology > Data Science > Data Mining > Big Data (0.65)

Bayesian Graph Traversal

Caballero, William N., Jenkins, Phillip R., Banks, David, Robbins, Matthew

arXiv.org Artificial Intelligence | Mar-7-2025

This research considers Bayesian decision-analytic approaches toward the traversal of an uncertain graph. Namely, a traveler progresses over a graph in which rewards are gained upon a node's first visit and costs are incurred for every edge traversal. The traveler knows the graph's adjacency matrix and his starting position but does not know the rewards and costs. The traveler is a Bayesian who encodes his beliefs about these values using a Gaussian process prior and who seeks to maximize his expected utility over these beliefs. Adopting a decision-analytic perspective, we develop sequential decision-making solution strategies for this coupled information-collection and network-routing problem. We show that the problem is NP-Hard and derive properties of the optimal walk. These properties provide heuristics for the traveler's problem that balance exploration and exploitation. We provide a practical case study focused on the use of unmanned aerial systems for public safety and empirically study policy performance in myriad Erdos-Renyi settings.

clairvoyant, node, traveler, (17 more...)

arXiv.org Artificial Intelligence

arXiv: 2503.05963

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > North Carolina > Durham County > Durham (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)
Transportation (0.89)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

To Analyze and Regulate Human-in-the-loop Learning for Congestion Games

Li, Hongbo, Duan, Lingjie

arXiv.org Artificial Intelligence | Jan-14-2025

In congestion games, selfish users behave myopically and crowd onto the shortest paths, and the social planner designs mechanisms to regulate such selfish routing through information or payment incentives. However, such mechanism design requires knowledge of time-varying traffic conditions, and it is the users themselves who learn and report past road experiences to the social planner (e.g., Waze or Google Maps). When congestion games meet mobile crowdsourcing, it is critical to incentivize selfish users to explore non-shortest paths in the best exploitation-exploration trade-off. First, we consider a simple but fundamental parallel routing network with one deterministic path and multiple stochastic paths for users with an average arrival probability $\lambda$. We prove that the current myopic routing policy (widely used in Waze and Google Maps) misses both exploration (under a strong hazard belief) and exploitation (under a weak hazard belief) as compared to the social optimum. Due to the myopic policy's under-exploration, we prove that the resulting price of anarchy (PoA) is larger than $\frac{1}{1-\rho^{\frac{1}{\lambda}}}$, which can be arbitrarily large as the discount factor $\rho\rightarrow1$. To mitigate such a huge efficiency loss, we propose a novel selective information disclosure (SID) mechanism: we reveal the latest traffic information to users only when they intend to over-explore stochastic paths upon arrival, and hide it when they would under-explore. We prove that our mechanism successfully reduces the PoA to less than $2$. Besides the parallel routing network, we further extend our mechanism and PoA results to any linear path graph with multiple intermediate nodes.
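To get a feel for how quickly the quoted lower bound $\frac{1}{1-\rho^{\frac{1}{\lambda}}}$ grows as the discount factor approaches 1, the expression can be evaluated numerically. The following is only a minimal sketch: the function name `poa_lower_bound` and the sample values of $\rho$ and $\lambda$ are illustrative assumptions, not taken from the paper.

```python
def poa_lower_bound(rho: float, lam: float) -> float:
    """Evaluate 1 / (1 - rho**(1/lam)), the lower-bound expression quoted in the abstract.

    rho is the discount factor and lam the average arrival probability;
    the valid ranges enforced below are assumptions made for this sketch.
    """
    if not (0.0 < rho < 1.0):
        raise ValueError("rho must lie strictly between 0 and 1")
    if not (0.0 < lam <= 1.0):
        raise ValueError("lam must be a probability in (0, 1]")
    return 1.0 / (1.0 - rho ** (1.0 / lam))


# Illustrative values only: the expression grows without bound as rho -> 1.
for rho in (0.5, 0.9, 0.99, 0.999):
    print(f"rho={rho}: bound={poa_lower_bound(rho, lam=0.5):.2f}")
```

For a fixed $\lambda$, the printed values increase with $\rho$, which matches the abstract's remark that the bound can be arbitrarily large.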

myopic policy, optimal policy, travel latency, (15 more...)

arXiv.org Artificial Intelligence

arXiv: 2501.03055

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > Singapore (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Information Technology > Services (0.94)
Transportation > Ground > Road (0.54)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.92)
Information Technology > Game Theory (0.91)

Human-in-the-loop Learning for Dynamic Congestion Games

Li, Hongbo, Duan, Lingjie

arXiv.org Artificial Intelligence | Apr-23-2024

Today mobile users learn and share their traffic observations via crowdsourcing platforms (e.g., Waze). Yet such platforms simply cater to selfish users' myopic interests by recommending the shortest path, and do not encourage enough users to travel and learn other paths for the benefit of future users. Prior studies focus on one-shot congestion games without considering users' information learning, while our work studies how users learn and alter traffic conditions on stochastic paths in a human-in-the-loop manner. Our analysis shows that the myopic routing policy leads to severe under-exploration of stochastic paths. This results in a price of anarchy (PoA) greater than $2$, as compared to the socially optimal policy in minimizing the long-term social cost. Besides, the myopic policy fails to ensure the correct learning convergence about users' traffic hazard beliefs. To address this, we focus on informational (non-monetary) mechanisms as they are easier to implement than pricing. We first show that existing information-hiding mechanisms and deterministic path-recommendation mechanisms in the Bayesian persuasion literature do not work, even yielding $\text{PoA}=\infty$. Accordingly, we propose a new combined hiding and probabilistic recommendation (CHAR) mechanism to hide all information from a selected user group and provide state-dependent probabilistic recommendations to the other user group. Our CHAR successfully ensures PoA less than $\frac{5}{4}$, which cannot be further reduced by any other informational (non-monetary) mechanism. Besides the parallel network, we further extend our analysis and CHAR to more general linear path graphs with multiple intermediate nodes, and we prove that the PoA results remain unchanged. Additionally, we carry out experiments with real-world datasets to further extend our routing graphs and verify the close-to-optimal performance of our CHAR.

artificial intelligence, machine learning, mechanism, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TMC.2024.3391697

arXiv: 2404.15599

Country:

North America > United States (0.67)
Asia > China (0.46)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (0.87)
Transportation > Ground > Road (0.67)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Networks (0.93)
(2 more...)

A Trainable Approach to Zero-delay Smoothing Spline Interpolation

Ruiz-Moreno, Emilio, López-Ramos, Luis Miguel, Beferull-Lozano, Baltasar

arXiv.org Artificial Intelligence | Aug-20-2023

The task of reconstructing smooth signals from streamed data in the form of signal samples arises in various applications. This work addresses such a task subject to a zero-delay response; that is, the smooth signal must be reconstructed sequentially as soon as a data sample is available and without having access to subsequent data. State-of-the-art approaches solve this problem by interpolating consecutive data samples using splines. Here, each interpolation step yields a piece that ensures a smooth signal reconstruction while minimizing a cost metric, typically a weighted sum between the squared residual and a derivative-based measure of smoothness. As a result, a zero-delay interpolation is achieved in exchange for an almost certainly higher cumulative cost as compared to interpolating all data samples together. This paper presents a novel approach to further reduce this cumulative cost on average. First, we formulate a zero-delay smoothing spline interpolation problem from a sequential decision-making perspective, allowing us to model the future impact of each interpolated piece on the average cumulative cost. Then, an interpolation method is proposed to exploit the temporal dependencies between the streamed data samples. Our method is assisted by a recurrent neural network and accordingly trained to reduce the accumulated cost on average over a set of example data samples collected from the same signal source generating the signal to be reconstructed. Finally, we present extensive experimental results for synthetic and real data showing how our approach outperforms the abovementioned state-of-the-art.
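As a rough illustration of the kind of cost metric the abstract describes (a weighted sum of the squared residual and a derivative-based smoothness measure), a classical smoothing-spline objective can be written as below. The symbols $f$, $x_n$, $y_n$ and the weight $\eta \ge 0$, as well as the choice of the second derivative as the smoothness measure, are assumptions made for illustration and are not notation taken from the paper.

```latex
J(f) \;=\; \sum_{n=1}^{N} \bigl(y_n - f(x_n)\bigr)^2
      \;+\; \eta \int_{x_1}^{x_N} \bigl(f''(x)\bigr)^2 \, dx
```

In a zero-delay setting, each spline piece would have to be chosen as soon as its sample arrives, so the per-piece choices cannot minimize such a cumulative cost jointly over all samples.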

artificial intelligence, machine learning, survey article, (20 more...)

arXiv.org Artificial Intelligence

arXiv: 2203.03776

Country:

Europe > Norway (0.14)
North America > United States (0.14)

Genre:

Research Report > Promising Solution (0.68)
Overview > Innovation (0.54)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Low-Complexity Algorithm for Restless Bandits with Imperfect Observations

Liu, Keqin, Weber, Richard, Wu, Ting, Zhang, Chengzhong

arXiv.org Artificial Intelligence | Aug-9-2022

We consider a class of restless bandit problems that finds a broad application area in stochastic optimization, reinforcement learning and operations research. We consider $N$ independent discrete-time Markov processes, each of which has two possible states: 1 and 0 ('good' and 'bad'). Reward accrues only if a process is both in state 1 and observed to be so. The aim is to maximize the expected discounted sum of returns over the infinite horizon subject to a constraint that only $M$ $(

equation, indexability, monotonically, (16 more...)

arXiv.org Artificial Intelligence

arXiv: 2108.03812

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Regret Bounds for Thompson Sampling in Restless Bandit Problems

Jung, Young Hun, Tewari, Ambuj

arXiv.org Machine Learning | May-29-2019

Restless bandit problems are instances of non-stationary multi-armed bandits. These problems have been studied well from the optimization perspective, where we aim to efficiently find a near-optimal policy when system parameters are known. However, very few papers adopt a learning perspective, where the parameters are unknown. In this paper, we analyze the performance of Thompson sampling in restless bandits with unknown parameters. We consider a general policy map to define our competitor and prove an $\tilde{O}(\sqrt{T})$ Bayesian regret bound. Our competitor is flexible enough to represent various benchmarks including the best fixed action policy, the optimal policy, the Whittle index policy, or the myopic policy. We also present empirical results that support our theoretical findings.
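To make the "myopic policy" benchmark mentioned in this abstract more concrete, here is a minimal sketch of a myopic rule for a restless bandit. It rests on assumptions that are not from the paper: each arm is a two-state Markov chain (state 1 is rewarding), the transition probabilities `p01` and `p11` are known and shared by all arms, and an activated arm's state is observed exactly. All function names are hypothetical.

```python
from typing import Dict, List, Sequence


def propagate(belief: float, p01: float, p11: float) -> float:
    """Advance the probability of being in state 1 by one Markov transition."""
    return belief * p11 + (1.0 - belief) * p01


def myopic_action(beliefs: Sequence[float], m: int) -> List[int]:
    """Myopic rule: activate the m arms with the highest current belief."""
    order = sorted(range(len(beliefs)), key=lambda i: beliefs[i], reverse=True)
    return sorted(order[:m])


def step(beliefs: Sequence[float], observed: Dict[int, int],
         p01: float, p11: float) -> List[float]:
    """Update beliefs after one round.

    `observed` maps each activated arm to the state (0 or 1) that was seen;
    arms that were not activated simply evolve under the Markov chain.
    """
    updated = []
    for i, b in enumerate(beliefs):
        if i in observed:
            # Assumed perfect observation: the belief collapses to 0 or 1.
            b = float(observed[i])
        updated.append(propagate(b, p01, p11))
    return updated


beliefs = [0.2, 0.9, 0.5]
chosen = myopic_action(beliefs, m=2)          # arms 1 and 2 have the highest beliefs
beliefs = step(beliefs, {1: 1, 2: 0}, p01=0.2, p11=0.8)
```

The rule is "myopic" because it maximizes only the expected immediate reward and ignores the value of learning about arms with uncertain state. In the learning setting studied in the abstract, the parameters are unknown, so this sketch corresponds to a benchmark policy rather than to Thompson sampling itself.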

bandit, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

arXiv: 1905.12673

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)

A Decision-Theoretic Model of Assistance

Fern, A., Natarajan, S., Judah, K., Tadepalli, P.

Journal of Artificial Intelligence Research | May-20-2014

There is a growing interest in intelligent assistants for a variety of applications from sorting email to helping people with disabilities to do their daily chores. In this paper, we formulate the problem of intelligent assistance in a decision-theoretic framework, and present both theoretical and empirical results. We first introduce a class of POMDPs called hidden-goal MDPs (HGMDPs), which formalizes the problem of interactively assisting an agent whose goal is hidden and whose actions are observable. In spite of its restricted nature, we show that optimal action selection for HGMDPs is PSPACE-complete even for deterministic dynamics. We then introduce a more restricted model called helper action MDPs (HAMDPs), which are sufficient for modeling many real-world problems. We show classes of HAMDPs for which efficient algorithms are possible. More interestingly, for general HAMDPs we show that a simple myopic policy achieves a near optimal regret, compared to an oracle assistant that knows the agent's goal. We then introduce more sophisticated versions of this policy for the general case of HGMDPs that we combine with a novel approach for quickly learning about the agent being assisted. We evaluate our approach in two game-like computer environments where human subjects perform tasks, and in a real-world domain of providing assistance during folder navigation in a computer desktop environment. The results show that in all three domains the framework results in an assistant that substantially reduces user effort with only modest computation.

agent, hamdp, hgmdp, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4213

AI Access Foundation

10880

Country:

North America > United States > Oregon > Benton County > Corvallis (0.04)
North America > Canada > British Columbia > East Kootenay Region > Fernie (0.04)
North America > United States > Indiana > Monroe County > Bloomington (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

A Computational Decision Theory for Interactive Assistants

Fern, Alan, Tadepalli, Prasad

Neural Information Processing Systems | Dec-31-2010

We study several classes of interactive assistants from the points of view of decision theory and computational complexity. We first introduce a class of POMDPs called hidden-goal MDPs (HGMDPs), which formalize the problem of interactively assisting an agent whose goal is hidden and whose actions are observable. In spite of its restricted nature, we show that optimal action selection in finite horizon HGMDPs is PSPACE-complete even in domains with deterministic dynamics. We then introduce a more restricted model called helper action MDPs (HAMDPs), where the assistant's action is accepted by the agent when it is helpful, and can be easily ignored by the agent otherwise. We show classes of HAMDPs that are complete for PSPACE and NP along with a polynomial time class. Furthermore, we show that for general HAMDPs a simple myopic policy achieves a regret, compared to an omniscient assistant, that is bounded by the entropy of the initial goal distribution. A variation of this policy is shown to achieve worst-case regret that is logarithmic in the number of goals for any goal distribution.
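The quantity that bounds the myopic policy's regret here, the entropy of the initial goal distribution, is easy to compute for a given prior. The sketch below is illustrative only: the function name `entropy_bits`, the base-2 logarithm, and the example distributions are assumptions, since the abstract does not state units or give a concrete prior.

```python
import math
from typing import Sequence


def entropy_bits(p: Sequence[float]) -> float:
    """Shannon entropy (base 2, an assumed unit) of a discrete goal distribution."""
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability distribution")
    # Zero-probability goals contribute nothing to the sum.
    return -sum(x * math.log2(x) for x in p if x > 0)


uniform = [0.25] * 4             # four equally likely goals
skewed = [0.7, 0.1, 0.1, 0.1]    # one goal is much more likely than the others
print(entropy_bits(uniform), entropy_bits(skewed))
```

Entropy is largest for the uniform prior, where it equals the logarithm of the number of goals, and smaller for a peaked prior. This is consistent with the abstract's pairing of an entropy bound with a worst-case bound that is logarithmic in the number of goals.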

agent, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Oregon (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
