AITopics | efficient resource allocation

Efficient Resources Allocation for Markov Decision Processes

Neural Information Processing SystemsApr-6-2023, 16:52:13 GMT

It is desirable that a complex decision-making problem in an uncer(cid:173) tain world be adequately modeled by a Markov Decision Process (MDP) whose structural representation is adaptively designed by a parsimonious resources allocation process. Resources include time and cost of exploration, amount of memory and computational time allowed for the policy or value function representation. Concerned about making the best use of the available resources, we address the problem of efficiently estimating where adding extra resources is highly needed in order to improve the expected performance of the resulting policy. Possible application in reinforcement learning (RL), when real-world exploration is highly costly, concerns the de(cid:173) tection of those areas of the state-space that need primarily to be explored in order to improve the policy. Another application con(cid:173) cerns approximation of continuous state-space stochastic control problems using adaptive discretization techniques for which highly efficient grid points allocation is mandatory to survive high dimen(cid:173) sionality.

cid, efficient resource allocation, markov decision process, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.63)

Add feedback

Today Me, Tomorrow Thee: Efficient Resource Allocation in Competitive Settings using Karma Games

Censi, Andrea, Bolognani, Saverio, Zilly, Julian G., Mousavi, Shima Sadat, Frazzoli, Emilio

arXiv.org Artificial IntelligenceJul-22-2019

We present a new type of coordination mechanism among multiple agents for the allocation of a finite resource, such as the allocation of time slots for passing an intersection. We consider the setting where we associate one counter to each agent, which we call karma value, and where there is an established mechanism to decide resource allocation based on agents exchanging karma. The idea is that agents might be inclined to pass on using resources today, in exchange for karma, which will make it easier for them to claim the resource use in the future. To understand whether such a system might work robustly, we only design the protocol and not the agents' policies. We take a game-theoretic perspective and compute policies corresponding to Nash equilibria for the game. We find, surprisingly, that the Nash equilibria for a society of self-interested agents are very close in social welfare to a centralized cooperative solution. These results suggest that many resource allocation problems can have a simple, elegant, and robust solution, assuming the availability of a karma accounting mechanism.

agent, artificial intelligence, game theory, (16 more...)

arXiv.org Artificial Intelligence

1907.09198

Country: Europe (0.28)

Genre: Research Report (0.84)

Industry:

Transportation > Infrastructure & Services (0.68)
Leisure & Entertainment > Games (0.68)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

Efficient Resources Allocation for Markov Decision Processes

Munos, Rémi

Neural Information Processing SystemsDec-31-2002

Assume that we model a complex decision-making problem under uncertainty by a finite MDP. Because of the limited resources used, the parameters of the MDP (transition probabilities and rewards) are uncertain: we assume that we only know a belief state over their possible values. IT we select the most probable values of the parameters, we can build a MDP and solve it to deduce the corresponding optimal policy. However, because of the uncertainty over the true parameters, this policy may not be the one that maximizes the expected cumulative rewards of the true (but partially unknown) decision-making problem. We can nevertheless use sampling techniques to estimate the expected loss of using this policy.

contribution, derivative, ylx, (15 more...)

Neural Information Processing Systems

Country: Europe > France (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.86)

Add feedback

Efficient Resources Allocation for Markov Decision Processes

Munos, Rémi

Neural Information Processing SystemsDec-31-2002

Assume that we model a complex decision-making problem under uncertainty by a finite MDP. Because of the limited resources used, the parameters of the MDP (transition probabilities and rewards) are uncertain: we assume that we only know a belief state over their possible values. IT we select the most probable values of the parameters, we can build a MDP and solve it to deduce the corresponding optimal policy. However, because of the uncertainty over the true parameters, this policy may not be the one that maximizes the expected cumulative rewards of the true (but partially unknown) decision-making problem. We can nevertheless use sampling techniques to estimate the expected loss of using this policy.

contribution, derivative, ylx, (15 more...)

Neural Information Processing Systems

Country: Europe > France (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.86)

Add feedback

Efficient Resources Allocation for Markov Decision Processes

Munos, Rémi

Neural Information Processing SystemsDec-31-2002

Assume that we model a complex decision-making problem under uncertainty by a finite MDP. Because of the limited resources used, the parameters of the MDP (transition probabilities and rewards) are uncertain: we assume that we only know a belief state over their possible values. IT we select the most probable values of the parameters, we can build a MDP and solve it to deduce the corresponding optimal policy. However, because of the uncertainty over the true parameters, this policy may not be the one that maximizes the expected cumulative rewards of the true (but partially unknown) decision-making problem. We can nevertheless use sampling techniques to estimate the expected loss of using this policy.

artificial intelligence, derivative, machine learning, (16 more...)

Neural Information Processing Systems

Technology: