AITopics | augmented state space

Collaborating Authors

augmented state space

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Belief Projection-Based Reinforcement Learning for Environments with Delayed Feedback

Neural Information Processing SystemsOct-7-2025, 23:54:08 GMT

Delayed feedback from the environment is a particular challenge.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)
North America > Canada (0.04)
(6 more...)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Belief Projection-Based Reinforcement Learning for Environments with Delayed Feedback

Neural Information Processing SystemsAug-7-2025, 00:26:35 GMT

Delayed feedback from the environment is a particular challenge.

algorithm, proj, timestep, (14 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.14)
North America > United States (0.14)
Europe > Sweden (0.14)
Europe > France (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Review for NeurIPS paper: Planning with General Objective Functions: Going Beyond Total Rewards

Neural Information Processing SystemsJan-27-2025, 08:11:21 GMT

Additional Feedback: Post feedback response: I appreciate the author feedback. One item I want to flag, though. The feedback said (of one of the reviews): "We are grateful to the reviewer for providing a comprehensive list of papers on non-Markovian reward". I do not think the list is at all "comprehensive". It represents a number of very relevant and very significant papers, but there are others in this area.

artificial intelligence, augmented state space, general objective function, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.64)

Add feedback

Reinforcement Learning With Reward Machines in Stochastic Games

Hu, Jueming, Gaglione, Jean-Raphael, Wang, Yanze, Xu, Zhe, Topcu, Ufuk, Liu, Yongming

arXiv.org Artificial IntelligenceAug-28-2023

We investigate multi-agent reinforcement learning for stochastic games with complex tasks, where the reward functions are non-Markovian. We utilize reward machines to incorporate high-level knowledge of complex tasks. We develop an algorithm called Q-learning with reward machines for stochastic games (QRM-SG), to learn the best-response strategy at Nash equilibrium for each agent. In QRM-SG, we define the Q-function at a Nash equilibrium in augmented state space. The augmented state space integrates the state of the stochastic game and the state of reward machines. Each agent learns the Q-functions of all agents in the system. We prove that Q-functions learned in QRM-SG converge to the Q-functions at a Nash equilibrium if the stage game at each time step during learning has a global optimum point or a saddle point, and the agents update Q-functions based on the best-response strategy at this point. We use the Lemke-Howson method to derive the best-response strategy given current Q-functions. The three case studies show that QRM-SG can learn the best-response strategies effectively. QRM-SG learns the best-response strategies after around 7500 episodes in Case Study I, 1000 episodes in Case Study II, and 1500 episodes in Case Study III, while baseline methods such as Nash Q-learning and MADDPG fail to converge to the Nash equilibrium in all three case studies.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2305.17372

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Arizona (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Probabilistic Planning with Risk-Sensitive Criterion

Hou, Ping (New Mexico State University)

AAAI ConferencesMar-6-2015

While probabilistic planning models have been extensively used by AI and Decision Theoretic communities for planning under uncertainty, the objective to minimize the expected cumulative cost is inappropriate for high-stake planning problems. With this motivation in mind, we revisit the Risk-Sensitive criterion (RS-criterion), where the objective is to find a policy that maximizes the probability that the cumulative cost is within some user-defined cost threshold. The overall scope of this research is to develop efficient and scalable algorithms to optimize the RS-criterion in probabilistic planning problems. In our recent paper (Hou, Yeoh, and Varakantham 2014), we formally defined Risk-Sensitive MDPs (RS-MDPs) and introduced new algorithms for RS-MDPs with non-negative costs. Next, my plan is to develop algorithm for RS-MDPs with negative cost cycles and for Risk-Sensitive POMDPs (RS-POMDPs).

Add feedback