


Collaborative Decision Making Using Action Suggestions

Neural Information Processing Systems

(Extraction residue: the excerpt defines the suggestion distribution $p(o^s_t \mid s_t)$ via an indicator function and introduces a parameter in $(0, 1]$, then summarizes results tables reporting mean reward against message reception rate and against the chance of random suggestions for the Normal, Perfect, Random, Naive, Scaled, and Noisy agents.)


Appendix

Neural Information Processing Systems

According to Alg. 2, at least one leaf node is expanded in each exploration. Moreover, the overall size of the belief tree is $O((|A| \min(P_\delta^{\max}, N_{\max}))^D)$, where $N_{\max}$ is the maximum sample size given by KLD-Sampling, $P_\delta^{\max} = \sup_{b,a} P_\delta(Y_{b,a})$, and $Y_{b,a}$ is the set of reachable beliefs after executing action $a$ at belief $b$. The tree size is limited since $N_{\max}$ is finite. The weights are normalized. There exist bounded functions $\alpha$ and $\alpha'$ such that $V(b) = \int \alpha(s)\,b(s)\,ds$ and $V(b') = \int \alpha'(s)\,b'(s)\,ds$. We can bound the first and third terms, respectively, by $\lambda$ in light of the assumptions.



Learning to Trust: Bayesian Adaptation to Varying Suggester Reliability in Sequential Decision Making

Asmar, Dylan M., Kochenderfer, Mykel J.

arXiv.org Artificial Intelligence

Autonomous agents operating in sequential decision-making tasks under uncertainty can benefit from external action suggestions, which provide valuable guidance but inherently vary in reliability. Existing methods for incorporating such advice typically assume static and known suggester quality parameters, limiting practical deployment. We introduce a framework that dynamically learns and adapts to varying suggester reliability in partially observable environments. First, we integrate suggester quality directly into the agent's belief representation, enabling agents to infer and adjust their reliance on suggestions through Bayesian inference over suggester types. Second, we introduce an explicit "ask" action allowing agents to strategically request suggestions at critical moments, balancing informational gains against acquisition costs. Experimental evaluation demonstrates robust performance across varying suggester qualities, adaptation to changing reliability, and strategic management of suggestion requests. This work provides a foundation for adaptive human-agent collaboration by addressing suggestion uncertainty in uncertain environments.
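The first contribution above, Bayesian inference over suggester types, can be sketched as a discrete posterior update. Everything concrete here (the type set, the accuracy numbers, and the "suggestion matched the optimal action" outcome signal) is invented for illustration and is not taken from the paper:

```python
import numpy as np

# Hypothetical suggester types and assumed probabilities of suggesting
# the optimal action; the values are illustrative, not from the paper.
TYPES = ["expert", "mediocre", "random"]
ACCURACY = np.array([0.95, 0.6, 0.25])
N_ACTIONS = 4  # assumed size of the action space

def update_type_belief(type_belief, suggestion_was_optimal):
    """One Bayesian update over suggester types after observing whether a
    suggestion matched the (later-revealed) optimal action."""
    if suggestion_was_optimal:
        likelihood = ACCURACY
    else:
        # a wrong suggestion is assumed uniform over the other actions
        likelihood = (1.0 - ACCURACY) / (N_ACTIONS - 1)
    posterior = type_belief * likelihood
    return posterior / posterior.sum()

belief = np.ones(len(TYPES)) / len(TYPES)  # start uniform over types
for outcome in [True, True, True]:          # three good suggestions observed
    belief = update_type_belief(belief, outcome)
best = TYPES[int(np.argmax(belief))]        # belief concentrates on "expert"
```

After a few consistently good suggestions the posterior mass shifts toward the reliable type, which is the mechanism that lets the agent increase its reliance on that suggester.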


GammaZero: Learning To Guide POMDP Belief Space Search With Graph Representations

Mangannavar, Rajesh, Tadepalli, Prasad

arXiv.org Artificial Intelligence

We introduce an action-centric graph representation framework for learning to guide planning in Partially Observable Markov Decision Processes (POMDPs). Unlike existing approaches that require domain-specific neural architectures and struggle with scalability, GammaZero leverages a unified graph-based belief representation that enables generalization across problem sizes within a domain. Our key insight is that belief states can be systematically transformed into action-centric graphs where structural patterns learned on small problems transfer to larger instances. We employ a graph neural network with a decoder architecture to learn value functions and policies from expert demonstrations on computationally tractable problems, then apply these learned heuristics to guide Monte Carlo tree search on larger problems. Experimental results on standard POMDP benchmarks demonstrate that GammaZero achieves comparable performance to BetaZero when trained and tested on the same-sized problems, while uniquely enabling zero-shot generalization to problems 2-4 times larger than those seen during training, maintaining solution quality with reduced search requirements. Partially observable Markov decision processes (POMDPs) provide a principled framework for sequential decision-making under uncertainty, where agents must act based on incomplete information about the true state of the environment (Kaelbling et al., 1998). This partial observability arises naturally in many real-world applications, from autonomous driving where sensors provide a limited field of view (Hoel et al., 2019), to robotic manipulation where object properties must be inferred through interaction (Lauri et al., 2022), to subsurface exploration where underground structures can only be observed at sparse drilling locations (Mern & Caers, 2023).


Appendix A Different Quality Suggester Results

Neural Information Processing Systems

This section presents results on RockSample (8, 4, 10, 1) when the suggester is not always all-knowing. In our approach, we formulated the belief update under the assumption that the suggester observed the environment. These results demonstrate that our approach extends beyond an all-knowing suggester and can incorporate information from suggestions developed from different beliefs of the state. Table 3 contains the mean rewards and Table 4 contains the mean number of suggestions considered by the agent. The details of the agents are provided in Section 4.2.
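A belief update of the kind described above can be sketched as reweighting a particle belief by how likely each candidate state makes the received suggestion. This is a minimal sketch, not the paper's exact update: the noise model, the `suggester_policy` helper, and the `accuracy` and `n_actions` parameters are all assumptions of this example:

```python
import numpy as np

def update_belief_with_suggestion(belief, particles, suggested_action,
                                  suggester_policy, accuracy=0.9, n_actions=4):
    """Reweight a particle belief after receiving an action suggestion.

    Assumed suggester model: in state s it recommends suggester_policy(s)
    with probability `accuracy`, and a uniformly random other action
    otherwise. The posterior weight of each particle is proportional to
    prior weight times the likelihood of the observed suggestion.
    """
    likelihood = np.array([
        accuracy if suggester_policy(s) == suggested_action
        else (1.0 - accuracy) / (n_actions - 1)
        for s in particles
    ])
    posterior = np.asarray(belief, dtype=float) * likelihood
    return posterior / posterior.sum()
```

Particles whose states would have led the suggester to recommend the observed action gain weight, so the suggestion acts as an extra observation about the hidden state.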



Learning Symbolic Persistent Macro-Actions for POMDP Solving Over Time

Veronese, Celeste, Meli, Daniele, Farinelli, Alessandro

arXiv.org Artificial Intelligence

Most popular and effective approaches to solving Partially Observable Markov Decision Processes (POMDPs, Kaelbling et al. (1998)) online, e.g., Partially Observable Monte Carlo Planning (POMCP) by Silver and Veness (2010) and Determinized Sparse Partially Observable Tree (DESPOT) by Ye et al. (2017), rely on Monte Carlo Tree Search (MCTS). These approaches are based on online simulations performed in a simulation environment (i.e., a black-box twin of the real POMDP environment) and estimate the value of actions. However, for efficient exploration they require domain-specific policy heuristics that suggest the best actions at each state. Macro-actions (He et al. (2011); Bertolucci et al. (2021)) are popular policy heuristics that are particularly efficient for long planning horizons. A macro-action is essentially a sequence of suggested actions from a given state that can effectively guide the simulation phase towards actions with high utilities. However, such heuristics are heavily dependent on domain features and are typically handcrafted for each specific domain. Defining these heuristics is an arduous process that requires significant domain knowledge, especially in complex domains. An alternative approach, like the one by Cai and Hsu (2022), is to learn such heuristics via neural networks, which are, however, uninterpretable and data-inefficient. This paper extends the methodology proposed by Meli et al. (2024) to the learning, via Inductive Logic Programming (ILP, Muggleton (1991)), of Event Calculus (EC) theories.
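How a macro-action guides the simulation phase can be sketched as a rollout policy that follows the suggested action sequence before falling back to random actions. The simulator interface `step`, the action names, and the discount are assumptions of this example, not the paper's implementation:

```python
import random

def macro_guided_rollout(state, macro_action, step, depth=20, gamma=0.95,
                         actions=("up", "down", "left", "right")):
    """Estimate a state's value by one simulated trajectory that follows
    the macro-action prefix, then acts uniformly at random.

    `step(state, action)` stands in for the black-box simulator and
    returns (next_state, reward); the discounted return of the rollout is
    the value estimate used to guide search.
    """
    total, discount = 0.0, 1.0
    plan = list(macro_action)
    for _ in range(depth):
        action = plan.pop(0) if plan else random.choice(actions)
        state, reward = step(state, action)
        total += discount * reward
        discount *= gamma
    return total
```

A good macro-action steers the early, high-discount steps of the rollout toward rewarding regions, which is why such heuristics help most on long planning horizons.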


Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy

Allen, Cameron, Kirtland, Aaron, Tao, Ruo Yu, Lobel, Sam, Scott, Daniel, Petrocelli, Nicholas, Gottesman, Omer, Parr, Ronald, Littman, Michael L., Konidaris, George

arXiv.org Machine Learning

Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives, without requiring access to--or knowledge of--an underlying, unobservable state space. Our metric, the $\lambda$-discrepancy, is the difference between two distinct temporal difference (TD) value estimates, each computed using TD($\lambda$) with a different value of $\lambda$. Since TD($\lambda$=0) makes an implicit Markov assumption and TD($\lambda$=1) does not, a discrepancy between these estimates is a potential indicator of a non-Markovian state representation. Indeed, we prove that the $\lambda$-discrepancy is exactly zero for all Markov decision processes and almost always non-zero for a broad class of partially observable environments. We also demonstrate empirically that, once detected, minimizing the $\lambda$-discrepancy can help with learning a memory function to mitigate the corresponding partial observability. We then train a reinforcement learning agent that simultaneously constructs two recurrent value networks with different $\lambda$ parameters and minimizes the difference between them as an auxiliary loss. The approach scales to challenging partially observable domains, where the resulting agent frequently performs significantly better (and never performs worse) than a baseline recurrent agent with only a single value network.
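The $\lambda$-discrepancy can be illustrated on a toy aliased environment: a two-state corridor $s_1 \to s_2 \to$ goal where both states emit the same observation, so any observation-conditioned value function must give them a single value. The environment and numbers below are invented for illustration; the point is only that the TD($\lambda$=1) (Monte Carlo) and TD($\lambda$=0) fixed points disagree under aliasing:

```python
gamma = 0.9  # discount; reward 1 is received on reaching the goal from s2

# TD(lambda=1) == Monte Carlo: average the true returns observed at each
# visit to the aliased observation (one visit from s1, one from s2 per
# episode, so each contributes with weight 1/2).
mc_value = ((gamma * 1.0) + 1.0) / 2

# TD(lambda=0): the single value V bootstraps through itself.
# Targets: from s1 -> 0 + gamma * V ; from s2 -> 1 + gamma * 0 (terminal).
# Fixed point of V = 0.5 * (gamma * V) + 0.5 * 1:
td0_value = 0.5 / (1 - 0.5 * gamma)

# Nonzero discrepancy flags the non-Markovian observation; for a true MDP
# state representation the two estimates would coincide.
lambda_discrepancy = abs(mc_value - td0_value)
```

Here the Monte Carlo value is 0.95 while the TD(0) fixed point is about 0.909, so the discrepancy is about 0.041; minimizing it (e.g., by adding memory) is what the auxiliary loss in the abstract targets.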