AITopics | Markov Models

Collaborating Authors

Markov Models

News Overviews Instructional Materials AI-Alerts Classics

Effort Allocation for Deadline-Aware Task and Motion Planning: A Metareasoning Approach

Sung, Yoonchang, Shperberg, Shahaf S., Wang, Qi, Stone, Peter

arXiv.org Artificial IntelligenceOct-8-2024

Their approach involved modeling the problem using a set of processes, each dedicated to searching for a plan, akin to representing search nodes on an open list. Each process is characterized by a probabilistic performance profile, modeled by a random variable indicating the probability of successful termination given processing time, as well as a random variable modeling the deadline corresponding to each partial plan, which is only revealed after planning is concluded. The meta-level problem lies in finding an optimal schedule of processing time across all processes that maximizes the probability that any process delivers a plan before its deadline. A simplified version of this problem, known as "simplified allocating planning effort when actions expire," assumes discrete time intervals and has been proven to be NP-hard. However, under the condition of known deadlines, the problem becomes solvable in pseudo-polynomial time through dynamic programming. Later, this line of work was extended to consider interleaved planning and execution, where partial plans can be executed during the search [62, 63]. While this body of work bears relevance to our research, it primarily concentrates on deriving symbolic plans. In contrast, our focus lies in elaborating existing symbolic plans through motion-level reasoning to make them executable for a robot, optimizing the likelihood of meeting a pre-specified deadline.

computation time, plan skeleton, probability, (14 more...)

arXiv.org Artificial Intelligence

2410.05828

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Cooperative and Asynchronous Transformer-based Mission Planning for Heterogeneous Teams of Mobile Robots

Farjadnasab, Milad, Sirouspour, Shahin

arXiv.org Artificial IntelligenceOct-8-2024

Coordinating heterogeneous teams of mobile robots for tasks such as search and rescue is highly challenging. This is due to the complexities of perception, decision making and planning in such environments, with agents' non-synchronous operation, constrained communication, and limited computational resources. This paper presents the Cooperative and Asynchronous Transformer-based Mission Planning (CATMiP) framework, which leverages multi-agent reinforcement learning (MARL) to effectively coordinate agents with heterogeneous sensing, motion, and actuation capabilities. The framework introduces a Class-based Macro-Action Decentralized Partially Observable Markov Decision Process (CMD-POMDP) model to handle asynchronous decision-making among different agent classes via macro-actions. It also extends the Multi-Agent Transformer (MAT) architecture to facilitate distributed, ad hoc communication among the agents. CATMiP easily adapts to mission complexities and communication constraints, and scales to varying environment sizes and team compositions. Simulations demonstrate its scalability and ability to achieve cooperative mission objectives with two classes of explorer and rescuer agents, even under severe communication constraints. The code is available at https://github.com/mylad13/CATMiP.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.06372

Country: North America (0.28)

Genre: Research Report (0.64)

Industry:

Government > Military (0.70)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)

Add feedback

Reviews: Data-Efficient Reinforcement Learning in Continuous State-Action Gaussian-POMDPs

Neural Information Processing SystemsOct-7-2024, 22:26:04 GMT

This paper describes an extension to the PILCO algorithm (Probabilistic Inference and Learning for COntrol, a data-efficient reinforcement algorithm). The proposed algorithm applies a measurement filtering algorithm during the actual experiment and explicitly takes this measurement filtering algorithm into account during the policy learning step, which uses data from the experiment. This is an important practical extension addressing the fact that measurements are often very noisy. My intuitive explanation for this approach is that the proposed approach makes the overall feedback system more "repeatable" (noise is mostly filtered out) and therefore learning is faster (given that the filtering is effective, see last sentence of the conclusion). The paper presents detailed mathematical derivations and strong simulation results that highlight the properties of the proposed algorithm.

algorithm, continuous state-action gaussian-pomdp, data-efficient reinforcement learning, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Reviews: Learning Unknown Markov Decision Processes: A Thompson Sampling Approach

Neural Information Processing SystemsOct-7-2024, 21:32:00 GMT

The paper proposes TSDE, a posterior sampling algorithm for RL in the average reward infinite horizon setting. This algorithm uses dynamic episodes but unlike Lazy-PSRL avoids technical issues by not only terminating an episode when an observation count doubled but also terminating episodes when they become too long. This ensures that the episode length cannot grow faster than linear and ultimately a Bayesian regret bound of O(HS(AT) .5) is shown. Posterior sampling methods typically outperform UCB-type algorithms and therefore a posterior sampling algorithm for non-episodic RL with rigorous regret bounds is desirable. This paper proposes such an algorithm, which is of high interest.

algorithm, learning unknown markov decision process, thompson sampling approach, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Reviews: Entropy Rate Estimation for Markov Chains with Large State Space

Neural Information Processing SystemsOct-7-2024, 21:00:17 GMT

The paper proposes an entropy estimate for Markov chains by reduction to optimal entropy estimation for i.i.d samples. Sample complexity analysis is provided for different mixing scenarios with a minimax rate established for a particular rate. The estimator is used to assess the capacity of language models. This is a very clear and well-written paper. I appreciate the efforts done by the authors to summarize the results.

entropy estimate, entropy rate estimation, markov chain, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Add feedback

Reviews: Modeling Dynamic Missingness of Implicit Feedback for Recommendation

Neural Information Processing SystemsOct-7-2024, 18:02:28 GMT

This paper presents H4MF model (HMM MF for dynamic Missingness) for implicit feedback data. With implicit data, we only observe positive feedback and the missing entries (zeros) in the data can indicate either negative feedback or users are not exposed of the items. H4MF is based on the previous work on modeling user latent exposure (ExpoMF, Liang et al., Modeling user exposure in recommendation, 2016) -- the basic idea is that for each user-item pair, there is a latent binary variable to represent exposure; if it's 1, it means this user is exposed to the item thus 0 feedback mean true negative, while if it's 0, it means this user have not yet been exposed to this item yet. The difference in H4MF is that H4MF uses a hidden Markov model to capture the temporal dynamics in the user exposure (user intent in this paper). The basic idea is that whether or not a user is exposed to something can be dependent on some other items he/she has been exposed before.

expomf, exposure, modeling dynamic missingness, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.91)

Add feedback

Reviews: Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes

Neural Information Processing SystemsOct-7-2024, 15:22:10 GMT

Summary: This paper presents a new transfer learning approach using Bayesian Neural Network in MDPs. They are building on the existing framework of Hidden Parameter MDPs, and replace the Gaussian process with BNNs, thereby also modeling the joint uncertainty in the latent weights and the state space. Overall, this proposed approach is sound, well developed and seems to help scale the inference. The authors have also shown that it works well by applying it to multiple domains. The paper is extremely well written.

hidden parameter markov decision process, multiple domain, robust and efficient transfer learning, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Reviews: Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes

Neural Information Processing SystemsOct-7-2024, 15:21:05 GMT

The authors consider distributionally robust finite MDPs over a finite horizon. The transition probabilities conditionally to a state-action pair should remain at L1-bounded distance from a base measure, which is feasible as being generated using a given reference policy. This is a nice idea. A few comments are mentioned next. Related to that question, why the requirement of staying "close" to this policy would be beneficial.

policy-conditioned uncertainty set, reference policy, robust markov decision process, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.72)

Add feedback

Reviews: Inverse Filtering for Hidden Markov Models

Neural Information Processing SystemsOct-7-2024, 12:04:02 GMT

The paper addresses recovery of the observation sequence given known posterior state estimates, but unknown observations and/or sensor model and also in an extension, noise-corrupted measurements. There is a nice progression of the problem through IP, LP, and MILP followed by a more careful analytical derivation of the answers in the noise-free case, and a seemingly approximate though empirically effective approach (cf. Honestly, most of the motivations seem to be unrealistic, especially the cyber-physical security setting where one does not observe posteriors, but simply an action based on a presumed argmax w.r.t. The EEG application (while somewhat narrow) seems to be the best motivation, however, the sole example is to compare resconstructed observations to a redundant method of sensing -- is this really a compelling application? Is it actually used in practice?

application, hidden markov model, inverse filtering, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Reviews: Learning Others' Intentional Models in Multi-Agent Settings Using Interactive POMDPs

Neural Information Processing SystemsOct-7-2024, 11:36:55 GMT

The paper describes a sampling method for learning agent behaviors in interactive POMDPs (I-POMDPs). In general, I-POMDPs are a multi-agent POMDP model which, in addition to a belief about the environment state, the belief space includes nested recursive beliefs about the other agents' models. I-POMDP solutions, including the one proposed in the paper, largely approximate using a finite depth with either intentional models of others (e.g., their nested beliefs, state transitions, optimality criterion, etc.) or subintentional models of others (e.g., essentially "summaries of behavior" such as fictitious play). The proposed approach uses samples of the other agent at a particular depth to compute its values and policy. Related work on an interactive particle filter assumed the full frame was known (b, S, A, Omega, T, R, OC).

i-pomdp, interactive pomdp, subintentional model, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback