A Architectures, Hyper-parameters and Algorithms
Our approach, named ORDER, uses a three-step training process. The remainder of this section details the methods, architectures, and hyper-parameters used in each step, followed by a description of our experimental setup. We explain the design of the state encoder and how we selected hyper-parameters: a grid search identified the settings that allowed each observation dimension to be matched to a state factor. The training process is summarized in Algorithm 1.
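The grid search described above can be sketched as follows. The actual ORDER search space is not given in the text, so the parameter names and values here are illustrative assumptions, and `evaluate` stands in for the real training-and-validation loop.

```python
import itertools

# Hypothetical search space; the real ORDER hyper-parameter grid is not
# stated in the text, so these names and values are illustrative only.
grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "hidden_size": [64, 128],
    "batch_size": [32, 64],
}

def evaluate(params):
    # Stand-in for training and validating the state encoder with `params`;
    # a toy quadratic score keeps the example runnable.
    return -(params["learning_rate"] - 1e-3) ** 2 \
           - (params["hidden_size"] - 128) ** 2

def grid_search(grid, evaluate):
    """Exhaustively score every combination and keep the best."""
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best, _ = grid_search(grid, evaluate)
```

Exhaustive search is feasible here because the grid is small; with larger spaces, random or Bayesian search is usually preferred.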
From CAD to POMDP: Probabilistic Planning for Robotic Disassembly of End-of-Life Products
Baumgärtner, Jan, Hansjosten, Malte, Hald, David, Hauptmannl, Adrian, Puchta, Alexander, Fleischer, Jürgen
Abstract-- To support the circular economy, robotic systems must not only assemble new products but also disassemble end-of-life (EOL) ones for reuse, recycling, or safe disposal. Existing approaches to disassembly sequence planning often assume deterministic and fully observable product models, yet real EOL products frequently deviate from their initial designs due to wear, corrosion, or undocumented repairs. We argue that disassembly should therefore be formulated as a Partially Observable Markov Decision Process (POMDP), which naturally captures uncertainty about the product's internal state. We present a mathematical formulation of disassembly as a POMDP, in which hidden variables represent uncertain structural or physical properties. Building on this formulation, we propose a task and motion planning framework that automatically derives specific POMDP models from CAD data, robot capabilities, and inspection results. To obtain tractable policies, we approximate this formulation with a reinforcement-learning approach that operates on stochastic action outcomes informed by inspection priors, while a Bayesian filter continuously maintains beliefs over latent EOL conditions during execution. Using three products on two robotic systems, we demonstrate that this probabilistic planning framework outperforms deterministic baselines in terms of average disassembly time and variance, generalizes across different robot setups, and successfully adapts to deviations from the CAD model, such as missing or stuck parts.

I. INTRODUCTION

Modern industrial production still follows a linear model of make-use-dispose, accelerating the depletion of natural resources on our planet.
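The Bayesian filter mentioned in the abstract maintains a belief over latent EOL conditions during execution. A minimal discrete Bayes update can be sketched as below; the state names, the "failed unscrew" observation, and all probabilities are illustrative assumptions, not values from the paper.

```python
# Minimal discrete Bayes filter for a belief over a latent part condition
# (e.g. "ok" vs "stuck"). All states and probabilities are made up for
# illustration; the paper's actual models come from inspection priors.
def bayes_update(belief, likelihood):
    """belief: dict state -> prior prob; likelihood: dict state -> P(obs | state)."""
    posterior = {s: belief[s] * likelihood[s] for s in belief}
    z = sum(posterior.values())  # normalizing constant
    return {s: p / z for s, p in posterior.items()}

prior = {"ok": 0.7, "stuck": 0.3}           # hypothetical inspection prior
obs_model = {"ok": 0.1, "stuck": 0.8}       # P(unscrew attempt fails | state)
posterior = bayes_update(prior, obs_model)  # belief after one failed attempt
```

A single failed action shifts most of the probability mass onto "stuck", which is exactly the adaptation to stuck parts the abstract describes.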
Multi-Environment POMDPs: Discrete Model Uncertainty Under Partial Observability
Bovy, Eline M., Probine, Caleb, Suilen, Marnix, Topcu, Ufuk, Jansen, Nils
Multi-environment POMDPs (ME-POMDPs) extend standard POMDPs with discrete model uncertainty. ME-POMDPs represent a finite set of POMDPs that share the same state, action, and observation spaces, but may arbitrarily vary in their transition, observation, and reward models. Such models arise, for instance, when multiple domain experts disagree on how to model a problem. The goal is to find a single policy that is robust against any choice of POMDP within the set, i.e., a policy that maximizes the worst-case reward across all POMDPs. We generalize and expand on existing work in the following way. First, we show that ME-POMDPs can be generalized to POMDPs with sets of initial beliefs, which we call adversarial-belief POMDPs (AB-POMDPs). Second, we show that any arbitrary ME-POMDP can be reduced to a ME-POMDP that only varies in its transition and reward functions or only in its observation and reward functions, while preserving (optimal) policies. We then devise exact and approximate (point-based) algorithms to compute robust policies for AB-POMDPs, and thus ME-POMDPs. We demonstrate that we can compute policies for standard POMDP benchmarks extended to the multi-environment setting.
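The worst-case objective described above can be illustrated with a deliberately tiny example: several candidate models (e.g. from disagreeing experts) assign different expected rewards to each action, and the robust choice maximizes the minimum across models. The action names, the three models, and all numbers below are invented for illustration; real ME-POMDP solving works over full transition/observation/reward models, not a flat reward table.

```python
# Toy illustration of the max-min (robust) objective in a multi-environment
# setting. Each dict is one hypothetical expert's expected reward per action.
models = [
    {"open": 5.0, "probe": 2.0},   # expert A's model
    {"open": -3.0, "probe": 1.5},  # expert B's model
    {"open": 4.0, "probe": 1.0},   # expert C's model
]

def robust_action(models):
    """Pick the action maximizing the worst-case expected reward."""
    actions = models[0].keys()
    return max(actions, key=lambda a: min(m[a] for m in models))

choice = robust_action(models)
```

Note that "open" looks best in two of three models, yet the robust policy prefers "probe": a single adversarial model choice is enough to rule out the risky action.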