apc
cc4d9cfc45325e460b455a820d5f212c-Supplemental-Conference.pdf
The observations are based on proprioception and on egocentric vision. The observations are based on proprioception, sword position and orientation, hole position. Note these environments are related to the domains that have been proposed for use in offline RL benchmarks [Gulcehre et al., 2020]; however, the experiments we perform in this work require availability of the expert policy, so we do not use offline data, but instead train new experts and perform experiments in the very low data regime. The main method we consider is APC, described in Section 3 of the main paper, for offline expert cloning experiments. We can also consider resampling a new action, but we empirically found that cross-entropy worked better; see Appendix (8.6).
Data augmentation for efficient learning from parametric experts
We present a simple, yet effective data-augmentation technique to enable data-efficient learning from parametric experts for reinforcement and imitation learning. We focus on what we call the policy cloning setting, in which we use online or offline queries of an expert or expert policy to inform the behavior of a student policy.
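The policy cloning setting described above can be illustrated with a small sketch. It is not the authors' implementation: the linear expert, the Gaussian state perturbation, the least-squares student, and all function names (`expert_policy`, `augment_with_expert_queries`, `fit_student`) are assumptions chosen only to show how extra expert queries at perturbed states can enlarge a very small dataset.

```python
import numpy as np


def expert_policy(states, weights):
    """Hypothetical parametric expert: a linear map from states to actions."""
    return states @ weights


def augment_with_expert_queries(states, expert, num_perturbations, noise_scale, rng):
    """Add Gaussian-perturbed copies of each state and label all states by querying the expert.

    The Gaussian perturbation is an assumption made for this sketch.
    """
    noise = rng.normal(scale=noise_scale, size=(num_perturbations,) + states.shape)
    perturbed = (states[None] + noise).reshape(-1, states.shape[-1])
    all_states = np.concatenate([states, perturbed], axis=0)
    return all_states, expert(all_states)


def fit_student(states, actions):
    """Least-squares linear student; stands in for a gradient-trained network."""
    weights, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return weights


rng = np.random.default_rng(0)
expert_weights = rng.normal(size=(4, 2))


def expert(states):
    return expert_policy(states, expert_weights)


# Very low data regime: 3 visited states in a 4-dimensional state space.
states = rng.normal(size=(3, 4))

# Student fitted on the visited states only (underdetermined problem).
baseline_weights = fit_student(states, expert(states))

# Student fitted on visited states plus expert-labelled perturbed states.
aug_states, aug_actions = augment_with_expert_queries(
    states, expert, num_perturbations=10, noise_scale=0.1, rng=rng
)
student_weights = fit_student(aug_states, aug_actions)
```

In this toy setup the three original states cannot determine a four-dimensional linear policy, while the augmented set can. This only works because the expert is available to label the new states, which is the requirement the policy cloning setting places on the expert.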
Tractable Representation Learning with Probabilistic Circuits
Braun, Steven, Sidheekh, Sahil, Vergari, Antonio, Mundt, Martin, Natarajan, Sriraam, Kersting, Kristian
Probabilistic circuits (PCs) are powerful probabilistic models that enable exact and tractable inference, making them highly suitable for probabilistic reasoning and inference tasks. While representation learning is dominant in neural networks, it remains underexplored for PCs, with prior approaches relying on external neural embeddings or activation-based encodings. To address this gap, we introduce autoencoding probabilistic circuits (APCs), a novel framework leveraging the tractability of PCs to model probabilistic embeddings explicitly. APCs extend PCs by jointly modeling data and embeddings, obtaining embedding representations through tractable probabilistic inference. The PC encoder allows the framework to natively handle arbitrary missing data and is seamlessly integrated with a neural decoder in a hybrid, end-to-end trainable architecture enabled by differentiable sampling. Our empirical evaluation demonstrates that APCs outperform existing PC-based autoencoding methods in reconstruction quality, generate embeddings competitive with those of neural autoencoders, and exhibit superior robustness in handling missing data compared to neural autoencoders. These results highlight APCs as a powerful and flexible representation learning method that exploits the probabilistic inference capabilities of PCs, showing promising directions for robust inference, out-of-distribution detection, and knowledge distillation.