A Appendix

Neural Information Processing Systems

In the appendix, we provide some additional results in Section A.1 and more implementation details. To compare training stability, we did not early-stop the training process even when the loss on some tasks had already exploded. MTRL training compared with both variants demonstrates the effectiveness of the PaCo design. MT50 is a more complex benchmark in Meta-World containing 50 different manipulation tasks (including the MT10 tasks), so it is hard to determine whether the policy has reached the optimum. The results are shown in Figure 8.



PaCo: Parameter-Compositional Multi-Task Reinforcement Learning

Sun, Lingfeng, Zhang, Haichao, Xu, Wei, Tomizuka, Masayoshi

arXiv.org Artificial Intelligence

The purpose of multi-task reinforcement learning (MTRL) is to train a single policy that can be applied to a set of different tasks. Sharing parameters allows us to exploit the similarities among tasks. However, gaps in the content and difficulty of different tasks pose the challenges of deciding which tasks should share parameters and which parameters should be shared, as well as optimization challenges arising from parameter sharing. In this work, we introduce a parameter-compositional approach (PaCo) to address these challenges. In this framework, a policy subspace represented by a set of parameters is learned. Policies for all the single tasks lie in this subspace and can be composed by interpolating within the learned set. This allows not only flexible parameter sharing but also a natural way to improve training. We demonstrate state-of-the-art performance on Meta-World benchmarks, verifying the effectiveness of the proposed approach.
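The parameter-compositional idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a shared set of K base parameter vectors (here called `Phi`) and per-task convex combination weights `W`, so that each task's policy parameters are an interpolation within the learned set; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared parameter set: K base parameter vectors, each of dimension D.
# In training, Phi would be learned jointly across all tasks.
K, D, num_tasks = 5, 100, 10
Phi = rng.normal(size=(K, D))

# Per-task compositional weights, normalized to sum to 1 so each
# task's policy is an interpolation within the learned set.
logits = rng.normal(size=(num_tasks, K))
W = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Each task's policy parameters lie in the subspace spanned by Phi.
theta = W @ Phi  # shape: (num_tasks, D)

print(theta.shape)
```

Only the small weight matrix `W` is task-specific, which is what makes parameter sharing flexible: tasks with similar weights share most of their effective parameters, while dissimilar tasks can occupy different regions of the subspace.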


Estimation for Compositional Data using Measurements from Nonlinear Systems using Artificial Neural Networks

Park, Se Un

arXiv.org Machine Learning

Our objective is to estimate an unknown compositional input from its output response through an unknown system, after estimating the inverse of the original system with a training set. The proposed methods using artificial neural networks (ANNs) can compete with the optimal bounds for linear systems, where convex optimization theory applies, and demonstrate promising results for nonlinear system inversion. We performed extensive experiments by designing numerous types of nonlinear systems. Compositional data is used in many fields because data expressed as population ratios or fractions is easy to interpret. However, when the compositional data cannot be produced by simple scaling or normalization of the raw data or measurements by the whole population size, the process that produces such compositional outputs may not be straightforward. Here, we consider noisy outputs as our observations from an unknown linear or nonlinear system with the corresponding compositional variable inputs of interest. The pairs of inputs and outputs are used as a training set for artificial neural network (ANN) modeling to estimate the inverse of the unknown system. This trained inverse system can then predict the unknown compositional input, given an output measurement from the original system. Because our approach is based on ANNs, we do not directly estimate the forward observation model, as in classical inversion theory, but the inverse of the original system. The measurements, i.e., the outputs of the original system driven by the compositional inputs, then become the inputs of our estimated inverse system, which predicts the original compositional inputs. Se Un Park is with Schlumberger, Houston, TX 77077, USA. Rather than enforcing the compositional constraints through the estimation procedure, we directly apply non-negativity and scaling layers in the proposed ANNs. We considered both linear observation models and several types of nonlinear models.
For the linear cases, where we can theoretically analyze the optimal performance bounds, our experiments demonstrated that the performance of ANNs for inverting the linear model outputs can compete with the optimal bounds. For the nonlinear systems, where convex optimization methods are not well suited, we could still present promising results compared to the error levels in the linear models, and we leave the comparative analysis with other feasible optimization methods for future work.

Observation Models

We first define a compositional vector and then present a general observation model, followed by more specific observation models. Examples of compositional data or vectors include population ratios, concentrations of chemicals in the air, and numerous survey statistics expressed as percentages. We define the compositional vector m to be constrained such that its components are nonnegative and sum to unity.
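The non-negativity and scaling idea for producing a valid compositional vector can be sketched as below. This is a hypothetical stand-in for the paper's output layers (the actual layer choices may differ): a non-negativity step followed by scaling so the components sum to unity, matching the definition of the compositional vector m.

```python
import numpy as np

def compositional_output(z, eps=1e-12):
    """Map raw network activations z to a valid compositional vector.

    Hypothetical sketch of a non-negativity layer followed by a
    scaling layer; not the paper's exact architecture.
    """
    z = np.maximum(z, 0.0)        # non-negativity layer
    return z / (z.sum() + eps)    # scaling layer: components sum to 1

m = compositional_output(np.array([1.2, -0.3, 0.6, 0.2]))
print(m)  # nonnegative components summing to 1
```

Appending such layers to the inverse network guarantees by construction that every prediction is a valid compositional vector, instead of enforcing the constraints through a separate optimization step.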