
Policy Improvement using Language Feedback Models

Neural Information Processing Systems

First, by using LFMs to identify desirable behaviour to imitate, we improve task-completion rate over strong behavioural cloning baselines on three distinct language grounding environments (Touchdown, ScienceWorld, and ALFWorld). Second, imitation learning using LFMs outperforms using LLMs as experts to directly predict actions, when controlling for the number of LLM output tokens.
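The first contribution above amounts to filtering trajectories with a feedback model before imitation learning. A minimal sketch of that filtering step, where `feedback_model`, the trajectory format, and the threshold are hypothetical stand-ins rather than the paper's actual interface:

```python
# Sketch: use a language feedback model (LFM) to select desirable
# (state, action) pairs as training data for imitation learning.
# The feedback model below is a hypothetical scorer, not the paper's LFM.

def filter_desirable_steps(trajectories, feedback_model, threshold=0.5):
    """Keep only the steps the feedback model rates as desirable."""
    imitation_data = []
    for trajectory in trajectories:
        for state, action in trajectory:
            if feedback_model(state, action) >= threshold:
                imitation_data.append((state, action))
    return imitation_data

# Toy usage with a rule-based stand-in for the LFM.
toy_feedback = lambda state, action: 1.0 if action == "pick up key" else 0.0
trajs = [[("room", "pick up key"), ("room", "wander")]]
print(filter_desirable_steps(trajs, toy_feedback))
# → [('room', 'pick up key')]
```

The filtered pairs would then be used as supervised targets for behavioural cloning.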


A More related works

Neural Information Processing Systems

In this section, we discuss more related works in addition to those in Section 2. … In this section, we provide more details on our experimental settings, in addition to those in Section 4.1. Below we describe other detailed settings of each defense method. Normal training (i.e., "No defense"): on CIFAR10 and GTSRB, we train for … I-BAU: the original I-BAU paper conducted experiments on a relatively small convolutional network. … In this section, we provide more experimental results in addition to those in Section 4. C.1 Potential adaptive attack: the results are shown in Table 8. Alongside ASR and CA, we also show the mean squared error (MSE) of the image reconstruction; a smaller MSE roughly indicates better image reconstruction quality.
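The snippet above uses mean squared error (MSE) as a proxy for reconstruction quality. A minimal computation, with flat lists of pixel values standing in for the original and reconstructed images:

```python
# Mean squared error between an original and a reconstructed image,
# represented here as flat lists of pixel values (toy data).
def mse(original, reconstructed):
    assert len(original) == len(reconstructed)
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

print(mse([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))  # perfect reconstruction → 0.0
print(mse([0.0, 0.5, 1.0], [0.1, 0.4, 0.9]))  # small, nonzero error
```

A smaller MSE means the reconstructed pixels are, on average, closer to the originals, which is why it serves as a rough quality indicator here.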


On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning (NeurIPS 2022)

Mandi Zhao

Neural Information Processing Systems

Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? … If you ran experiments: (a) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? Please refer to both the main text and appendix for experiment details. (b) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? All adaptation experiments in Procgen and RLBench are run for 3 seeds. (c) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster)? As stated in Section 2, we use RTX A5000 GPUs, each with 24GB memory. The C2F-ARM algorithm and training framework are built on the original authors' implementation. Did you mention the license of the assets?