8c0fabe372177d2aded596be2d3b4544-Paper-Conference.pdf

Neural Information Processing Systems

Our extensive experiments reveal that RL fine-tuning, particularly with PPO, significantly enhances generalization over SFT in semantic understanding and execution robustness, while maintaining comparable visual robustness. We identify PPO as a more effective RL algorithm for VLAs than LLM-derived methods such as DPO and GRPO. We also develop a simple recipe for efficient PPO training on VLAs and demonstrate its practical utility for improving VLA generalization. The project page is at https://rlvla.github.io.
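The abstract names PPO without stating its objective. As background only, here is a minimal sketch of the standard PPO clipped surrogate loss over a batch of sampled actions. It is not the authors' VLA training recipe, whose details the abstract does not give; the function name `ppo_clipped_loss` and the default `clip_eps=0.2` are assumptions for illustration.

```python
import math
from typing import Sequence


def ppo_clipped_loss(logp_new: Sequence[float],
                     logp_old: Sequence[float],
                     advantages: Sequence[float],
                     clip_eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate loss (negated, so it can be minimized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    clip_eps=0.2 is a common default, assumed here, not taken from the paper.
    """
    if not advantages:
        raise ValueError("need at least one sample")
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # importance ratio pi_new / pi_old
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # PPO maximizes the pessimistic (elementwise minimum) objective.
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(advantages)
```

When the ratio leaves the interval [1 - clip_eps, 1 + clip_eps] in the direction the advantage favors, the minimum selects the clipped term, so the update gains nothing from pushing the policy further away from the one that collected the data.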


Matryoshka Pilot: Learning to Drive Black-Box LLMs with LLMs

Neural Information Processing Systems

Despite the impressive generative abilities of black-box large language models (LLMs), their inherent opacity hinders further advancements in capabilities such as reasoning, planning, and personalization. Existing works aim to enhance LLM capabilities via domain-specific adaptation, which requires additional training on accessible model parameters, an option that is infeasible for black-box LLMs. To address this challenge, we introduce Matryoshka Pilot (M-Pilot), a lightweight white-box LLM controller that guides a large-scale black-box LLM generator by decomposing complex tasks into a series of intermediate outputs. Specifically, we treat the black-box LLM as an environment, with M-Pilot serving as a policy that provides intermediate guidance through prompts to drive the black-box LLM. M-Pilot is trained to pivot the outputs of the black-box LLM toward alignment with preferences during iterative interaction, which enables controllable multi-turn generation and self-improvement in optimizing intermediate guidance. Empirical evaluations on diverse tasks demonstrate that our method effectively enhances the capabilities of black-box LLMs in complex, long-horizon tasks.
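The abstract frames the black-box LLM as an environment and M-Pilot as a policy that acts by emitting prompts. A minimal sketch of that interaction loop, assuming both models are plain text-in/text-out callables, might look like the following. `drive_black_box`, its `controller` and `black_box` arguments, and the fixed turn count are hypothetical, and the sketch omits how the controller is trained on preferences, which the abstract does not detail.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures: the controller sees the task and the interaction
# history so far; the black-box model only sees the prompt it is given.
Controller = Callable[[str, List[Tuple[str, str]]], str]
BlackBox = Callable[[str], str]


def drive_black_box(task: str,
                    controller: Controller,
                    black_box: BlackBox,
                    num_turns: int = 3) -> List[Tuple[str, str]]:
    """Run a controller/generator loop and return the (guidance, output) trajectory."""
    history: List[Tuple[str, str]] = []
    for _ in range(num_turns):
        # Policy step: the white-box controller emits intermediate guidance.
        guidance = controller(task, list(history))
        # Environment step: the opaque LLM responds to that guidance.
        output = black_box(guidance)
        history.append((guidance, output))
    return history


# Toy stand-ins so the loop can be exercised without any real model.
toy_controller: Controller = lambda task, hist: f"{task} step {len(hist) + 1}"
toy_black_box: BlackBox = lambda prompt: prompt.upper()
```

The returned list of (guidance, output) pairs is the kind of multi-turn trajectory a preference-based training step could score and compare; how M-Pilot actually collects and uses such data is not specified in this summary.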



Further Details

Neural Information Processing Systems

A.1 Dataset Details

The 20 micro-variations of each of the 5 macro-variations of the scene were created by swapping at least two furniture pieces and perturbing the positions of a subset of the other furniture pieces. The occurrences of the various furniture objects in these 100 micro-variations are illustrated in Figure 1. Several furniture objects, such as 'Beanbag' and 'Chair', occur more frequently, with multiple instances in some scenes, while others, such as 'Table 03', occur less frequently.

We also analyze the object categories of all objects in the original 6 'FRL-apartment' space recreations. We map each of the 92 objects to a semantic category and show the counts per semantic category as a histogram in Figure 1. Since these spaces have a large kitchen area, kitchen objects such as 'Kitchen utensil' and 'Bowl' make up a larger share of the objects.

Top-down views of the 5 macro-variations of the scene are shown in Figure 1. These variations are 5 semantically plausible furniture configurations of the space, created by a 3D artist. Each surface is annotated with a bounding box, enabling procedural placement of objects on the surfaces. For each of these 5 variations, we generate 20 additional variations, giving 105 scene layouts (5 + 5 × 20). Objects are procedurally added to furniture and surfaces using the supporting-surface and containment-volume annotations provided by ReplicaCAD.
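The layout counts follow from the generation rule described above. A minimal sketch of that rule under simplifying assumptions is shown below: each furniture piece is reduced to a 2D floor position, a micro-variation exchanges the positions of two randomly chosen pieces (satisfying "at least two") and jitters a random subset of the rest, and collisions, orientations, and ReplicaCAD's surface annotations are ignored. The function names and the `perturb_frac` and `max_offset` parameters are hypothetical, not taken from the dataset tooling.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical layout representation: furniture name -> (x, y) floor position.
Layout = Dict[str, Tuple[float, float]]


def make_micro_variation(layout: Layout, rng: random.Random,
                         perturb_frac: float = 0.5,
                         max_offset: float = 0.2) -> Layout:
    """Swap two furniture pieces and jitter a subset of the others."""
    if len(layout) < 2:
        raise ValueError("need at least two furniture pieces to swap")
    new = dict(layout)
    # Swap the positions of two distinct pieces.
    a, b = rng.sample(sorted(new), 2)
    new[a], new[b] = new[b], new[a]
    # Perturb the positions of a random subset of the remaining pieces.
    others = [name for name in sorted(new) if name not in (a, b)]
    for name in rng.sample(others, int(len(others) * perturb_frac)):
        x, y = new[name]
        new[name] = (x + rng.uniform(-max_offset, max_offset),
                     y + rng.uniform(-max_offset, max_offset))
    return new


def build_layouts(macro_layouts: List[Layout], n_micro: int = 20,
                  seed: int = 0) -> List[Layout]:
    """Return each macro layout followed by its n_micro micro-variations."""
    rng = random.Random(seed)
    layouts: List[Layout] = []
    for macro in macro_layouts:
        layouts.append(macro)
        layouts.extend(make_micro_variation(macro, rng) for _ in range(n_micro))
    return layouts
```

With 5 macro layouts and `n_micro=20`, `build_layouts` returns 5 + 5 × 20 = 105 layouts, matching the count in the text.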


Habitat 2.0: Training Home Assistants to Rearrange their Habitat

Neural Information Processing Systems

We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios. We make comprehensive contributions to all levels of the embodied AI stack: data, simulation, and benchmark tasks.