Jülg, Tobias
LLM-Pack: Intuitive Grocery Handling for Logistics Applications
Blei, Yannik, Krawez, Michael, Jülg, Tobias, Krack, Pierre, Walter, Florian, Burgard, Wolfram
LLM-Pack: Intuitive Grocery Handling for Logistics Applications Y annik Blei 1, Michael Krawez 1, Tobias Jülg 1, Pierre Krack 1, Florian Walter 1 and Wolfram Burgard 1 Abstract -- Robotics and automation are increasingly influential in logistics but remain largely confined to traditional warehouses. In grocery retail, advancements such as cashier-less supermarkets exist, yet customers still manually pick and pack groceries. While there has been a substantial focus in robotics on the bin picking problem, the task of packing objects and groceries has remained largely untouched. However, packing grocery items in the right order is crucial for preventing product damage, e.g., heavy objects should not be placed on top of fragile ones. However, the exact criteria for the right packing order are hard to define, in particular given the huge variety of objects typically found in stores. In this paper, we introduce LLM-Pack, a novel approach for grocery packing. LLM-Pack leverages language and vision foundation models for identifying groceries and generating a packing sequence that mimics human packing strategy. LLM-Pack does not require dedicated training to handle new grocery items and its modularity allows easy upgrades of the underlying foundation models. We extensively evaluate our approach to demonstrate its performance.
Refined Policy Distillation: From VLA Generalists to RL Experts
Jülg, Tobias, Burgard, Wolfram, Walter, Florian
Recent generalist Vision-Language-Action Models (VLAs) can perform a variety of tasks on real robots with remarkable generalization capabilities. However, reported success rates are often not on par with those of expert policies. Moreover, VLAs usually do not work out of the box and often must be fine-tuned as they are sensitive to setup changes. In this work, we present Refined Policy Distillation (RPD), an RL-based policy refinement method that enables the distillation of large generalist models into small, high-performing expert policies. The student policy is guided during the RL exploration by actions of a teacher VLA for increased sample efficiency and faster convergence. Different from previous work that focuses on applying VLAs to real-world experiments, we create fine-tuned versions of Octo and OpenVLA for ManiSkill2 to evaluate RPD in simulation. As our results for different manipulation tasks demonstrate, RPD enables the RL agent to learn expert policies that surpass the teacher's performance in both dense and sparse reward settings. Our approach is even robust to changes in the camera perspective and can generalize to task variations that the underlying VLA cannot solve.