FLoRA: Sample-Efficient Preference-based RL via Low-Rank Style Adaptation of Reward Functions
