Deep Reinforcement Learning from Hierarchical Weak Preference Feedback