Policy Teaching via Data Poisoning in Learning from Human Preferences