DefVerify: Do Hate Speech Models Reflect Their Dataset's Definition?