Token-level Direct Preference Optimization