Algorithms for learning value-aligned policies considering admissibility relaxation