SeRA: Self-Reviewing and Alignment of Large Language Models using Implicit Reward Margins