Behavior Alignment via Reward Function Optimization Dhawal Gupta University of Massachusetts Yash Chandak