Long-form evaluation of model editing