When can in-context learning generalize out of task distribution?