End-to-End Speech Recognition Contextualization with Large Language Models