From Text to Pixel: Advancing Long-Context Understanding in MLLMs