Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models