GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation