AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation