VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation