Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos