Human-Centric Foundation Models: Perception, Generation and Agentic Modeling
Shixiang Tang, Yizhou Wang, Lu Chen, Yuan Wang, Sida Peng, Dan Xu, Wanli Ouyang
arXiv.org Artificial Intelligence
In this survey, we present a comprehensive overview of HcFMs by proposing a taxonomy that categorizes current approaches into four groups: (1) Human-centric Perception Foundation Models that capture fine-grained features for multi-modal 2D and 3D understanding; (2) Human-centric AIGC Foundation Models that generate high-fidelity, diverse human-related content; (3) Unified Perception and Generation Models that integrate these capabilities to enhance both human understanding and synthesis; and (4) Human-centric Agentic Foundation Models that extend beyond perception and generation to learn human-like intelligence and interactive behaviors for humanoid embodied ...
... community appeals for a unified framework [Ci et al., 2023; Wang et al., 2023; Chen et al., 2024; Huang et al., 2024a] to unlock systematic understanding and a wide range of human-centric applications for everybody.
Inspired by rapid advancements of general foundation models, e.g., large language models (LLMs), large vision models (LVMs) and text-to-image generative models, and by the paradigm shift they present from end-to-end learning of task-specific models to generalist models, a recent trend is to develop Human-centric Foundation Models (HcFMs) that satisfy three criteria, namely generalization, broad applicability, and high fidelity. Generalization ensures robustness to unseen conditions, enabling the model to perform consistently across varied environments.
Feb-12-2025